cellrank.models.GAMR#

class cellrank.models.GAMR(adata, n_knots=5, distribution='gaussian', basis='cr', knotlocs=KnotLocs.AUTO, offset='default', smoothing_penalty=1.0, **kwargs)[source]#

Wrapper around R’s mgcv package for fitting Generalized Additive Models (GAMs).

Parameters:
  • adata (anndata.AnnData) – Annotated data object.

  • n_knots (int) – Number of knots.

  • distribution (str) – Distribution family in rpy2.robjects.r, such as ‘gaussian’ or ‘nb’ for negative binomial. If ‘nb’, raw count data in adata.raw is always used.

  • basis (str) – Basis for the smoothing term. See the mgcv smooth.terms documentation for valid options.

  • knotlocs (Literal[‘auto’, ‘density’]) –

    Position of the knots. Can be one of the following:

    • ‘auto’ - let mgcv handle the knot positions.

    • ‘density’ - position the knots based on the density of the pseudotime.

  • offset (Union[ndarray, Literal[‘default’], None]) – Offset term for the GAM. Only available when distribution='nb'. If ‘default’, it is calculated according to [Robinson and Oshlack, 2010]. The values are saved in adata.obs['cellrank_offset']. If None, no offset is used.

  • smoothing_penalty (float) – Penalty for the smoothing term. The larger the value, the smoother the fitted curve.

  • kwargs – Keyword arguments for gam.control. See the mgcv gam.control documentation for reference.
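
To make the ‘density’ option of knotlocs more concrete, the sketch below places knots at evenly spaced quantiles of the pseudotime, so that regions with many cells receive more knots. Reading "density" as quantile spacing is an assumption made for illustration, and density_knots is a hypothetical helper, not a cellrank function.

```python
def density_knots(pseudotime, n_knots=5):
    """Place n_knots at evenly spaced quantiles of the pseudotime values.

    Assumption: "density-based" is interpreted as quantile spacing. This is an
    illustration only and is not taken from cellrank's implementation.
    """
    if n_knots < 2:
        raise ValueError("n_knots must be at least 2")
    values = sorted(pseudotime)
    if not values:
        raise ValueError("pseudotime must not be empty")

    last = len(values) - 1
    knots = []
    for i in range(n_knots):
        pos = i / (n_knots - 1) * last  # fractional index of the i-th quantile
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Linear interpolation between the two neighbouring sorted values.
        knots.append(values[lo] + frac * (values[hi] - values[lo]))
    return knots


# Cells crowded at early pseudotime pull most knots towards that region.
skewed = [0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.5, 1.0]
print(density_knots(skewed, n_knots=5))
```

With uniformly distributed pseudotime this reduces to evenly spaced knots, which is roughly what one would expect from a default placement.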

Attributes table#

adata

Annotated data object.

conf_int

Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

model

Underlying model.

prepared

Whether the model is prepared for fitting.

shape

Number of cells in adata.

w

Filtered weights of shape (n_filtered_cells,) used for fitting.

w_all

Unfiltered weights of shape (n_cells,).

x

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

x_all

Unfiltered independent variables of shape (n_cells, 1).

x_hat

Filtered independent variables used when calculating default confidence interval, usually same as x.

x_test

Independent variables of shape (n_samples, 1) used for prediction.

y

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

y_all

Unfiltered dependent variables of shape (n_cells, 1).

y_hat

Filtered dependent variables used when calculating default confidence interval, usually same as y.

y_test

Prediction values of shape (n_samples,) for x_test.

Methods table#

confidence_interval([x_test, level])

Calculate the confidence interval.

copy()

Return a copy of self.

default_confidence_interval([x_test])

Calculate the confidence interval, if the underlying model has no method for it.

fit([x, y, w])

Fit the model.

plot([figsize, same_plot, hide_cells, perc, ...])

Plot the smoothed gene expression.

predict([x_test, key_added, level])

Run the prediction.

prepare(*args, **kwargs)

Prepare the model to be ready for fitting.

read(fname[, adata, copy])

De-serialize self from a file.

write(fname[, write_adata, ext])

Serialize self to a file.
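
The methods above are typically called in the order prepare, fit, predict, confidence_interval. The following is a minimal sketch of that sequence. It assumes model is an already constructed cellrank.models.GAMR (which requires R with mgcv and rpy2), and that adata already holds the pseudotime and lineage information that prepare needs; smooth_gene is a hypothetical helper, not part of cellrank.

```python
def smooth_gene(model, gene, lineage, time_key, level=0.95):
    """Run prepare -> fit -> predict -> confidence_interval on a GAMR-like model.

    `model` is expected to be a cellrank.models.GAMR instance, for example
    GAMR(adata, n_knots=5). `smooth_gene` is a hypothetical convenience
    wrapper written for this example.
    """
    # prepare() and fit() both return the model, so the calls can be chained.
    model.prepare(gene=gene, lineage=lineage, time_key=time_key).fit()

    # predict() returns the values that are also stored in model.y_test.
    y_test = model.predict(level=level)

    # confidence_interval() returns the array that is also stored in model.conf_int.
    conf_int = model.confidence_interval(level=level)

    return model.x_test, y_test, conf_int
```

The helper only relies on the method names, keyword arguments, and return values documented on this page, so it can be exercised with a stand-in object when R is not available.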

Attributes#

adata#

GAMR.adata#

Annotated data object.

Return type:

AnnData

Returns:

Annotated data object.

conf_int#

GAMR.conf_int#

Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type:

ndarray

model#

GAMR.model#

Underlying model.

Return type:

Any

prepared#

GAMR.prepared#

Whether the model is prepared for fitting.

shape#

GAMR.shape#

Number of cells in adata.

Return type:

Tuple[int]

w#

GAMR.w#

Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type:

ndarray

w_all#

GAMR.w_all#

Unfiltered weights of shape (n_cells,).

Return type:

ndarray

x#

GAMR.x#

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

Return type:

ndarray

x_all#

GAMR.x_all#

Unfiltered independent variables of shape (n_cells, 1).

Return type:

ndarray

x_hat#

GAMR.x_hat#

Filtered independent variables used when calculating default confidence interval, usually same as x.

Return type:

ndarray

x_test#

GAMR.x_test#

Independent variables of shape (n_samples, 1) used for prediction.

Return type:

ndarray

y#

GAMR.y#

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

Return type:

ndarray

y_all#

GAMR.y_all#

Unfiltered dependent variables of shape (n_cells, 1).

Return type:

ndarray

y_hat#

GAMR.y_hat#

Filtered dependent variables used when calculating default confidence interval, usually same as y.

Return type:

ndarray

y_test#

GAMR.y_test#

Prediction values of shape (n_samples,) for x_test.

Return type:

ndarray

Methods#

confidence_interval#

GAMR.confidence_interval(x_test=None, level=0.95, **kwargs)[source]#

Calculate the confidence interval. Internally, this method calls cellrank.models.GAMR.predict() to extract the confidence interval, if needed.

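Parameters:
  • x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.

  • level (float) – Confidence level for confidence interval calculation. Must be in the interval [0, 1].

  • kwargs – Keyword arguments for cellrank.models.GAMR.predict().

Return type: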

ndarray

Returns:

Updates and returns the following field:

  • conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
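
To illustrate the (n_samples, 2) layout of conf_int, the sketch below builds lower and upper bounds from predictions and pointwise standard errors using a normal quantile. The normal approximation and the std_err input are assumptions made for this example; the page does not state how GAMR derives its bounds, and normal_conf_int is a hypothetical helper.

```python
from statistics import NormalDist


def normal_conf_int(y_test, std_err, level=0.95):
    """Return one (lower, upper) pair per prediction.

    Assumption: a symmetric normal-approximation interval, y +/- z * se.
    `std_err` is a hypothetical input holding pointwise standard errors.
    """
    if not 0 <= level < 1:
        raise ValueError("level must be in the interval [0, 1)")
    if len(y_test) != len(std_err):
        raise ValueError("y_test and std_err must have the same length")

    # Two-sided interval: put (1 - level) / 2 of the mass in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return [(y - z * se, y + z * se) for y, se in zip(y_test, std_err)]


bounds = normal_conf_int([1.0, 2.0, 3.0], [0.1, 0.2, 0.1], level=0.95)
for lower, upper in bounds:
    print(f"[{lower:.3f}, {upper:.3f}]")
```

Each row corresponds to one entry of x_test, with the lower bound in the first column and the upper bound in the second.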

copy#

GAMR.copy()[source]#

Return a copy of self.

Return type:

GAMR

default_confidence_interval#

GAMR.default_confidence_interval(x_test=None, **kwargs)#

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from [DeSalvo, 1970], eq. 5.

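Parameters:
  • x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.

  • kwargs – Keyword arguments for underlying model’s prediction method.

Return type: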

ndarray

Returns:

Updates and returns the following field:

  • conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Also update the following fields:

  • x_hat - Filtered independent variables used when calculating default confidence interval, usually same as x.

  • y_hat - Filtered dependent variables used when calculating default confidence interval, usually same as y.

fit#

GAMR.fit(x=None, y=None, w=None, **kwargs)[source]#

Fit the model.

Parameters:
  • x (Optional[ndarray]) – Independent variables, array of shape (n_samples, 1). If None, use x.

  • y (Optional[ndarray]) – Dependent variables, array of shape (n_samples, 1). If None, use y.

  • w (Optional[ndarray]) – Optional weights of x, array of shape (n_samples,). If None, use w.

  • kwargs – Keyword arguments for underlying model’s fitting function.

Return type:

GAMR

Returns:

Fits the model and returns self. Updates the following fields by filtering out 0 weights w:

  • x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

  • y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

  • w - Filtered weights of shape (n_filtered_cells,) used for fitting.
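
The filtering step described above can be pictured with plain Python lists. This is a minimal sketch of dropping samples whose weight is zero (or negative) so that x, y, and w stay aligned; filter_by_weight is a hypothetical helper and does not reproduce cellrank's array-based implementation.

```python
def filter_by_weight(x, y, w):
    """Keep only the samples whose weight is strictly positive.

    Hypothetical helper: the three inputs are parallel sequences, and the
    three outputs remain aligned with each other.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")

    kept = [(xi, yi, wi) for xi, yi, wi in zip(x, y, w) if wi > 0]
    if not kept:
        return [], [], []

    # Transpose the list of kept triples back into three separate lists.
    xs, ys, ws = (list(column) for column in zip(*kept))
    return xs, ys, ws


x_all = [0.1, 0.4, 0.7, 0.9]
y_all = [2.0, 3.5, 1.0, 0.0]
w_all = [1.0, 0.0, 0.3, 0.0]
x, y, w = filter_by_weight(x_all, y_all, w_all)
print(x, y, w)
```

After filtering, the number of remaining samples corresponds to n_filtered_cells in the shapes listed above.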

plot#

GAMR.plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)#

Plot the smoothed gene expression.

Parameters:
  • figsize (Tuple[float, float]) – Size of the figure.

  • same_plot (bool) – Whether to plot all trends in the same plot.

  • hide_cells (bool) – Whether to hide the cells.

  • perc (Optional[Tuple[float, float]]) – Percentile by which to clip the absorption probabilities.

  • abs_prob_cmap (ListedColormap) – Colormap to use when coloring in the absorption probabilities.

  • cell_color (Optional[str]) – Key in anndata.AnnData.obs or anndata.AnnData.var_names used for coloring the cells.

  • lineage_color (str) – Color for the lineage.

  • alpha (float) – Alpha channel for cells.

  • lineage_alpha (float) – Alpha channel for lineage confidence intervals.

  • title (Optional[str]) – Title of the plot.

  • size (int) – Size of the points.

  • lw (float) – Line width for the smoothed values.

  • cbar (bool) – Whether to show colorbar.

  • margins (float) – Margins around the plot.

  • xlabel (str) – Label on the x-axis.

  • ylabel (str) – Label on the y-axis.

  • conf_int (bool) – Whether to show the confidence interval.

  • lineage_probability (bool) – Whether to show smoothed lineage probability as a dashed line. Note that this will require 1 additional model fit.

  • lineage_probability_conf_int (Union[bool, float]) – Whether to compute and show the confidence interval of the smoothed lineage probability. If self is cellrank.models.GAMR, a float can be passed to set the confidence level; the default is 0.95. Only used when lineage_probability=True.

  • lineage_probability_color (Optional[str]) – Color to use when plotting the smoothed lineage probability. If None, it’s the same as lineage_color. Only used when lineage_probability=True.

  • obs_legend_loc (Optional[str]) – Location of the legend when cell_color corresponds to a categorical variable.

  • dpi (Optional[int]) – Dots per inch.

  • fig (Optional[Figure]) – Figure to use. If None, create a new one.

  • ax (matplotlib.axes.Axes) – Axes to use. If None, create a new one.

  • return_fig (bool) – If True, return the figure object.

  • save (Optional[str]) – Filename where to save the plot. If None, just shows the plots.

  • kwargs – Keyword arguments for matplotlib.axes.Axes.legend(), e.g. to disable the legend, specify loc=None. Only available when lineage_probability=True.

Return type:

Optional[Figure]

Returns:

The figure object if return_fig=True, otherwise nothing; it just plots the figure. Optionally saves it based on save.

predict#

GAMR.predict(x_test=None, key_added='_x_test', level=None, **kwargs)[source]#

Run the prediction. This method can also compute the confidence interval.

Parameters:
  • x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.

  • key_added (str) – Attribute name where to save the x_test for later use. If None, don’t save it.

  • kwargs – Keyword arguments for underlying model’s prediction method.

  • level (Optional[float]) – Confidence level for confidence interval calculation. If None, don’t compute the confidence interval. Must be in the interval [0, 1].

Return type:

ndarray

Returns:

Updates and returns the following field:

  • y_test - Prediction values of shape (n_samples,) for x_test.

prepare#

GAMR.prepare(*args, **kwargs)[source]#

Prepare the model to be ready for fitting. This also removes the zero and negative weights and prepares the design matrix.

Parameters:
  • gene – Gene in anndata.AnnData.var_names.

  • lineage – Name of the lineage. If None, all weights will be set to 1.

  • backward – Direction of the process.

  • time_range

    Specify start and end times:

    • If a tuple, it specifies the minimum and maximum pseudotime. Both values can be None, in which case the minimum is the earliest pseudotime and the maximum is automatically determined.

    • If a float, it specifies the maximum pseudotime.

  • data_key – Key in anndata.AnnData.layers or ‘X’ for anndata.AnnData.X. If use_raw = True, it’s always set to ‘X’.

  • time_key – Key in anndata.AnnData.obs where the pseudotime is stored.

  • use_raw – Whether to access anndata.AnnData.raw.

  • threshold – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.

  • weight_threshold – Set all weights below weight_threshold to weight_threshold if a float, or to the second value, if a tuple.

  • filter_cells – Filter out all cells with expression values lower than this threshold.

  • n_test_points – Number of test points. If None, use the original points based on threshold.

Return type:

GAMR

Returns:

Returns self and updates the following fields:

  • x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

  • y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

  • w - Filtered weights of shape (n_filtered_cells,) used for fitting.

  • x_all - Unfiltered independent variables of shape (n_cells, 1).

  • y_all - Unfiltered dependent variables of shape (n_cells, 1).

  • w_all - Unfiltered weights of shape (n_cells,).

  • x_test - Independent variables of shape (n_samples, 1) used for prediction.

  • prepared - Whether the model is prepared for fitting.
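
The weight_threshold rule above can be sketched as follows. The description only says that a tuple's second value is the replacement; treating its first value as the cutoff is an assumption made here, and clip_weights is a hypothetical helper rather than cellrank code.

```python
def clip_weights(w_all, weight_threshold):
    """Raise (or set) all weights below a cutoff to a fill value.

    - float: weights below the value are set to that value.
    - tuple (cutoff, fill): weights below `cutoff` are set to `fill`.
      Reading the first tuple entry as the cutoff is an assumption.
    """
    if isinstance(weight_threshold, tuple):
        cutoff, fill = weight_threshold
    else:
        cutoff = fill = weight_threshold
    return [fill if wi < cutoff else wi for wi in w_all]


weights = [0.0, 0.005, 0.2, 0.9]
print(clip_weights(weights, 0.01))        # float form
print(clip_weights(weights, (0.1, 0.0)))  # tuple form
```

With a fill value of 0, the clipped cells would subsequently be removed by the zero-weight filtering that prepare performs.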

read#

static GAMR.read(fname, adata=None, copy=False)#

De-serialize self from a file.

Parameters:
  • fname (Union[str, Path]) – Filename from which to read the object.

  • adata (Optional[AnnData]) – anndata.AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.

  • copy (bool) – Whether to copy adata before assigning it or not. If adata is a view, it is always copied.

Return type:

IOMixin

Returns:

The de-serialized object.

write#

GAMR.write(fname, write_adata=True, ext='pickle')#

Serialize self to a file.

Parameters:
  • fname (Union[str, Path]) – Filename where to save the object.

  • write_adata (bool) – Whether to save adata object or not, if present.

  • ext (Optional[str]) – Filename extension to use. If None, don’t append any extension.

Return type:

None

Returns:

Nothing, just writes itself to a file using pickle.
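
Since write serializes with pickle, the round trip with read can be sketched using the standard library alone. The helpers write_object and read_object below are hypothetical stand-ins, and the way the extension is appended is an assumption; unlike write, the sketch returns the final path so the example can show it.

```python
import pickle
import tempfile
from pathlib import Path


def write_object(obj, fname, ext="pickle"):
    """Pickle `obj` to `fname`, appending `.ext` unless it is already there.

    Hypothetical helper; if `ext` is None, the filename is used unchanged.
    """
    path = Path(fname)
    if ext is not None and path.suffix != f".{ext}":
        path = path.with_name(f"{path.name}.{ext}")
    with open(path, "wb") as fout:
        pickle.dump(obj, fout)
    return path


def read_object(fname):
    """Load a previously pickled object from `fname`."""
    with open(fname, "rb") as fin:
        return pickle.load(fin)


with tempfile.TemporaryDirectory() as tmpdir:
    saved = write_object({"n_knots": 5, "basis": "cr"}, Path(tmpdir) / "gamr_model")
    print(saved.name)
    print(read_object(saved))
```

As with any pickle file, only load files from sources you trust, because unpickling can execute arbitrary code.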