# Classes

## Estimators

### GPCCA

Generalized Perron Cluster Cluster Analysis [GPCCA18] as implemented in pyGPCCA.

Coarse-grains a discrete Markov chain into a set of macrostates and computes coarse-grained transition probabilities among the macrostates. Each macrostate corresponds to an area of the state space, i.e. to a subset of cells. The assignment is soft, i.e. each cell is assigned to every macrostate with a certain weight, where weights sum to one per cell. Macrostates are computed by maximizing the ‘crispness’ which can be thought of as a measure for minimal overlap between macrostates in a certain inner-product sense. Once the macrostates have been computed, we project the large transition matrix onto a coarse-grained transition matrix among the macrostates via a Galerkin projection. This projection is based on invariant subspaces of the original transition matrix which are obtained using the real Schur decomposition [GPCCA18].
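The Galerkin projection step can be illustrated with plain NumPy. This is a minimal sketch, not pyGPCCA's implementation: it assumes the membership matrix chi is already given (GPCCA derives it from the Schur vectors by maximizing crispness, which is omitted here) and uses the form $P_c = (\chi^\top D \chi)^{-1} \chi^\top D T \chi$, where $D$ is a diagonal weighting by a distribution $\eta$. The helper name coarse_grain is hypothetical.

```python
import numpy as np


def coarse_grain(T, chi, eta):
    """Project a fine transition matrix onto soft memberships (hypothetical helper).

    T   : (n, n) row-stochastic transition matrix
    chi : (n, m) memberships, each row sums to one
    eta : (n,) weighting distribution, e.g. uniform
    """
    D = np.diag(eta)
    overlap = chi.T @ D @ chi      # (m, m) weighted overlap between memberships
    flux = chi.T @ D @ T @ chi     # (m, m) weighted transitions between memberships
    return np.linalg.solve(overlap, flux)  # equals inv(overlap) @ flux


# Two weakly coupled blocks of two cells each, with crisp memberships.
T = np.array([
    [0.90, 0.08, 0.02, 0.00],
    [0.10, 0.88, 0.00, 0.02],
    [0.01, 0.00, 0.90, 0.09],
    [0.00, 0.02, 0.08, 0.90],
])
chi = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
coarse_T = coarse_grain(T, chi, np.full(4, 0.25))
print(coarse_T)  # rows sum to one; large diagonal entries for the metastable blocks
```

Because the rows of both T and chi sum to one, the rows of the projected matrix also sum to one. Large diagonal entries indicate metastable macrostates, which is what the 'stability' criterion of compute_terminal_states inspects.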

Parameters
• obj (Union[KernelExpression, ~AnnData, spmatrix, ndarray]) – Either a cellrank.tl.kernels.Kernel object, an anndata.AnnData object which stores the transition matrix in its .obsp attribute, or a NumPy or SciPy array.

• inplace (bool) – Whether to modify the adata object in place or make a copy.

• obsp_key (Optional[str]) – Key in obj.obsp when obj is an anndata.AnnData object.

• g2m_key (Optional[str]) – Key in adata.obs. Can be used to detect cell-cycle driven start- or endpoints.

• s_key (Optional[str]) – Key in adata.obs. Can be used to detect cell-cycle driven start- or endpoints.

• write_to_adata (bool) – Whether to write the transition matrix to adata.obsp and the parameters to adata.uns.

• key (Optional[str]) – Key used when writing transition matrix to adata. If None, the key is set to ‘T_bwd’ if backward is True, else ‘T_fwd’. Only used when write_to_adata=True.

compute_macrostates(n_states=None, n_cells=30, use_min_chi=False, cluster_key=None, en_cutoff=0.7, p_thresh=1e-15)

Compute the macrostates.

Parameters
• n_states (Union[int, Tuple[int, int], List[int], Dict[str, int], None]) – Number of macrostates. If None, use the eigengap heuristic.

• n_cells (Optional[int]) – Number of most likely cells from each macrostate to select.

• use_min_chi (bool) – Whether to use pygpcca.GPCCA.minChi() to calculate the number of macrostates. If True, n_states corresponds to a closed interval [min, max] inside of which the potentially optimal number of macrostates is searched.

• cluster_key (Optional[str]) – If a key to cluster labels is given, names and colors of the states will be associated with the clusters.

• en_cutoff (Optional[float]) – If cluster_key is given, this parameter determines when an approximate recurrent class will be labelled as ‘Unknown’, based on the entropy of the distribution of cells over transcriptomic clusters.

• p_thresh (float) – If cell cycle scores were provided, a Wilcoxon rank-sum test is conducted to identify cell-cycle states. If the test returns a positive statistic and a p-value smaller than p_thresh, a warning will be issued.

Returns

Nothing, but updates the following fields:

• macrostates_memberships

• macrostates

• schur

• coarse_T

• coarse_stationary_distribution

Return type

None
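The role of en_cutoff can be pictured with a small helper. This is a hedged sketch: name_state is hypothetical, and it assumes the entropy is the Shannon entropy (natural log) of the cluster labels of the cells in one state, normalized by the log of the number of distinct clusters present. CellRank's exact definition may differ.

```python
from collections import Counter

import numpy as np


def name_state(cluster_labels, en_cutoff=0.7):
    """Name a state after its dominant cluster, or 'Unknown' if labels are too mixed.

    cluster_labels : cluster annotation of every cell assigned to the state
    Assumption: normalized Shannon entropy (in [0, 1]) is compared against en_cutoff.
    """
    counts = Counter(cluster_labels)
    dominant = counts.most_common(1)[0][0]
    if en_cutoff is None or len(counts) == 1:
        return dominant
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    entropy = -(p * np.log(p)).sum() / np.log(len(p))
    return "Unknown" if entropy > en_cutoff else dominant


print(name_state(["Beta"] * 9 + ["Alpha"]))  # → Beta
print(name_state(["Alpha", "Beta"] * 5))     # → Unknown
```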

set_terminal_states_from_macrostates(names=None, n_cells=30)

Manually select terminal states from macrostates.

Parameters
• names (Union[Sequence[str], Mapping[str, str], str, None]) – Names of the macrostates to be marked as terminal. Multiple states can be combined using ‘,’, such as ["Alpha, Beta", "Epsilon"]. If a dict, keys correspond to the names of the macrostates and the values to the new names. If None, select all macrostates.

• n_cells (int) – Number of most likely cells from each macrostate to select.

Returns

Nothing, just updates the following fields:

• terminal_states_probabilities

• terminal_states

Return type

None
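Selecting the n_cells most likely cells per macrostate amounts to ranking cells by their membership. The helper top_cells below is hypothetical, and treating a combined name such as "Alpha, Beta" as the sum of the corresponding membership columns is an assumption made for illustration, not documented CellRank behavior.

```python
import numpy as np


def top_cells(memberships, names, keys, n_cells=30):
    """Indices of the n_cells cells with the highest membership for each key.

    memberships : (n_obs, n_states) soft assignment; rows sum to one
    names       : state names, one per column of memberships
    keys        : names to select; "A, B" combines states A and B (assumed: by summing)
    """
    column = {name: i for i, name in enumerate(names)}
    selected = {}
    for key in keys:
        parts = [part.strip() for part in key.split(",")]
        score = memberships[:, [column[p] for p in parts]].sum(axis=1)
        top = np.argsort(score)[::-1][:n_cells]  # highest scores first
        selected[key] = np.sort(top)
    return selected


memberships = np.array([
    [0.90, 0.08, 0.02],
    [0.70, 0.10, 0.20],
    [0.10, 0.85, 0.05],
    [0.20, 0.50, 0.30],
    [0.05, 0.05, 0.90],
    [0.30, 0.10, 0.60],
])
sel = top_cells(memberships, ["Alpha", "Beta", "Epsilon"], ["Alpha, Beta", "Epsilon"], n_cells=2)
print(sel)
```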

compute_terminal_states(method='stability', n_cells=30, alpha=1, stability_threshold=0.96, n_states=None)

Automatically select terminal states from macrostates.

Parameters
• method (str) –

One of following:

• ‘eigengap’ - select the number of states based on the eigengap of the transition matrix.

• ‘eigengap_coarse’ - select the number of states based on the eigengap of the diagonal of the coarse-grained transition matrix.

• ‘top_n’ - select the top n_states based on the diagonal values of the coarse-grained transition matrix.

• ‘stability’ - select states which have a stability index >= stability_threshold. The stability index is given by the diagonal elements of the coarse-grained transition matrix.

• n_cells (int) – Number of most likely cells from each macrostate to select.

• alpha (Optional[float]) – Weight given to the deviation of an eigenvalue from one. Used when method='eigengap' or method='eigengap_coarse'.

• stability_threshold (float) – Threshold used when method='stability'.

• n_states (Optional[int]) – Number of states used when method='top_n'.

Returns

Nothing, just updates the following fields:

• terminal_states_probabilities

• terminal_states

Return type

None
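The 'stability' and 'top_n' rules only look at the diagonal of the coarse-grained transition matrix, so they can be sketched in a few lines. The helper select_terminal is hypothetical, and the eigengap-based methods are not reproduced.

```python
import numpy as np


def select_terminal(coarse_T, names, method="stability", stability_threshold=0.96, n_states=None):
    """Pick terminal macrostates from the diagonal of a coarse transition matrix."""
    diag = np.diag(coarse_T)  # probability of staying within each macrostate
    if method == "stability":
        return [name for name, d in zip(names, diag) if d >= stability_threshold]
    if method == "top_n":
        if n_states is None:
            raise ValueError("n_states is required when method='top_n'")
        order = np.argsort(diag)[::-1][:n_states]  # most stable first
        return [names[i] for i in order]
    raise ValueError(f"unsupported method: {method!r}")


coarse_T = np.array([[0.98, 0.01, 0.01], [0.25, 0.50, 0.25], [0.02, 0.01, 0.97]])
print(select_terminal(coarse_T, ["A", "B", "C"]))  # → ['A', 'C']
print(select_terminal(coarse_T, ["A", "B", "C"], method="top_n", n_states=1))  # → ['A']
```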

Compute generalized diffusion pseudotime from [Haghverdi16] using the real Schur decomposition.

Parameters
• n_components (int) – Number of real Schur vectors to consider.

• key_added (str) – Key in adata.obs under which to save the pseudotime.

• kwargs – Keyword arguments for cellrank.tl.GPCCA.compute_schur() if Schur decomposition is not found.

Returns

Nothing, just updates adata.obs[key_added] with the computed pseudotime.

Return type

None

plot_coarse_T(show_stationary_dist=True, show_initial_dist=False, cmap='viridis', xtick_rotation=45, annotate=True, show_cbar=True, title=None, figsize=(8, 8), dpi=80, save=None, text_kwargs=mappingproxy({}), **kwargs)

Plot the coarse-grained transition matrix between macrostates.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

fit(n_lineages=None, cluster_key=None, keys=None, method='krylov', compute_absorption_probabilities=True, **kwargs)

Run the pipeline, computing the macrostates, initial or terminal states and optionally the absorption probabilities.

It is equivalent to running:

```python
if n_lineages is None or n_lineages == 1:
    compute_eigendecomposition(...)  # get the stationary distribution
if n_lineages > 1:
    compute_schur(...)

compute_macrostates(...)

if n_lineages is None:
    compute_terminal_states(...)
else:
    set_terminal_states_from_macrostates(...)

if compute_absorption_probabilities:
    compute_absorption_probabilities(...)
```

Parameters
• n_lineages (Optional[int]) – Number of lineages. If None, it will be determined automatically.

• cluster_key (Optional[str]) – Match computed states against pre-computed clusters to annotate the states. For this, provide a key from adata.obs where cluster labels have been computed.

• keys (Optional[Sequence[str]]) – Determines which initial or terminal states to use by passing their names. Further, initial or terminal states can be combined. If e.g. the terminal states are [‘Neuronal_1’, ‘Neuronal_2’, ‘Astrocytes’, ‘OPC’], then passing keys=['Neuronal_1, Neuronal_2', 'OPC'] means that the two neuronal terminal states are treated as one and the ‘Astrocytes’ state is excluded.

• method (str) – Method to use when computing the Schur decomposition. Valid options are: ‘krylov’ or ‘brandts’.

• compute_absorption_probabilities (bool) – Whether to compute the absorption probabilities or only the initial or terminal states.

• kwargs – Keyword arguments for cellrank.tl.estimators.GPCCA.compute_macrostates().

Returns

Nothing, just makes available the following fields:

• macrostates_memberships

• macrostates

• terminal_states_probabilities

• terminal_states

• absorption_probabilities

• diff_potential

Return type

None

property absorption_probabilities

Absorption probabilities.

Return type

Lineage

Annotated data object.

Returns

Annotated data object.

Return type

anndata.AnnData

property coarse_T

Coarse-grained transition matrix.

Return type

DataFrame

property coarse_initial_distribution

Coarse initial distribution.

Return type

Series

property coarse_stationary_distribution

Coarse stationary distribution.

Return type

Series
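To illustrate what a stationary distribution of a coarse-grained matrix is, the sketch below computes $\pi$ with $\pi P = \pi$ from the left eigenvector for eigenvalue one. It is a generic NumPy computation with a hypothetical helper name, not necessarily how coarse_stationary_distribution is obtained.

```python
import numpy as np


def stationary_distribution(P):
    """Distribution pi with pi @ P == pi for an irreducible row-stochastic P."""
    evals, evecs = np.linalg.eig(P.T)   # eigenvectors of P.T are left eigenvectors of P
    i = np.argmin(np.abs(evals - 1.0))  # eigenvalue (numerically) equal to one
    pi = np.real(evecs[:, i])
    return pi / pi.sum()                # fixes both sign and scale


coarse_T = np.array([[0.98, 0.02], [0.015, 0.985]])
pi = stationary_distribution(coarse_T)
print(pi)  # approximately [3/7, 4/7]
```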

compute_absorption_probabilities(keys=None, check_irred=False, solver=None, use_petsc=None, time_to_absorption=None, n_jobs=None, backend='loky', show_progress_bar=True, tol=1e-05, preconditioner=None)

Compute absorption probabilities of a Markov chain.

For each cell, this computes the probability of it reaching any of the approximate recurrent classes defined by terminal_states. This also computes the entropy over absorption probabilities, which is a measure of cell plasticity, see [Setty19].
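The core computation can be sketched with a dense linear solve. Assumptions: each recurrent class is treated as absorbing, the transient-to-transient block Q and the transient-to-class block R are taken from the transition matrix, and the absorption probabilities B solve $(I - Q)B = R$. The entropy uses the natural logarithm. Both helpers are hypothetical; CellRank additionally offers the sparse and iterative solvers listed under solver.

```python
import numpy as np


def absorption_probabilities(T, classes):
    """Probability of each cell being absorbed into each recurrent class.

    T       : (n, n) row-stochastic transition matrix
    classes : list of index lists; each list is one recurrent class, treated as absorbing
    """
    n = T.shape[0]
    rec = np.concatenate([np.asarray(c, dtype=int) for c in classes])
    trans = np.setdiff1d(np.arange(n), rec)
    Q = T[np.ix_(trans, trans)]  # transient -> transient
    # transient -> class j, summed over all cells of that class
    R = np.column_stack([T[np.ix_(trans, c)].sum(axis=1) for c in classes])
    probs = np.zeros((n, len(classes)))
    probs[trans] = np.linalg.solve(np.eye(len(trans)) - Q, R)  # (I - Q) B = R
    for j, c in enumerate(classes):
        probs[c, j] = 1.0  # cells inside a class are already absorbed
    return probs


def row_entropy(probs):
    """Shannon entropy (natural log) of each row; 0 * log(0) is taken as 0."""
    logp = np.log(probs, where=probs > 0, out=np.zeros_like(probs))
    return -(probs * logp).sum(axis=1)


T = np.array([
    [0.0, 0.5, 0.25, 0.25],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
probs = absorption_probabilities(T, [[2], [3]])
ent = row_entropy(probs)
print(probs)
print(ent)
```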

Parameters
• keys (Optional[Sequence[str]]) – Keys defining the recurrent classes.

• check_irred (bool) – Check whether the transition matrix is irreducible.

• solver (Optional[str]) –

Solver to use for the linear problem. Options are ‘direct’, ‘gmres’, ‘lgmres’, ‘bicgstab’ or ‘gcrotmk’ when use_petsc=False, or one of petsc4py.PETSc.KSP.Type otherwise.

Information on the SciPy iterative solvers can be found in scipy.sparse.linalg; for the petsc4py solvers, see the PETSc documentation.

If None, the solver is chosen automatically, depending on the problem size.

• use_petsc (Optional[bool]) – Whether to use solvers from petsc4py or scipy; petsc4py is recommended for large problems. If None, this is determined automatically. If no petsc4py installation is found, defaults to scipy.sparse.linalg.gmres().

• time_to_absorption (Union[str, Sequence[Union[str, Sequence[str]]], Dict[Union[str, Sequence[str]], str], None]) –

Whether to compute mean time to absorption and its variance to specific absorbing states.

If a dict, can be specified as {'Alpha': 'var', ...} to also compute the variance. If states are given as a tuple, time to absorption is computed to that subset of states, such as [('Alpha', 'Beta'), ...] or {('Alpha', 'Beta'): 'mean', ...}. Can be specified as 'all' to compute it to any absorbing state in keys, which is more efficient than listing all absorbing states.

Because many linear systems are solved, it might be beneficial to disable the progress bar with show_progress_bar=False.

• n_jobs (Optional[int]) – Number of parallel jobs to use when using an iterative solver. When use_petsc=True or for quickly solvable problems, we recommend a higher number of jobs (>= 8) to fully saturate the cores.

• backend (str) – Which backend to use for multiprocessing. See joblib.Parallel for valid options.

• show_progress_bar (bool) – Whether to show a progress bar when the solver is not a direct one.

• tol (float) – Convergence tolerance for the iterative solver. The default is fine for most cases, only consider decreasing this for severely ill-conditioned matrices.

• preconditioner (Optional[str]) – Preconditioner to use, only available when use_petsc=True. For available values, see petsc4py.PETSc.PC.Type. We recommend the ‘ilu’ preconditioner for badly conditioned problems.

Returns

Nothing, but updates the following fields:

• absorption_probabilities - probabilities of being absorbed into the terminal states.

• diff_potential - differentiation potential of cells.

• lineage_absorption_times - mean times until absorption to subset absorbing states and optionally their variances saved as '{lineage} mean' and '{lineage} var', respectively, for each subset of absorbing states specified in time_to_absorption.

Return type

None
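Mean and variance of the time to absorption follow from the fundamental matrix $N = (I - Q)^{-1}$ of the non-target cells: $t = N\mathbf{1}$ and $\mathrm{Var} = (2N - I)t - t \circ t$, counted in steps of the chain (standard absorbing Markov chain results). The helper below is a hypothetical dense sketch; it assumes the target set is reached with probability one from every other cell, otherwise the times are infinite.

```python
import numpy as np


def time_to_absorption(T, targets):
    """Mean and variance of the number of steps until a set of target cells is hit.

    T       : (n, n) row-stochastic transition matrix
    targets : indices of the absorbing cells, e.g. all cells of ('Alpha', 'Beta')
    Returns the non-target cell indices and their mean and variance.
    """
    n = T.shape[0]
    trans = np.setdiff1d(np.arange(n), np.asarray(targets, dtype=int))
    A = np.eye(len(trans)) - T[np.ix_(trans, trans)]      # I - Q
    mean = np.linalg.solve(A, np.ones(len(trans)))         # t = N 1
    var = 2 * np.linalg.solve(A, mean) - mean - mean ** 2  # (2N - I) t - t**2
    return trans, mean, var


# Each step is absorbed with probability 0.5: a geometric number of steps.
T = np.array([[0.5, 0.5], [0.0, 1.0]])
cells, mean, var = time_to_absorption(T, targets=[1])
print(cells, mean, var)  # mean 2 steps, variance 2
```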

compute_eigendecomposition(k=20, which='LR', alpha=1, only_evals=False, ncv=None)

Compute eigendecomposition of transition matrix.

Uses a sparse implementation, if possible, and only computes the top $k$ eigenvectors to speed up the computation. Computes both left and right eigenvectors.

Parameters
• k (int) – Number of eigenvalues/vectors to compute.

• which (str) – Eigenvalues are in general complex. ‘LR’ - largest real part, ‘LM’ - largest magnitude.

• alpha (float) – Used to compute the eigengap. alpha is the weight given to the deviation of an eigenvalue from one.

• only_evals (bool) – Compute only eigenvalues.

• ncv (Optional[int]) – Number of Lanczos vectors generated.

Returns

Nothing, but updates the following field:

• eigendecomposition

Return type

None
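An eigengap heuristic that uses alpha as a penalty on the deviation of an eigenvalue from one might look as follows. The criterion $J_i = (\lambda_i - \lambda_{i+1}) - \alpha (1 - \lambda_i)$ is an assumption consistent with the description of alpha above, not a quote of CellRank's code, and eigengap_n_states is hypothetical.

```python
import numpy as np


def eigengap_n_states(evals, alpha=1.0, which="LR"):
    """Pick a number of states from the top eigenvalues via a penalized eigengap.

    which : 'LR' sorts by real part, 'LM' by magnitude (both descending).
    """
    evals = np.asarray(evals)
    key = np.real(evals) if which == "LR" else np.abs(evals)
    lam = np.real(evals[np.argsort(key)[::-1]])
    gap = lam[:-1] - lam[1:]        # gap after each eigenvalue
    deviation = 1.0 - lam[:-1]      # distance of each eigenvalue from one
    return int(np.argmax(gap - alpha * deviation)) + 1


print(eigengap_n_states([1.0, 0.99, 0.98, 0.5, 0.4]))  # → 3
```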

compute_lineage_drivers(lineages=None, method='fischer', cluster_key=None, clusters=None, layer='X', use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, return_drivers=True, **kwargs)

Compute driver genes per lineage.

Correlates gene expression with lineage probabilities, for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters which are relevant for the specified lineages.

Parameters
• lineages (Union[str, Sequence, None]) – Either a set of lineage names from absorption_probabilities.names or None, in which case all lineages are considered.

• method (str) –

Mode to use when calculating p-values and confidence intervals. Can be one of:

• ‘fischer’ - use the Fisher transformation [Fischer21].

• ‘perm_test’ - use a permutation test.

• cluster_key (Optional[str]) – Key from adata.obs to obtain cluster annotations. These are considered for clusters.

• clusters (Union[str, Sequence, None]) – Restrict the correlations to these clusters.

• layer (str) – Key from adata.layers.

• use_raw (bool) – Whether or not to use adata.raw to correlate gene expression. If using a layer other than .X, this must be set to False.

• confidence_level (float) – Confidence level for the confidence interval calculation. Must be in [0, 1].

• n_perms (int) – Number of permutations to use when method='perm_test'.

• seed (Optional[int]) – Random seed when method='perm_test'.

• return_drivers (bool) – Whether to return the drivers. This also contains the lower and upper confidence_level confidence interval bounds.

• show_progress_bar – Whether to show a progress bar. Disabling it may slightly improve performance.

• n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.

• backend – Which backend to use for parallelization. See joblib.Parallel for valid options.

Return type

Optional[DataFrame]

Returns

• If return_drivers=True, a DataFrame of shape (n_genes, n_lineages * 5) containing the following columns for each lineage:

• {lineage} corr - correlation between the gene expression and absorption probabilities.

• {lineage} pval - calculated p-values for a two-sided test.

• {lineage} qval - p-values corrected using the Benjamini-Hochberg method at level 0.05.

• {lineage} ci low - lower bound of the confidence_level correlation confidence interval.

• {lineage} ci high - upper bound of the confidence_level correlation confidence interval.

• Otherwise, None.

Also updates the following:

• '{direction} {lineage} corr' - the potential lineage drivers.

• '{direction} {lineage} qval' - the corrected p-values.

• lineage_drivers - same as the returned values.

References

Fischer21

Fisher, R. A. (1921). On the “probable error” of a coefficient of correlation deduced from a small sample. Metron, 1, 3–32.
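The 'fischer' mode can be sketched as: Pearson correlation per gene, the Fisher transformation $z = \operatorname{artanh}(r)$ with standard error $1/\sqrt{n - 3}$ for confidence intervals and two-sided p-values, then Benjamini-Hochberg correction. This is an illustrative NumPy version with hypothetical helper names; CellRank's implementation (e.g. its handling of sparse data, or the permutation test) may differ.

```python
from statistics import NormalDist

import numpy as np


def correlation_test(X, p, confidence_level=0.95):
    """Correlate each gene with one lineage's absorption probabilities.

    X : (n_cells, n_genes) dense expression matrix
    p : (n_cells,) absorption probabilities towards one lineage
    Returns corr, two-sided p-values and Fisher-based confidence bounds per gene.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    pc = p - p.mean()
    corr = (Xc * pc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (pc ** 2).sum())
    z = np.arctanh(corr)       # Fisher transformation
    se = 1.0 / np.sqrt(n - 3)  # approximate standard error of z
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + confidence_level / 2)
    pval = np.array([2 * (1 - normal.cdf(abs(zi) / se)) for zi in z])
    return corr, pval, np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity in rank
    qvals = np.empty(m)
    qvals[order] = np.minimum(ranked, 1.0)
    return qvals


rng = np.random.default_rng(0)
p = rng.uniform(size=200)
X = np.column_stack([p + 0.05 * rng.normal(size=200), rng.normal(size=200)])
corr, pval, ci_low, ci_high = correlation_test(X, p)
qval = benjamini_hochberg(pval)
```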

compute_partition()

Compute communication classes for the Markov chain.

Returns

Nothing, but updates the following fields:

• recurrent_classes

• transient_classes

• is_irreducible

Return type

None
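Communication classes are the strongly connected components of the transition graph: a class is recurrent if no transition leaves it and transient otherwise, and the chain is irreducible if there is a single class. The dense sketch below uses a hypothetical helper and Warshall's transitive closure, which is only practical for small matrices; large sparse matrices call for a sparse strongly-connected-components routine.

```python
import numpy as np


def communication_classes(T):
    """Split states into recurrent (closed) and transient communication classes."""
    n = T.shape[0]
    reach = (T > 0) | np.eye(n, dtype=bool)
    for k in range(n):  # Warshall's algorithm: transitive closure of the graph
        reach |= reach[:, [k]] & reach[[k], :]
    seen, recurrent, transient = set(), [], []
    for i in range(n):
        if i in seen:
            continue
        cls = [j for j in range(n) if reach[i, j] and reach[j, i]]
        seen.update(cls)
        outside = [j for j in range(n) if j not in cls]
        closed = not reach[np.ix_(cls, outside)].any()  # nothing outside is reachable
        (recurrent if closed else transient).append(cls)
    return recurrent, transient


T = np.array([
    [0.0, 0.5, 0.25, 0.25],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
recurrent, transient = communication_classes(T)
is_irreducible = len(recurrent) == 1 and not transient
print(recurrent, transient, is_irreducible)  # [[2], [3]] [[0, 1]] False
```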

compute_schur(n_components=10, initial_distribution=None, method='krylov', which='LR', alpha=1)

Compute the Schur decomposition.

Parameters
• n_components (int) – Number of vectors to compute.

• initial_distribution (Optional[ndarray]) – Input probability distribution over all cells. If None, the uniform distribution is used.

• method (str) –

Method for calculating the Schur vectors. Valid options are: ‘krylov’ or ‘brandts’. For the benefits of each method, see pygpcca.GPCCA.

The former is an iterative procedure that computes a partial, sorted Schur decomposition for large, sparse matrices, whereas the latter computes a full, sorted Schur decomposition of a dense matrix.

• which (str) – Eigenvalues are in general complex. ‘LR’ - largest real part, ‘LM’ - largest magnitude.

• alpha (float) – Used to compute the eigengap. alpha is the weight given to the deviation of an eigenvalue from one.

Returns

Nothing, but updates the following fields:

• schur

• schur_matrix

• eigendecomposition

Return type

None

copy()

Return a copy of self, including the underlying adata object.

Return type

BaseEstimator

property diff_potential

Differentiation potential.

Return type

Series

property eigendecomposition

Eigendecomposition.

Return type
property is_irreducible

Whether the Markov chain is irreducible or not.

property issparse

Whether the transition matrix is sparse or not.

Return type

bool

property kernel

Underlying kernel.

Return type

KernelExpression

property lineage_absorption_times

Lineage absorption times.

Return type

DataFrame

property lineage_drivers

Lineage drivers.

Return type

DataFrame

property macrostates

Macrostates.

Return type

Series

property macrostates_memberships

Macrostates memberships.

Return type

Lineage

plot_absorption_probabilities(data, prop, discrete=False, lineages=None, cluster_key=None, mode='embedding', time_key='latent_time', show_dp=True, title=None, same_plot=False, cmap='viridis', **kwargs)

Plot discrete states or probabilities in an embedding.

Parameters
• discrete (bool) – Whether to plot in discrete or continuous mode.

• lineages (Union[str, Sequence[str], None]) – Plot only these lineages. If None, plot all lineages.

• cluster_key (Optional[str]) – Key from adata.obs for plotting categorical observations.

• mode (str) –

Can be either ‘embedding’ or ‘time’:

• ‘embedding’ - plot the embedding while coloring in the absorption probabilities.

• ‘time’ - plot the pseudotime on the x-axis and the absorption probabilities on the y-axis.

• time_key (str) – Key from adata.obs to use as a pseudotime ordering of the cells.

• title (Optional[str]) – Either None, in which case titles are '{to, from} {terminal, initial} {state}', or an array of titles, one per lineage.

• same_plot (bool) – Whether to plot the lineages on the same plot using color gradients when mode='embedding'.

• cmap (Union[str, ListedColormap]) – Colormap to use.

• basis – Basis to use when mode='embedding'. If None, use ‘umap’.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_eigendecomposition(left=False, *args, **kwargs)

Plot eigenvectors in an embedding.

Parameters
• left (bool) – Whether to plot left or right eigenvectors.

• use – Which or how many vectors are to be plotted.

• abs_value – Whether to take the absolute value before plotting.

• cluster_key – Key in adata.obs for plotting categorical observations.

• basis – Basis to use when mode='embedding'. If None, use ‘umap’.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_lineage_drivers(lineage, n_genes=8, use_raw=False, **kwargs)

Plot lineage drivers discovered by compute_lineage_drivers().

Parameters
• lineage (str) – Lineage for which to plot the driver genes.

• n_genes (int) – Number of genes to plot.

• use_raw (bool) – Whether to look in adata.raw.var or adata.var.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_macrostates(data, prop, discrete=False, lineages=None, cluster_key=None, mode='embedding', time_key='latent_time', show_dp=True, title=None, same_plot=False, cmap='viridis', **kwargs)

Plot discrete states or probabilities in an embedding.

Parameters
• discrete (bool) – Whether to plot in discrete or continuous mode.

• lineages (Union[str, Sequence[str], None]) – Plot only these lineages. If None, plot all lineages.

• cluster_key (Optional[str]) – Key from adata.obs for plotting categorical observations.

• mode (str) –

Can be either ‘embedding’ or ‘time’:

• ‘embedding’ - plot the embedding while coloring in the absorption probabilities.

• ‘time’ - plot the pseudotime on the x-axis and the absorption probabilities on the y-axis.

• time_key (str) – Key from adata.obs to use as a pseudotime ordering of the cells.

• title (Optional[str]) – Either None, in which case titles are '{to, from} {terminal, initial} {state}', or an array of titles, one per lineage.

• same_plot (bool) – Whether to plot the lineages on the same plot using color gradients when mode='embedding'.

• cmap (Union[str, ListedColormap]) – Colormap to use.

• basis – Basis to use when mode='embedding'. If None, use ‘umap’.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_schur(vectors, prop, use=None, abs_value=False, cluster_key=None, **kwargs)

Plot vectors in an embedding.

Parameters
• use (Union[int, Tuple[int], List[int], None]) – Which or how many vectors are to be plotted.

• abs_value (bool) – Whether to take the absolute value before plotting.

• cluster_key (Optional[str]) – Key in adata.obs for plotting categorical observations.

• basis – Basis to use when mode='embedding'. If None, use ‘umap’.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_schur_matrix(title='schur matrix', cmap='viridis', figsize=None, dpi=80, save=None, **kwargs)

Plot the Schur matrix.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_spectrum(n=None, real_only=False, show_eigengap=True, show_all_xticks=True, legend_loc=None, title=None, figsize=(5, 5), dpi=100, save=None, marker='.', **kwargs)

Plot the top eigenvalues in real or complex plane.

Parameters
• n (Optional[int]) – Number of eigenvalues to show. If None, show all that have been computed.

• real_only (bool) – Whether to plot only the real part of the spectrum.

• show_eigengap (bool) – When real_only=True, this determines whether to show the inferred eigengap as a dotted line.

• show_all_xticks (bool) – When real_only=True, this determines whether to show the indices of all eigenvalues on the x-axis.

• legend_loc (Optional[str]) – Location parameter for the legend.

• title (Optional[str]) – Title of the figure.

• figsize (Optional[Tuple[float, float]]) – Size of the figure.

• dpi (int) – Dots per inch.

• save (Union[Path, str, None]) – Filename where to save the plot.

• marker (str) – Marker symbol used, valid options can be found in matplotlib.markers.

• kwargs – Keyword arguments for matplotlib.pyplot.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_terminal_states(data, prop, discrete=False, lineages=None, cluster_key=None, mode='embedding', time_key='latent_time', show_dp=True, title=None, same_plot=False, cmap='viridis', **kwargs)

Plot discrete states or probabilities in an embedding.

Parameters
• discrete (bool) – Whether to plot in discrete or continuous mode.

• lineages (Union[str, Sequence[str], None]) – Plot only these lineages. If None, plot all lineages.

• cluster_key (Optional[str]) – Key from adata.obs for plotting categorical observations.

• mode (str) –

Can be either ‘embedding’ or ‘time’:

• ‘embedding’ - plot the embedding while coloring in the absorption probabilities.

• ‘time’ - plot the pseudotime on the x-axis and the absorption probabilities on the y-axis.

• time_key (str) – Key from adata.obs to use as a pseudotime ordering of the cells.

• title (Optional[str]) – Either None, in which case titles are '{to, from} {terminal, initial} {state}', or an array of titles, one per lineage.

• same_plot (bool) – Whether to plot the lineages on the same plot using color gradients when mode='embedding'.

• cmap (Union[str, ListedColormap]) – Colormap to use.

• basis – Basis to use when mode='embedding'. If None, use ‘umap’.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property recurrent_classes

Recurrent classes of the Markov chain.

Rename the terminal states stored in terminal_states.

Parameters
• new_names (Mapping[str, str]) – Mapping where keys are the old names and the values are the new names. New names must be unique.

• update_adata (bool) – Whether to also update the underlying adata object.

Returns

Nothing, just updates the names of terminal_states.

Return type

None

property schur

Schur vectors.

Return type

ndarray

property schur_matrix

Schur matrix.

Return type

ndarray

set_terminal_states(labels, cluster_key=None, en_cutoff=None, p_thresh=None, add_to_existing=False, **kwargs)

Set the approximate recurrent classes, if they are known a priori.

Parameters
• labels (Union[Series, Dict[str, Any]]) – Either a categorical pandas.Series with cell names as index, where NaN marks a cell belonging to a transient state, or a dict, where each key is the name of a recurrent class and each value is a list of cell names.

• cluster_key (Optional[str]) – If a key to cluster labels is given, terminal_states will be associated with these for naming and colors.

• en_cutoff (Optional[float]) – If cluster_key is given, this parameter determines when an approximate recurrent class will be labelled as ‘Unknown’, based on the entropy of the distribution of cells over transcriptomic clusters.

• p_thresh (Optional[float]) – If cell cycle scores were provided, a Wilcoxon rank-sum test is conducted to identify cell-cycle states. If the test returns a positive statistic and a p-value smaller than p_thresh, a warning will be issued.

• add_to_existing (bool) – Whether to add these categories to existing ones. Cells already belonging to recurrent classes will be updated if there’s an overlap. Throws an error if previous approximate recurrent classes have not been calculated.

Returns

Nothing, but updates the following fields:

• terminal_states_probabilities

• terminal_states

Return type

None

property terminal_states

Terminal states.

Return type

Series

property terminal_states_probabilities

Terminal states probabilities.

Return type

Series

property transient_classes

Transient classes of the Markov chain.

property transition_matrix

Transition matrix.

Return type

Union[ndarray, spmatrix]

write(fname, ext='pickle')

Serialize self to a file.

Parameters
• fname (Union[str, Path]) – Filename where to save the object.

• ext (Optional[str]) – Filename extension to use. If None, don’t append any extension.

Returns

Nothing, just writes itself to a file using pickle.

Return type

None

### CFLARE

Compute the initial/terminal states of a Markov chain via spectral heuristics.

This estimator uses the left eigenvectors of the transition matrix to filter to a set of recurrent cells and the right eigenvectors to cluster this set of cells into discrete groups.

Parameters
• obj (Union[KernelExpression, ~AnnData, spmatrix, ndarray]) – Either a cellrank.tl.kernels.Kernel object, an anndata.AnnData object which stores the transition matrix in its .obsp attribute, or a NumPy or SciPy array.

• inplace (bool) – Whether to modify the adata object in place or make a copy.

• obsp_key (Optional[str]) – Key in obj.obsp when obj is an anndata.AnnData object.

• g2m_key (Optional[str]) – Key in adata.obs. Can be used to detect cell-cycle driven start- or endpoints.

• s_key (Optional[str]) – Key in adata.obs. Can be used to detect cell-cycle driven start- or endpoints.

• write_to_adata (bool) – Whether to write the transition matrix to adata.obsp and the parameters to adata.uns.

• key (Optional[str]) – Key used when writing transition matrix to adata. If None, the key is set to ‘T_bwd’ if backward is True, else ‘T_fwd’. Only used when write_to_adata=True.

compute_terminal_states(use=None, percentile=98, method='kmeans', cluster_key=None, n_clusters_kmeans=None, n_neighbors=20, resolution=0.1, n_matches_min=0, n_neighbors_filtering=15, basis=None, n_comps=5, scale=False, en_cutoff=0.7, p_thresh=1e-15)

Find approximate recurrent classes of the Markov chain.

Filter to obtain recurrent states in left eigenvectors. Cluster to obtain approximate recurrent classes in right eigenvectors.

Parameters
• use (Union[int, Tuple[int], List[int], range, None]) – Which or how many first eigenvectors to use as features for clustering/filtering. If None, use the eigengap statistic.

• percentile (Optional[int]) – Threshold used for filtering out cells which are most likely transient states. Cells which are in the lower percentile percent of each eigenvector will be removed from the data matrix.

• method (str) – Method to be used for clustering. Must be one of ‘louvain’, ‘leiden’ or ‘kmeans’.

• cluster_key (Optional[str]) – If a key to cluster labels is given, terminal_states will get associated with these for naming and colors.

• n_clusters_kmeans (Optional[int]) – If None, this is set to use + 1.

• n_neighbors (int) – When using ‘louvain’ or ‘leiden’ for clustering cells, a KNN graph needs to be built. This is its $k$ parameter, i.e. the number of neighbors for each cell.

• resolution (float) – Resolution parameter for ‘louvain’ or ‘leiden’ clustering. Should be chosen relatively small.

• n_matches_min (Optional[int]) – Filters out cells which don’t have at least n_matches_min neighbors from the same class. This filters out some cells which are transient but have been misassigned.

• n_neighbors_filtering (int) – Parameter for filtering cells. Cells are filtered out if they don’t have at least n_matches_min neighbors among their n_neighbors_filtering nearest cells.

• basis (Optional[str]) – Key from adata.obsm to be used as additional features for the clustering.

• n_comps (int) – Number of embedding components to be used when basis is not None.

• scale (bool) – Scale to z-scores. Consider using this if appending embedding to features.

• en_cutoff (Optional[float]) – If cluster_key is given, this parameter determines when an approximate recurrent class will be labelled as ‘Unknown’, based on the entropy of the distribution of cells over transcriptomic clusters.

• p_thresh (float) – If cell cycle scores were provided, a Wilcoxon rank-sum test is conducted to identify cell-cycle states. If the test returns a positive statistic and a p-value smaller than p_thresh, a warning will be issued.

Returns

Nothing, but updates the following fields:

• terminal_states_probabilities

• terminal_states

Return type

None
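The two steps (filter with left eigenvectors, cluster with right eigenvectors) can be illustrated with NumPy alone. This is a hypothetical sketch, not CFLARE's implementation: the keep rule for percentile (a cell survives if it exceeds the cutoff in at least one eigenvector), plain k-means with farthest-point initialization, and the helper name are assumptions, and the n_matches_min neighbor filtering, Louvain/Leiden options, basis features and scaling are omitted.

```python
import numpy as np


def filter_and_cluster(left_vecs, right_vecs, n_clusters, percentile=98, n_iter=100):
    """Filter cells with left eigenvectors, then cluster them on right eigenvectors."""
    magnitude = np.abs(left_vecs)
    cutoffs = np.percentile(magnitude, percentile, axis=0)  # one cutoff per eigenvector
    keep = np.flatnonzero((magnitude > cutoffs).any(axis=1))
    X = right_vecs[keep]

    # Farthest-point initialization, starting from the first kept cell.
    centers = [X[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)

    # Lloyd's algorithm: assign to nearest center, then recompute centers.
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return keep, labels


left = np.full((60, 2), 0.01)
left[:10, 0] = 1.0    # cells 0-9 stand out in the first left eigenvector
left[10:20, 1] = 1.0  # cells 10-19 stand out in the second one
right = np.full((60, 2), 0.5)
right[:10] = [1.0, 0.0]
right[10:20] = [0.0, 1.0]
keep, labels = filter_and_cluster(left, right, n_clusters=2, percentile=80)
print(keep, labels)
```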

fit(n_lineages, keys=None, cluster_key=None, compute_absorption_probabilities=True, **kwargs)

Run the pipeline, computing the initial or terminal states and optionally the absorption probabilities.

It is equivalent to running:

```python
compute_eigendecomposition(...)
compute_terminal_states(...)
compute_absorption_probabilities(...)
```

Parameters
• n_lineages (Optional[int]) – Number of lineages. If None, it will be determined automatically.

• cluster_key (Optional[str]) – Match computed states against pre-computed clusters to annotate the states. For this, provide a key from adata.obs where cluster labels have been computed.

• keys (Optional[Sequence[str]]) – Determines which initial or terminal states to use by passing their names. Further, initial or terminal states can be combined. If e.g. the terminal states are [‘Neuronal_1’, ‘Neuronal_2’, ‘Astrocytes’, ‘OPC’], then passing keys=['Neuronal_1, Neuronal_2', 'OPC'] means that the two neuronal terminal states are treated as one and the ‘Astrocytes’ state is excluded.

• kwargs – Keyword arguments for compute_terminal_states(), such as n_cells.

Returns

Nothing, just makes available the following fields:

• terminal_states_probabilities

• terminal_states

• absorption_probabilities

• diff_potential

Return type

None

property absorption_probabilities

Absorption probabilities.

Return type

Lineage

Annotated data object.

Returns

Annotated data object.

Return type

anndata.AnnData

compute_absorption_probabilities(keys=None, check_irred=False, solver=None, use_petsc=None, time_to_absorption=None, n_jobs=None, backend='loky', show_progress_bar=True, tol=1e-05, preconditioner=None)

Compute absorption probabilities of a Markov chain.

For each cell, this computes the probability of it reaching any of the approximate recurrent classes defined by terminal_states. This also computes the entropy over absorption probabilities, which is a measure of cell plasticity, see [Setty19].
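When every terminal cell is made absorbing, these probabilities solve a linear system built from the transient-to-transient block Q and the transient-to-absorbing block R of the transition matrix, $$B = (I - Q)^{-1} R$$. The following is a minimal dense NumPy sketch on a toy 4-state chain. It assumes each absorbing cell forms its own class; CellRank instead works on sparse matrices with the direct or iterative solvers listed below and groups cells into terminal states. The last line computes the row-wise entropy as a plasticity score.

```python
import numpy as np


def absorption_probabilities(P, absorbing):
    """Return an (n_cells, n_absorbing) matrix of absorption probabilities.

    P: dense row-stochastic transition matrix.
    absorbing: indices of cells treated as absorbing (one class per cell here).
    """
    n = P.shape[0]
    absorbing = np.asarray(absorbing)
    transient = np.setdiff1d(np.arange(n), absorbing)
    Q = P[np.ix_(transient, transient)]  # transient -> transient
    R = P[np.ix_(transient, absorbing)]  # transient -> absorbing
    # Solve (I - Q) B = R instead of forming the inverse explicitly.
    B = np.zeros((n, len(absorbing)))
    B[transient] = np.linalg.solve(np.eye(len(transient)) - Q, R)
    B[absorbing, np.arange(len(absorbing))] = 1.0  # absorbing cells stay put
    return B


# Toy chain: cells 0 and 3 are absorbing, cells 1 and 2 step left/right with p = 0.5.
P = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])
B = absorption_probabilities(P, absorbing=[0, 3])
# Entropy of each row as a plasticity score (0 for cells that are already absorbed).
plasticity = -(B * np.log(np.clip(B, 1e-12, None))).sum(axis=1)
```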

Parameters
• keys (Optional[Sequence[str]]) – Keys defining the recurrent classes.

• check_irred (bool) – Check whether the transition matrix is irreducible.

• solver (Optional[str]) –

Solver to use for the linear problem. Options are ‘direct’, ‘gmres’, ‘lgmres’, ‘bicgstab’ or ‘gcrotmk’ when use_petsc=False, or one of petsc4py.PETSc.KSP.Type otherwise.

Information on the scipy iterative solvers can be found in scipy.sparse.linalg; for the petsc4py solvers, see the PETSc documentation.

If None, the solver is chosen automatically, depending on the problem size.

• use_petsc (Optional[bool]) – Whether to use solvers from petsc4py or scipy. petsc4py is recommended for large problems. If None, it is determined automatically. If no petsc4py installation is found, defaults to scipy.sparse.linalg.gmres().

• time_to_absorption (Union[str, Sequence[Union[str, Sequence[str]]], Dict[Union[str, Sequence[str]], str], None]) –

Whether to compute the mean time to absorption, and optionally its variance, to specific absorbing states.

If a dict, can be specified as {'Alpha': 'var', ...} to also compute the variance. If states are given as a tuple, time to absorption will be computed to the subset of these states, such as [('Alpha', 'Beta'), ...] or {('Alpha', 'Beta'): 'mean', ...}. Can be specified as 'all' to compute it to any absorbing state in keys, which is more efficient than listing all absorbing states.

It might be beneficial to disable the progress bar by setting show_progress_bar=False, since many linear systems are solved.

• n_jobs (Optional[int]) – Number of parallel jobs to use when using an iterative solver. When use_petsc=True or for quickly solvable problems, we recommend a higher number of jobs (>= 8) in order to fully saturate the cores.

• backend (str) – Which backend to use for multiprocessing. See joblib.Parallel for valid options.

• show_progress_bar (bool) – Whether to show progress bar when the solver isn’t a direct one.

• tol (float) – Convergence tolerance for the iterative solver. The default is fine for most cases, only consider decreasing this for severely ill-conditioned matrices.

• preconditioner (Optional[str]) – Preconditioner to use, only available when use_petsc=True. For available values, see petsc4py.PETSc.PC.Type. We recommend the ‘ilu’ preconditioner for badly conditioned problems.

Returns

Nothing, but updates the following fields:

• absorption_probabilities - probabilities of being absorbed into the terminal states.

• diff_potential - differentiation potential of cells.

• lineage_absorption_times - mean times until absorption to subset absorbing states and optionally their variances saved as '{lineage} mean' and '{lineage} var', respectively, for each subset of absorbing states specified in time_to_absorption.

Return type

None
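The mean time to absorption requested through time_to_absorption can be sketched with the fundamental matrix $$N = (I - Q)^{-1}$$: the expected number of steps is $$t = N \mathbf{1}$$ and its variance is $$(2N - I)t - t \circ t$$ (standard absorbing-chain results). This toy version measures the time until absorption into any absorbing state and does not reproduce CellRank's per-subset computation; it also inverts $$I - Q$$ explicitly, which is only sensible for tiny examples.

```python
import numpy as np


def absorption_time_moments(P, absorbing):
    """Mean and variance of the number of steps until absorption (dense toy version)."""
    n = P.shape[0]
    transient = np.setdiff1d(np.arange(n), absorbing)
    Q = P[np.ix_(transient, transient)]
    I = np.eye(len(transient))
    N = np.linalg.inv(I - Q)  # fundamental matrix; acceptable only for tiny examples
    mean = N @ np.ones(len(transient))
    var = (2 * N - I) @ mean - mean**2
    return transient, mean, var


# A 4-state walk with absorbing ends 0 and 3; cells 1 and 2 step left/right with p = 0.5.
P = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])
transient, mean, var = absorption_time_moments(P, absorbing=[0, 3])
```

From cells 1 and 2, absorption happens with probability 0.5 at every step, so the time is geometric with mean 2 and variance 2, which is what the sketch returns.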

compute_eigendecomposition(k=20, which='LR', alpha=1, only_evals=False, ncv=None)

Compute eigendecomposition of transition matrix.

Uses a sparse implementation, if possible, and only computes the top $$k$$ eigenvectors to speed up the computation. Computes both left and right eigenvectors.
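A small sketch of this computation with scipy.sparse.linalg.eigs, using which='LR' and obtaining left eigenvectors as right eigenvectors of the transpose. The toy matrix consists of two weakly coupled blocks of three cells. The eigengap is computed with a plain largest-drop heuristic that ignores the alpha weighting described below, so it is an assumption rather than CellRank's exact rule.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigs

A = np.array([
    [0.6, 0.3, 0.09],
    [0.2, 0.5, 0.29],
    [0.1, 0.3, 0.59],
])  # within-block transitions, rows sum to 0.99
leak = 0.01 * np.eye(3)  # each cell leaks probability 0.01 into the other block
P = csr_matrix(np.block([[A, leak], [leak, A]]))

k = 3
evals, right = eigs(P, k=k, which="LR")  # right eigenvectors: P v = lambda v
order = np.argsort(-evals.real)
evals, right = evals[order], right[:, order]

evals_l, left = eigs(P.T, k=k, which="LR")  # left eigenvectors of P
stationary = left[:, np.argmax(evals_l.real)].real
stationary /= stationary.sum()  # left eigenvector for lambda = 1, as a distribution

# Largest drop between consecutive sorted real parts; index i means the gap lies
# between eigenvalues i and i + 1, suggesting i + 1 macrostates.
eigengap = int(np.argmax(-np.diff(evals.real)))
```

For this matrix the top eigenvalues are 1.0, 0.98 and 0.51, so the gap after the second eigenvalue points to two macrostates, one per block.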

Parameters
• k (int) – Number of eigenvalues/vectors to compute.

• which (str) – Eigenvalues are in general complex. ‘LR’ - largest real part, ‘LM’ - largest magnitude.

• alpha (float) – Used to compute the eigengap. alpha is the weight given to the deviation of an eigenvalue from one.

• only_evals (bool) – Compute only eigenvalues.

• ncv (Optional[int]) – Number of Lanczos vectors generated.

Returns

Nothing, but updates the following field:

• eigendecomposition

Return type

None

compute_lineage_drivers(lineages=None, method='fischer', cluster_key=None, clusters=None, layer='X', use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, return_drivers=True, **kwargs)

Compute driver genes per lineage.

Correlates gene expression with lineage probabilities, for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters which are relevant for the specified lineages.
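The correlation test behind method='fischer' can be sketched for a single lineage with NumPy and SciPy: a Pearson correlation per gene, a confidence interval and two-sided p-value from the Fisher transformation $$z = \operatorname{arctanh}(r)$$ with standard error $$1/\sqrt{n - 3}$$, and Benjamini-Hochberg q-values. This illustrates those textbook formulas on synthetic data and is not CellRank's code; cluster restriction and the permutation test are omitted.

```python
import numpy as np
from scipy.stats import norm


def lineage_drivers_sketch(X, probs, confidence_level=0.95):
    """X: (n_cells, n_genes) expression; probs: (n_cells,) absorption probabilities."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    pc = probs - probs.mean()
    corr = (Xc.T @ pc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(pc))

    # Fisher transformation: arctanh(r) is approximately normal with sd 1 / sqrt(n - 3).
    z = np.arctanh(np.clip(corr, -1 + 1e-12, 1 - 1e-12))
    se = 1.0 / np.sqrt(n - 3)
    pval = 2 * norm.sf(np.abs(z) / se)  # two-sided test of r = 0
    z_crit = norm.ppf(0.5 + confidence_level / 2)
    ci_low, ci_high = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

    # Benjamini-Hochberg: q_(i) = min_{j >= i} p_(j) * m / j over the sorted p-values.
    m = len(pval)
    order = np.argsort(pval)
    scaled = pval[order] * m / np.arange(1, m + 1)
    qval = np.empty(m)
    qval[order] = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    return corr, pval, qval, ci_low, ci_high


rng = np.random.default_rng(0)
probs = rng.uniform(size=200)
# Gene 0 follows the lineage probability, gene 1 is pure noise.
X = np.column_stack([probs + 0.05 * rng.normal(size=200), rng.normal(size=200)])
corr, pval, qval, ci_low, ci_high = lineage_drivers_sketch(X, probs)
```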

Parameters
• lineages (Union[str, Sequence, None]) – Either a set of lineage names from absorption_probabilities .names or None, in which case all lineages are considered.

• method (str) –

Mode to use when calculating p-values and confidence intervals. Can be one of:

• ‘fischer’ - use the Fisher transformation [Fischer21].

• ‘perm_test’ - use a permutation test.

• cluster_key (Optional[str]) – Key from adata .obs to obtain cluster annotations. These are considered for clusters.

• clusters (Union[str, Sequence, None]) – Restrict the correlations to these clusters.

• layer (str) – Key from adata .layers.

• use_raw (bool) – Whether or not to use adata .raw to correlate gene expression. If using a layer other than .X, this must be set to False.

• confidence_level (float) – Confidence level for the confidence interval calculation. Must be in [0, 1].

• n_perms (int) – Number of permutations to use when method='perm_test'.

• seed (Optional[int]) – Random seed when method='perm_test'.

• return_drivers (bool) – Whether to return the drivers. This also contains the lower and upper confidence_level confidence interval bounds.

• show_progress_bar – Whether to show a progress bar. Disabling it may slightly improve performance.

• n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.

• backend – Which backend to use for parallelization. See joblib.Parallel for valid options.

Return type

Optional[DataFrame]

Returns

• Dataframe of shape (n_genes, n_lineages * 5) containing the following five columns for each lineage:

• {lineage} corr - correlation between the gene expression and absorption probabilities.

• {lineage} pval - calculated p-values for a two-sided test.

• {lineage} qval - p-values corrected using the Benjamini-Hochberg method at level 0.05.

• {lineage} ci low - lower bound of the confidence_level correlation confidence interval.

• {lineage} ci high - upper bound of the confidence_level correlation confidence interval.

Only if return_drivers=True.

• None, if return_drivers=False.

• '{direction} {lineage} corr' - the potential lineage drivers.

• '{direction} {lineage} qval' - the corrected p-values.

• lineage_drivers - same as the returned values.

References

Fischer21

Fisher, R. A. (1921). On the “probable error” of a coefficient of correlation deduced from a small sample. Metron 1, 3–32.

compute_partition()

Compute communication classes for the Markov chain.

Returns

Nothing, but updates the following fields:

• recurrent_classes

• transient_classes

• is_irreducible

Return type

None
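Communication classes are the strongly connected components of the transition graph; in a finite chain, a class is recurrent exactly when no probability leaves it, and transient otherwise. A minimal sketch using scipy.sparse.csgraph.connected_components, with a toy chain whose two end cells are absorbing:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def partition(P):
    """Return (recurrent_classes, transient_classes, is_irreducible)."""
    P = csr_matrix(P)
    n_classes, labels = connected_components(P, directed=True, connection="strong")
    recurrent, transient = [], []
    for c in range(n_classes):
        members = np.flatnonzero(labels == c)
        outside = np.setdiff1d(np.arange(P.shape[0]), members)
        # In a finite chain, a class is recurrent iff it is closed (no mass leaves it).
        leaves = outside.size > 0 and P[members][:, outside].sum() > 0
        (transient if leaves else recurrent).append(members.tolist())
    return recurrent, transient, bool(n_classes == 1)


P = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
])
recurrent, transient, irreducible = partition(P)
```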

copy()

Return a copy of self, including the underlying adata object.

Return type

BaseEstimator

property diff_potential

Differentiation potential.

Return type

Series

property eigendecomposition

Eigendecomposition.

Return type
property is_irreducible

Whether the Markov chain is irreducible or not.

property issparse

Whether the transition matrix is sparse or not.

Return type

bool

property kernel

Underlying kernel.

Return type

KernelExpression

property lineage_absorption_times

Lineage absorption times.

Return type

DataFrame

property lineage_drivers

Lineage drivers.

Return type

DataFrame

plot_absorption_probabilities(data, prop, discrete=False, lineages=None, cluster_key=None, mode='embedding', time_key='latent_time', show_dp=True, title=None, same_plot=False, cmap='viridis', **kwargs)

Plot discrete states or probabilities in an embedding.

Parameters
• discrete (bool) – Whether to plot in discrete or continuous mode.

• lineages (Union[str, Sequence[str], None]) – Plot only these lineages. If None, plot all lineages.

• cluster_key (Optional[str]) – Key from adata .obs for plotting categorical observations.

• mode (str) –

Can be either ‘embedding’ or ‘time’:

• ‘embedding’ - plot the embedding while coloring in the absorption probabilities.

• ‘time’ - plot the pseudotime on x-axis and the absorption probabilities on y-axis.

• time_key (str) – Key from adata .obs to use as a pseudotime ordering of the cells.

• title (Optional[str]) – Either None, in which case titles are '{to, from} {terminal, initial} {state}', or an array of titles, one per lineage.

• same_plot (bool) – Whether to plot the lineages on the same plot using color gradients when mode='embedding'.

• cmap (Union[str, ListedColormap]) – Colormap to use.

• basis – Basis to use when mode='embedding'. If None, use ‘umap’.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_eigendecomposition(left=False, *args, **kwargs)

Plot eigenvectors in an embedding.

Parameters
• left (bool) – Whether to plot left or right eigenvectors.

• use – Which or how many vectors are to be plotted.

• abs_value – Whether to take the absolute value before plotting.

• cluster_key – Key in adata .obs for plotting categorical observations.

• basis – Basis to use when mode='embedding'. If None, use ‘umap’.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_lineage_drivers(lineage, n_genes=8, use_raw=False, **kwargs)

Plot lineage drivers discovered by compute_lineage_drivers().

Parameters
• lineage (str) – Lineage for which to plot the driver genes.

• n_genes (int) – Number of genes to plot.

• use_raw (bool) – Whether to look in adata .raw.var or adata .var.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_spectrum(n=None, real_only=False, show_eigengap=True, show_all_xticks=True, legend_loc=None, title=None, figsize=(5, 5), dpi=100, save=None, marker='.', **kwargs)

Plot the top eigenvalues in real or complex plane.

Parameters
• n (Optional[int]) – Number of eigenvalues to show. If None, show all that have been computed.

• real_only (bool) – Whether to plot only the real part of the spectrum.

• show_eigengap (bool) – When real_only=True, this determines whether to show the inferred eigengap as a dotted line.

• show_all_xticks (bool) – When real_only=True, this determines whether to show the indices of all eigenvalues on the x-axis.

• legend_loc (Optional[str]) – Location parameter for the legend.

• title (Optional[str]) – Title of the figure.

• figsize (Optional[Tuple[float, float]]) – Size of the figure.

• dpi (int) – Dots per inch.

• save (Union[Path, str, None]) – Filename where to save the plot.

• marker (str) – Marker symbol used, valid options can be found in matplotlib.markers.

• kwargs – Keyword arguments for matplotlib.pyplot.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_terminal_states(data, prop, discrete=False, lineages=None, cluster_key=None, mode='embedding', time_key='latent_time', show_dp=True, title=None, same_plot=False, cmap='viridis', **kwargs)

Plot discrete states or probabilities in an embedding.

Parameters
• discrete (bool) – Whether to plot in discrete or continuous mode.

• lineages (Union[str, Sequence[str], None]) – Plot only these lineages. If None, plot all lineages.

• cluster_key (Optional[str]) – Key from adata .obs for plotting categorical observations.

• mode (str) –

Can be either ‘embedding’ or ‘time’:

• ‘embedding’ - plot the embedding while coloring in the absorption probabilities.

• ‘time’ - plot the pseudotime on x-axis and the absorption probabilities on y-axis.

• time_key (str) – Key from adata .obs to use as a pseudotime ordering of the cells.

• title (Optional[str]) – Either None, in which case titles are '{to, from} {terminal, initial} {state}', or an array of titles, one per lineage.

• same_plot (bool) – Whether to plot the lineages on the same plot using color gradients when mode='embedding'.

• cmap (Union[str, ListedColormap]) – Colormap to use.

• basis – Basis to use when mode='embedding'. If None, use ‘umap’.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property recurrent_classes

Recurrent classes of the Markov chain.

Rename the terminal states stored in terminal_states.

Parameters
• new_names (Mapping[str, str]) – Mapping where keys are the old names and the values are the new names. New names must be unique.

• update_adata (bool) – Whether to update underlying adata object as well or not.

Returns

Nothing, just updates the names of terminal_states.

Return type

None

set_terminal_states(labels, cluster_key=None, en_cutoff=None, p_thresh=None, add_to_existing=False, **kwargs)

Set the approximate recurrent classes, if they are known a priori.

Parameters
• labels (Union[Series, Dict[str, Any]]) – Either a categorical pandas.Series with cell names as index, where NaN marks a cell belonging to a transient state, or a dict, where each key is the name of a recurrent class and each value is a list of cell names.

• cluster_key (Optional[str]) – If a key to cluster labels is given, terminal_states will be associated with these for naming and colors.

• en_cutoff (Optional[float]) – If cluster_key is given, this parameter determines when an approximate recurrent class will be labelled as ‘Unknown’, based on the entropy of the distribution of cells over transcriptomic clusters.

• p_thresh (Optional[float]) – If cell cycle scores were provided, a Wilcoxon rank-sum test is conducted to identify cell-cycle states. If the test returns a positive statistic and a p-value smaller than p_thresh, a warning will be issued.

• add_to_existing (bool) – Whether to add these categories to existing ones. Cells already belonging to recurrent classes will be updated if there’s an overlap. Throws an error if previous approximate recurrent classes have not been calculated.
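The two accepted forms of labels carry the same information. As an illustration, the hypothetical helper labels_from_dict (not part of CellRank) converts the dict form into the equivalent categorical pandas.Series, leaving cells that belong to no recurrent class as NaN.

```python
import numpy as np
import pandas as pd


def labels_from_dict(mapping, obs_names):
    """Hypothetical helper: {state: [cell names]} -> categorical Series, NaN for transient cells."""
    labels = pd.Series(np.nan, index=pd.Index(obs_names), dtype=object)
    for state, cells in mapping.items():
        labels.loc[list(cells)] = state
    return labels.astype(pd.CategoricalDtype(categories=list(mapping)))


obs_names = ["c0", "c1", "c2", "c3"]
# c1 is not listed under any state and is therefore transient (NaN).
labels = labels_from_dict({"Alpha": ["c0"], "Beta": ["c2", "c3"]}, obs_names)
```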

Returns

Nothing, but updates the following fields:

• terminal_states_probabilities

• terminal_states

Return type

None

property terminal_states

Terminal states.

Return type

Series

property terminal_states_probabilities

Terminal states probabilities.

Return type

Series

property transient_classes

Transient classes of the Markov chain.

property transition_matrix

Transition matrix.

Return type

Union[ndarray, spmatrix]

write(fname, ext='pickle')

Serialize self to a file.

Parameters
• fname (Union[str, Path]) – Filename where to save the object.

• ext (Optional[str]) – Filename extension to use. If None, don’t append any extension.

Returns

Nothing, just writes itself to a file using pickle.

Return type

None

## Kernels¶

### Velocity Kernel¶

class cellrank.tl.kernels.VelocityKernel(adata, backward=False, vkey='velocity', xkey='Ms', gene_subset=None, compute_cond_num=False, check_connectivity=False)[source]

Kernel which computes a transition matrix based on RNA velocity.

This borrows ideas from both [Manno18] and [Bergen20]. In short, for each cell i, we compute transition probabilities $$p_{i, j}$$ to each cell j in the neighborhood of i. The transition probabilities are computed as a multinomial logistic regression where the weights $$w_j$$ (for all j) are given by the vector that connects cell i with cell j in gene expression space, and the features $$x_i$$ are given by the velocity vector $$v_i$$ of cell i.
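A minimal dense sketch of this idea, assuming the correlation scheme and a fixed softmax scale: for each cell i, the displacements to its neighbors are correlated with the velocity $$v_i$$, and the scaled correlations are passed through a softmax over the neighborhood. The function name, the neighbor-list format and the scale of 4.0 are illustrative assumptions; CellRank works on the sparse KNN graph and offers further modes.

```python
import numpy as np


def velocity_transition_matrix(X, V, neighbors, softmax_scale=4.0):
    """Toy correlation-based velocity kernel.

    X, V: (n_cells, n_genes) expression and velocity; neighbors[i]: indices of i's neighbors.
    """
    n = X.shape[0]
    T = np.zeros((n, n))
    for i in range(n):
        nbrs = np.asarray(neighbors[i])
        delta = X[nbrs] - X[i]  # displacement delta_{i,j} to each neighbor j
        d = delta - delta.mean(axis=1, keepdims=True)
        v = V[i] - V[i].mean()
        # Pearson correlation (across genes) between each displacement and v_i.
        corr = d @ v / (np.linalg.norm(d, axis=1) * np.linalg.norm(v) + 1e-12)
        logits = softmax_scale * corr
        weights = np.exp(logits - logits.max())  # numerically stable softmax
        T[i, nbrs] = weights / weights.sum()
    return T


X = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])
V = np.tile([1.0, 2.0, 3.0], (3, 1))  # all velocities point along (1, 2, 3)
T = velocity_transition_matrix(X, V, neighbors=[[1, 2], [0], [0]])
```

Cell 0 moves almost surely towards cell 1, which lies in the direction of its velocity.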

Parameters
• adata (anndata.AnnData) – Annotated data object.

• backward (bool) – Direction of the process.

• vkey (str) – Key in adata .layers where the velocities are stored.

• xkey (str) – Key in adata .layers where expected gene expression counts are stored.

• gene_subset (Optional[Iterable]) – List of genes to be used to compute transition probabilities. By default, genes from adata .var['velocity_genes'] are used.

• compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. Note that this might be costly, since it does not use a sparse implementation.

• check_connectivity (bool) – Check whether the underlying KNN graph is connected.

compute_transition_matrix(mode='deterministic', backward_mode='transpose', scheme='correlation', softmax_scale=None, n_samples=1000, seed=None, **kwargs)[source]

Compute transition matrix based on velocity directions on the local manifold.

For each cell, infer transition probabilities based on the cell’s velocity-extrapolated cell state and the cell states of its K nearest neighbors.

Parameters
• mode (str) –

How to compute transition probabilities. Valid options are:

• ‘deterministic’ - deterministic computation that doesn’t propagate uncertainty.

• ‘monte_carlo’ - Monte Carlo average of randomly sampled velocity vectors.

• ‘stochastic’ - second order approximation, only available when jax is installed.

• ‘sampling’ - sample 1 transition matrix from the velocity distribution.

• backward_mode (str) –

Only matters if initialized as backward=True. Valid options are:

• ‘transpose’ - compute transitions from neighboring cells j to cell i.

• ‘negate’ - negate the velocity vector.

• softmax_scale (Optional[float]) – Scaling parameter for the softmax. If None, it will be estimated using 1 / median(correlations). The idea behind this is to scale the softmax to counteract everything tending to orthogonality in high dimensions.

• scheme (Union[str, Callable]) –

Similarity scheme between cells as described in [Li2020]. The built-in options correspond to the cosine, correlation (default) and dot product schemes documented below.

Alternatively, any function can be passed as long as it follows the call signature of cellrank.tl.kernels.SimilaritySchemeABC.

• n_samples (int) – Number of bootstrap samples when mode='monte_carlo'.

• seed (Optional[int]) – Set the seed for random state when the method requires n_samples.

• show_progress_bar – Whether to show a progress bar. Disabling it may slightly improve performance.

• n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.

• backend – Which backend to use for parallelization. See joblib.Parallel for valid options.

Returns

Makes available the following fields:

• transition_matrix.

• logits.

Return type

cellrank.tl.kernels.VelocityKernel

property logits

Array of shape (n_cells, n_cells) containing the logits.

Return type

csr_matrix

copy()[source]

Return a copy of self.

Return type

VelocityKernel

#### Cosine similarity scheme¶

class cellrank.tl.kernels.CosineScheme[source]

Cosine similarity scheme as defined in eq. (4.7) of [Li2020].

$$v(s_i, s_j) = g(\cos(\delta_{i, j}, v_i))$$

where $$v_i$$ is the velocity vector of cell $$i$$, $$\delta_{i, j}$$ corresponds to the transcriptional displacement between cells $$i$$ and $$j$$ and $$g$$ is a softmax function with some scaling parameter.

#### Correlation scheme¶

class cellrank.tl.kernels.CorrelationScheme[source]

Pearson correlation scheme as defined in eq. (4.8) of [Li2020].

$$v(s_i, s_j) = g(\operatorname{corr}(\delta_{i, j}, v_i))$$

where $$v_i$$ is the velocity vector of cell $$i$$, $$\delta_{i, j}$$ corresponds to the transcriptional displacement between cells $$i$$ and $$j$$ and $$g$$ is a softmax function with some scaling parameter.

#### Dot product scheme¶

class cellrank.tl.kernels.DotProductScheme[source]

Dot product scheme as defined in eq. (4.9) of [Li2020].

$$v(s_i, s_j) = g(\delta_{i, j}^T v_i)$$

where $$v_i$$ is the velocity vector of cell $$i$$, $$\delta_{i, j}$$ corresponds to the transcriptional displacement between cells $$i$$ and $$j$$ and $$g$$ is a softmax function with some scaling parameter.
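The three schemes differ only in the similarity passed to $$g$$. The sketch below writes them out with NumPy for a matrix of displacements (one row per neighbor j) and a single velocity vector; the softmax helper with an explicit scale stands in for $$g$$. Cosine and correlation ignore the length of the displacement, while the dot product does not.

```python
import numpy as np


def cosine_similarity(delta, v):
    """cos(delta_ij, v_i) for every row delta_ij of delta."""
    return delta @ v / (np.linalg.norm(delta, axis=1) * np.linalg.norm(v))


def correlation_similarity(delta, v):
    """Pearson correlation across genes between each delta_ij and v_i."""
    d = delta - delta.mean(axis=1, keepdims=True)
    vc = v - v.mean()
    return d @ vc / (np.linalg.norm(d, axis=1) * np.linalg.norm(vc))


def dot_product_similarity(delta, v):
    """delta_ij^T v_i, which also grows with the length of the displacement."""
    return delta @ v


def softmax(x, scale=1.0):
    """Stand-in for g: softmax with a scaling parameter."""
    z = scale * np.asarray(x, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()


delta = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0]])  # rows: neighbors j
v = np.array([1.0, 0.0, 0.0])
cos_sim = cosine_similarity(delta, v)        # [1, 0, 1]
corr_sim = correlation_similarity(delta, v)  # [1, -0.5, 1]
dot_sim = dot_product_similarity(delta, v)   # [1, 0, 2]
probs = softmax(corr_sim, scale=2.0)
```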

### Connectivity Kernel¶

class cellrank.tl.kernels.ConnectivityKernel(adata, backward=False, conn_key='connectivities', compute_cond_num=False, check_connectivity=False)[source]

Kernel which computes transition probabilities based on similarities among cells.

As a measure of similarity, we currently support:

The resulting transition matrix is symmetric and thus cannot be used to learn about the direction of the biological process. To include this direction, consider combining with a velocity-derived transition matrix via cellrank.tl.kernels.VelocityKernel.

Optionally, we apply a density correction as described in [Coifman05], where we use the implementation of [Haghverdi16].

Parameters
• adata (anndata.AnnData) – Annotated data object.

• backward (bool) – Direction of the process.

• conn_key (str) – Key in anndata.AnnData.obsp to obtain the connectivity matrix, describing cell-cell similarity.

• compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. Note that this might be costly, since it does not use a sparse implementation.

• check_connectivity (bool) – Check whether the underlying KNN graph is connected.

compute_transition_matrix(density_normalize=True)[source]

Compute transition matrix based on transcriptomic similarity.

Uses a symmetric, weighted KNN graph to compute a symmetric transition matrix. The connectivities are computed using scanpy.pp.neighbors(). Depending on the parameters used there, they can be UMAP connectivities or Gaussian-kernel-based connectivities with adaptive kernel width.
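A sketch of this computation on a small symmetric connectivity matrix, assuming the density correction of [Coifman05] in its common form: each entry is divided by the product of the two cells' degrees before row normalization. The exact normalization CellRank uses via [Haghverdi16] may differ in details, and the sketch assumes no cell is isolated (every row has a nonzero sum).

```python
import numpy as np
from scipy.sparse import csr_matrix, diags


def connectivity_transition_matrix(K, density_normalize=True):
    """Row-normalize a symmetric connectivity matrix, optionally after density correction."""
    K = csr_matrix(K, dtype=float)
    if density_normalize:
        q = np.asarray(K.sum(axis=1)).ravel()  # degree, used as a density estimate
        q_inv = diags(1.0 / q)
        K = q_inv @ K @ q_inv  # K_ij / (q_i * q_j), still symmetric
    row_sums = np.asarray(K.sum(axis=1)).ravel()
    return diags(1.0 / row_sums) @ K  # every row now sums to one


# Cell 0 has three neighbors, cell 3 only one.
K = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
])
T = connectivity_transition_matrix(K).toarray()
```

Without the correction, cell 0 would move to each neighbor with probability 1/3; with it, the sparsely connected cell 3 receives more weight (0.5).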

Parameters

density_normalize (bool) – Whether or not to use the underlying KNN graph for density normalization.

Returns

Makes transition_matrix available.

Return type

cellrank.tl.kernels.ConnectivityKernel

copy()[source]

Return a copy of self.

Return type

ConnectivityKernel

### Palantir Kernel¶

class cellrank.tl.kernels.PalantirKernel(adata, backward=False, time_key='dpt_pseudotime', compute_cond_num=False, check_connectivity=False)[source]

Kernel which computes transition probabilities in a similar way to Palantir, see [Setty19].

Palantir computes a KNN graph in gene expression space and a pseudotime, which it then uses to direct the edges of the KNN graph, such that they are more likely to point into the direction of increasing pseudotime. To avoid disconnecting the graph, it does not remove all edges that point into the direction of decreasing pseudotime but keeps the ones that point to nodes inside a close radius. This radius is chosen according to the local density.

The implementation presented here won’t exactly reproduce the original Palantir algorithm (see below) but the results are qualitatively very similar.

Optionally, we apply a density correction as described in [Coifman05], where we use the implementation of [Haghverdi16].

Parameters
• adata (anndata.AnnData) – Annotated data object.

• backward (bool) – Direction of the process.

• time_key (str) – Key in adata .obs where the pseudotime is stored.

• compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. Note that this might be costly, since it does not use a sparse implementation.

compute_transition_matrix(k=3, density_normalize=True)[source]

Compute transition matrix based on KNN graph and pseudotemporal ordering.

This is a re-implementation of the Palantir algorithm by [Setty19]. Note that this won’t exactly reproduce the original Palantir results, for three reasons:

• Palantir computes the KNN graph in a scaled space of diffusion components.

• Palantir uses its own pseudotime to bias the KNN graph which is not implemented here.

• Palantir uses a slightly different mechanism to ensure the graph remains connected when removing edges that point into the “pseudotime past”.

If you would like to reproduce the original results, please use the original Palantir algorithm.

Parameters
• k (int) – Number of neighbors to keep for each node, regardless of pseudotime. This is done to ensure that the graph remains connected.

• density_normalize (bool) – Whether or not to use the underlying KNN graph for density normalization.
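A simplified dense sketch of the biasing step: for every cell, keep the edges that point forward in pseudotime, always keep its k strongest connections regardless of pseudotime so that the graph stays connected, and row-normalize. Using connection strength as a proxy for nearness and the function name are assumptions; this is not CellRank's code, and density normalization is omitted.

```python
import numpy as np


def pseudotime_biased_transition_matrix(conn, pseudotime, k=3):
    """conn: dense symmetric (n, n) KNN connectivities, 0 meaning no edge."""
    n = conn.shape[0]
    biased = np.zeros((n, n))
    for i in range(n):
        nbrs = np.flatnonzero(conn[i])
        # The k strongest connections are kept regardless of pseudotime.
        strongest = nbrs[np.argsort(-conn[i, nbrs], kind="stable")[:k]]
        forward = nbrs[pseudotime[nbrs] >= pseudotime[i]]
        keep = np.union1d(strongest, forward)
        biased[i, keep] = conn[i, keep]
    return biased / biased.sum(axis=1, keepdims=True)


idx = np.arange(5)
conn = 1.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))  # closer cells are more similar
np.fill_diagonal(conn, 0.0)
pseudotime = idx.astype(float)
T = pseudotime_biased_transition_matrix(conn, pseudotime, k=1)
```

The last cell has no neighbor later in pseudotime, yet it keeps its strongest edge, so its row is still a valid distribution.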

Returns

Makes transition_matrix available.

Return type

cellrank.tl.kernels.PalantirKernel

property pseudotime

Pseudotemporal ordering of cells.

Return type

array

copy()[source]

Return a copy of self.

Return type

PalantirKernel

### Precomputed Kernel¶

Kernel which contains a precomputed transition matrix.

Parameters
• transition_matrix (Union[ndarray, spmatrix, KernelExpression, str, None]) – Row-normalized transition matrix or a key in adata .obsp or a cellrank.tl.kernels.KernelExpression with the computed transition matrix. If None, try to determine the key based on backward.

• adata (anndata.AnnData) – Annotated data object.

• backward (bool) – Direction of the process.
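Because the transition_matrix passed here is expected to be row-normalized, a quick check before wrapping an array can save debugging later. The helper is_row_stochastic is hypothetical and not part of CellRank; it accepts dense NumPy arrays and SciPy sparse matrices.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def is_row_stochastic(T, atol=1e-6):
    """Hypothetical check: non-negative entries and every row summing to one."""
    if issparse(T):
        row_sums = np.asarray(T.sum(axis=1)).ravel()
        non_negative = T.min() >= 0
    else:
        T = np.asarray(T, dtype=float)
        row_sums = T.sum(axis=1)
        non_negative = bool((T >= 0).all())
    return bool(non_negative and np.allclose(row_sums, 1.0, atol=atol))


sparse_ok = is_row_stochastic(csr_matrix([[0.5, 0.5], [0.0, 1.0]]))
dense_bad = is_row_stochastic(np.array([[0.5, 0.4], [0.0, 1.0]]))  # first row sums to 0.9
```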

copy()[source]

Return a copy of self.

Return type

PrecomputedKernel

compute_transition_matrix(*args, **kwargs)[source]

Return self.

Return type

PrecomputedKernel

## Models¶

### GAM¶

Fit Generalized Additive Models (GAMs) using pygam.

Parameters
• adata (anndata.AnnData) – Annotated data object.

• n_knots (Optional[int]) – Number of knots.

• spline_order (int) – Order of the splines, i.e. 3 for cubic splines.

• distribution (str) – Name of the distribution. Available distributions can be found in the pygam documentation.

• max_iter (int) – Maximum number of iterations for optimization.

• expectile (Optional[float]) – Expectile for pygam.pygam.ExpectileGAM. This forces the distribution to be ‘normal’ and the link function to be ‘identity’. Must be in the interval (0, 1).

• grid (Union[str, Mapping, None]) – Whether to perform a grid search. Keys correspond to parameter names and values to the ranges to be searched. If ‘default’, use the default grid. If None, don’t perform a grid search.

• spline_kwargs (Mapping) – Keyword arguments for pygam.s.

• kwargs – Keyword arguments for pygam.pygam.GAM.

fit(x=None, y=None, w=None, **kwargs)[source]

Fit the model.

Parameters
• x (Optional[ndarray]) – Independent variables, array of shape (n_samples, 1). If None, use x.

• y (Optional[ndarray]) – Dependent variables, array of shape (n_samples, 1). If None, use y.

• w (Optional[ndarray]) – Optional weights of x, array of shape (n_samples,). If None, use w.

• kwargs – Keyword arguments for underlying model’s fitting function.

Returns

Fits the model and returns self.

Return type

cellrank.ul.models.GAM

Run the prediction.

Parameters
• x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.

• key_added (Optional[str]) – Attribute name where to save the x_test for later use. If None, don’t save it.

• kwargs – Keyword arguments for underlying model’s prediction method.

Returns

• y_test - Prediction values of shape (n_samples,) for x_test.

Return type

numpy.ndarray

Annotated data object.

Returns

Return type

anndata.AnnData

property conf_int

Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

ndarray

confidence_interval(x_test=None, **kwargs)[source]

Calculate the confidence interval.

Parameters
Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

numpy.ndarray

default_confidence_interval(x_test=None, **kwargs)

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from [DeSalvo70], eq. 5.

Parameters
Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

• x_hat - Filtered independent variables used when calculating default confidence interval, usually same as x.

• y_hat - Filtered dependent variables used when calculating default confidence interval, usually same as y.

Return type

numpy.ndarray

References

DeSalvo70

DeSalvo, J. S. (1970). Standard Error of Forecast in Multiple Regression: Proof of a Useful Result. RAND Corporation.

property model

The underlying model.

Return type

Any

plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)

Plot the smoothed gene expression.

Parameters
• figsize (Tuple[float, float]) – Size of the figure.

• same_plot (bool) – Whether to plot all trends in the same plot.

• hide_cells (bool) – Whether to hide the cells.

• perc (Optional[Tuple[float, float]]) – Percentile by which to clip the absorption probabilities.

• abs_prob_cmap (ListedColormap) – Colormap to use when coloring in the absorption probabilities.

• cell_color (Optional[str]) – Key in anndata.AnnData.obs or anndata.AnnData.var_names used for coloring the cells.

• lineage_color (str) – Color for the lineage.

• alpha (float) – Alpha channel for cells.

• lineage_alpha (float) – Alpha channel for lineage confidence intervals.

• title (Optional[str]) – Title of the plot.

• size (int) – Size of the points.

• lw (float) – Line width for the smoothed values.

• cbar (bool) – Whether to show colorbar.

• margins (float) – Margins around the plot.

• xlabel (str) – Label on the x-axis.

• ylabel (str) – Label on the y-axis.

• conf_int (bool) – Whether to show the confidence interval.

• lineage_probability (bool) – Whether to show smoothed lineage probability as a dashed line. Note that this will require 1 additional model fit.

• lineage_probability_conf_int (Union[bool, float]) – Whether to compute and show the confidence interval of the smoothed lineage probability. If self is cellrank.ul.models.GAMR, it can also specify the confidence level; the default is 0.95. Only used when lineage_probability=True.

• lineage_probability_color (Optional[str]) – Color to use when plotting the smoothed lineage_probability. If None, it’s the same as lineage_color. Only used when lineage_probability=True.

• obs_legend_loc (Optional[str]) – Location of the legend when cell_color corresponds to a categorical variable.

• dpi (Optional[int]) – Dots per inch.

• fig (Optional[Figure]) – Figure to use. If None, create a new one.

• ax (matplotlib.axes.Axes) – Axes to use. If None, create a new one.

• return_fig (bool) – If True, return the figure object.

• save (Optional[str]) – Filename where to save the plot. If None, just shows the plots.

• kwargs – Keyword arguments for matplotlib.axes.Axes.legend(), e.g. to disable the legend, specify loc=None. Only available when lineage_probability=True.

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

prepare(gene, lineage, backward=False, time_range=None, data_key='X', time_key='latent_time', use_raw=False, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200)

Prepare the model to be ready for fitting.

Parameters
• gene (str) – Gene in adata .var_names or in adata .raw.var_names.

• lineage (Optional[str]) – Name of a lineage in adata .obsm[lineage_key]. If None, all weights will be set to 1.

• backward (bool) – Direction of the process.

• time_range (Union[float, Tuple[float, float], None]) –

Specify start and end times:

• If a tuple, it specifies the minimum and maximum pseudotime. Both values can be None, in which case the minimum is the earliest pseudotime and the maximum is automatically determined.

• If a float, it specifies the maximum pseudotime.

• data_key (str) – Key in adata .layers or ‘X’ for adata .X. If use_raw=True, it’s always set to ‘X’.

• time_key (str) – Key in adata .obs where the pseudotime is stored.

• use_raw (bool) – Whether to access adata .raw or not.

• threshold (Optional[float]) – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.

• weight_threshold (Union[float, Tuple[float, float]]) – If a float, set all weights below weight_threshold to weight_threshold. If a tuple, set all weights below the first value to the second value.

• filter_cells (Optional[float]) – Filter out all cells with expression values lower than this threshold.

• n_test_points (int) – Number of test points. If None, use the original points based on threshold.

Returns

Nothing, but updates the following fields:

• x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

• y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

• w - Filtered weights of shape (n_filtered_cells,) used for fitting.

• x_all - Unfiltered independent variables of shape (n_cells, 1).

• y_all - Unfiltered dependent variables of shape (n_cells, 1).

• w_all - Unfiltered weights of shape (n_cells,).

• x_test - Independent variables of shape (n_samples, 1) used for prediction.

• prepared - Whether the model is prepared for fitting.

Return type

None
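The interplay of time_range, threshold, weight_threshold and n_test_points is easier to see in code. The following NumPy sketch is not CellRank's implementation; the helper prepare_sketch is hypothetical, and two rules in it are assumptions: the default threshold is the median weight, and the test endpoint is the largest pseudotime among cells whose weight exceeds threshold.

```python
from typing import Optional, Tuple, Union

import numpy as np


def prepare_sketch(
    x_all,
    y_all,
    w_all,
    time_range: Union[None, float, Tuple[Optional[float], Optional[float]]] = None,
    threshold: Optional[float] = None,
    weight_threshold: Union[float, Tuple[float, float]] = (0.01, 0.01),
    n_test_points: int = 200,
):
    """Hypothetical illustration: derive x, y, w and x_test from unfiltered data."""
    x_all = np.asarray(x_all, dtype=float).ravel()
    y_all = np.asarray(y_all, dtype=float).ravel()
    w_all = np.asarray(w_all, dtype=float).ravel().copy()

    # A float behaves like the tuple (value, value): weights below the first
    # entry are set to the second entry.
    if not isinstance(weight_threshold, tuple):
        weight_threshold = (weight_threshold, weight_threshold)
    below, fill = weight_threshold
    w_all[w_all < below] = fill

    # Assumption: the default threshold is the median weight.
    if threshold is None:
        threshold = float(np.median(w_all))

    # A float time_range only fixes the maximum pseudotime.
    if time_range is None:
        time_range = (None, None)
    elif not isinstance(time_range, tuple):
        time_range = (None, time_range)
    start, end = time_range
    if start is None:
        start = x_all.min()
    if end is None:
        # Assumption: the endpoint is the latest pseudotime among cells whose
        # weight exceeds the threshold (falling back to the overall maximum).
        confident = w_all > threshold
        end = x_all[confident].max() if confident.any() else x_all.max()

    keep = (x_all >= start) & (x_all <= end)
    x_test = np.linspace(start, end, n_test_points)
    # x, y and x_test are column vectors and w is flat, matching the documented shapes.
    return (
        x_all[keep].reshape(-1, 1),
        y_all[keep].reshape(-1, 1),
        w_all[keep],
        x_test.reshape(-1, 1),
    )


x_all = np.linspace(0.0, 1.0, 11)
y_all = x_all ** 2
w_all = np.array([1.0] * 6 + [0.0] * 5)  # only early cells belong to the lineage
x, y, w, x_test = prepare_sketch(x_all, y_all, w_all, threshold=0.5)
print(x.shape, w.shape, x_test[-1, 0])
```

Because late cells carry only the floor weight, the test points stop where the lineage still has confident cells rather than at the global maximum pseudotime.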

property prepared

Whether the model is prepared for fitting.

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property w

Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type

ndarray

property w_all

Unfiltered weights of shape (n_cells,).

Return type

ndarray

write(fname, ext='pickle')

Serialize self to a file.

Parameters
• fname (Union[str, Path]) – Filename where to save the object.

• ext (Optional[str]) – Filename extension to use. If None, don’t append any extension.

Returns

Nothing, just writes itself to a file using pickle.

Return type

None
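write() pickles the object, appending ext (with a leading dot) unless it is None. A minimal standard-library sketch of such a write/deserialize pair follows; the class PicklableModel and the function load_pickle are hypothetical stand-ins, and the exact extension handling is an assumption. As with any pickle file, only deserialize files from trusted sources.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional, Union


class PicklableModel:
    """Hypothetical stand-in for an object that can serialize itself."""

    def __init__(self, params: dict):
        self.params = params

    def write(self, fname: Union[str, Path], ext: Optional[str] = "pickle") -> None:
        fname = str(fname)
        # Assumption: the extension is appended with a leading dot unless ext is None.
        if ext is not None:
            if not ext.startswith("."):
                ext = "." + ext
            fname += ext
        with open(fname, "wb") as fout:
            pickle.dump(self, fout)


def load_pickle(fname: Union[str, Path]) -> Any:
    """Deserialize an object previously written with write()."""
    with open(fname, "rb") as fin:
        return pickle.load(fin)


with tempfile.TemporaryDirectory() as tmpdir:
    model = PicklableModel({"n_knots": 5})
    model.write(Path(tmpdir) / "model")  # creates <tmpdir>/model.pickle
    restored = load_pickle(Path(tmpdir) / "model.pickle")
    print(restored.params)
```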

property x

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property x_all

Unfiltered independent variables of shape (n_cells, 1).

Return type

ndarray

property x_hat

Filtered independent variables used when calculating the default confidence interval, usually the same as x.

Return type

ndarray

property x_test

Independent variables of shape (n_samples, 1) used for prediction.

Return type

ndarray

property y

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property y_all

Unfiltered dependent variables of shape (n_cells, 1).

Return type

ndarray

property y_hat

Filtered dependent variables used when calculating the default confidence interval, usually the same as y.

Return type

ndarray

property y_test

Prediction values of shape (n_samples,) for x_test.

Return type

ndarray

copy()[source]

Return a copy of self.

Return type

BaseModel

### SKLearnModel¶

Wrapper around sklearn.base.BaseEstimator.

Parameters
• adata (anndata.AnnData) – Annotated data object.

• model (BaseEstimator) – Instance of the underlying sklearn estimator, such as sklearn.svm.SVR.

• weight_name (Optional[str]) – Name of the weight argument for model .fit. If None, determine it automatically. If an empty string, no weights will be used.

• ignore_raise (bool) – Do not raise an exception if the weight argument is not found in the fitting function of model. This is useful when the weight is passed via **kwargs and cannot be determined from the signature.
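With weight_name=None, the weight argument has to be discovered from the signature of model .fit. The sketch below shows one plausible approach using the standard inspect module; the helper find_weight_argument and its list of candidate names are assumptions. It also illustrates why ignore_raise exists: a weight accepted only through **kwargs does not appear in the signature.

```python
import inspect
from typing import Callable, Optional, Sequence


def find_weight_argument(
    fit: Callable,
    candidates: Sequence[str] = ("sample_weight", "weights", "w"),
    ignore_raise: bool = False,
) -> Optional[str]:
    """Return the first candidate weight parameter found in fit's signature."""
    params = inspect.signature(fit).parameters
    for name in candidates:
        if name in params:
            return name
    if ignore_raise:
        # The weight may still be accepted through **kwargs; fall back to no weights.
        return None
    raise ValueError(f"Unable to find a weight argument among {list(candidates)}.")


class ToyRegressor:
    """Minimal estimator with a scikit-learn-style fit signature."""

    def fit(self, X, y, sample_weight=None):
        return self


class KwargsRegressor:
    """Estimator whose weights, if any, would only arrive through **kwargs."""

    def fit(self, X, y, **kwargs):
        return self


print(find_weight_argument(ToyRegressor().fit))
print(find_weight_argument(KwargsRegressor().fit, ignore_raise=True))
```

scikit-learn estimators such as sklearn.svm.SVR name this parameter sample_weight, which is why it is tried first here.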

fit(x=None, y=None, w=None, **kwargs)[source]

Fit the model.

Parameters
• x (Optional[ndarray]) – Independent variables, array of shape (n_samples, 1). If None, use x.

• y (Optional[ndarray]) – Dependent variables, array of shape (n_samples, 1). If None, use y.

• w (Optional[ndarray]) – Optional weights of x, array of shape (n_samples,). If None, use w.

• kwargs – Keyword arguments for underlying model’s fitting function.

Returns

Fits the model and returns self.

Return type

cellrank.ul.models.SKLearnModel

Run the prediction.

Parameters
• x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.

• key_added (str) – Attribute name where to save the x_test for later use. If None, don’t save it.

• kwargs – Keyword arguments for underlying model’s prediction method.

Returns

• y_test - Prediction values of shape (n_samples,) for x_test.

Return type

numpy.ndarray

confidence_interval(x_test=None, **kwargs)[source]

Calculate the confidence interval.

Uses default_confidence_interval() if the underlying model has no method for confidence interval calculation.

Parameters
Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

numpy.ndarray

property model

The underlying sklearn.base.BaseEstimator.

Return type

BaseEstimator

copy()[source]

Return a copy of self.

Return type

SKLearnModel

Annotated data object.

Returns

Return type

anndata.AnnData

property conf_int

Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

ndarray

default_confidence_interval(x_test=None, **kwargs)

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from [DeSalvo70], eq. 5.

Parameters
Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

• x_hat - Filtered independent variables used when calculating the default confidence interval, usually the same as x.

• y_hat - Filtered dependent variables used when calculating the default confidence interval, usually the same as y.

Return type

numpy.ndarray

References

DeSalvo70

DeSalvo, J. S. (1970), Standard Error of Forecast in Multiple Regression: Proof of a Useful Result, RAND Corporation.
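For a single regressor, the textbook standard error of forecast at a test point x0 is s * sqrt(1 + 1/n + (x0 - mean(x))^2 / sum((x_i - mean(x))^2)), where s^2 is the residual variance on n - 2 degrees of freedom. The sketch below applies it around a straight-line numpy.polyfit fit. The function forecast_interval, the linear fit and the band width of n_std standard errors are illustrative assumptions; it is not confirmed that this matches eq. 5 of [DeSalvo70] term by term as used by default_confidence_interval().

```python
import numpy as np


def forecast_interval(x, y, x_test, n_std=1.0):
    """Return an (n_samples, 2) band of +/- n_std forecast standard errors.

    Hypothetical sketch: assumes a straight-line fit and the one-regressor
    standard error of forecast.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    x_test = np.asarray(x_test, dtype=float).ravel()
    n = x.size

    slope, intercept = np.polyfit(x, y, deg=1)
    y_hat = slope * x + intercept
    y_test = slope * x_test + intercept

    # Residual standard deviation on n - 2 degrees of freedom (two fitted coefficients).
    s = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))
    sxx = np.sum((x - x.mean()) ** 2)
    se = s * np.sqrt(1.0 + 1.0 / n + (x_test - x.mean()) ** 2 / sxx)

    # Columns: lower and upper bound, as documented for conf_int.
    return np.column_stack([y_test - n_std * se, y_test + n_std * se])


x = np.arange(10.0)
y = 2.0 * x + 1.0 + np.tile([0.1, -0.1], 5)
print(forecast_interval(x, y, [0.0, 4.5, 9.0]))
```

The band is narrowest at the mean of the training inputs and widens symmetrically towards the ends, which is the qualitative behaviour one expects from a forecast interval.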

plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)

Plot the smoothed gene expression.

Parameters
• figsize (Tuple[float, float]) – Size of the figure.

• same_plot (bool) – Whether to plot all trends in the same plot.

• hide_cells (bool) – Whether to hide the cells.

• perc (Optional[Tuple[float, float]]) – Percentile by which to clip the absorption probabilities.

• abs_prob_cmap (ListedColormap) – Colormap to use when coloring in the absorption probabilities.

• cell_color (Optional[str]) – Key in anndata.AnnData.obs or anndata.AnnData.var_names used for coloring the cells.

• lineage_color (str) – Color for the lineage.

• alpha (float) – Alpha channel for cells.

• lineage_alpha (float) – Alpha channel for lineage confidence intervals.

• title (Optional[str]) – Title of the plot.

• size (int) – Size of the points.

• lw (float) – Line width for the smoothed values.

• cbar (bool) – Whether to show colorbar.

• margins (float) – Margins around the plot.

• xlabel (str) – Label on the x-axis.

• ylabel (str) – Label on the y-axis.

• conf_int (bool) – Whether to show the confidence interval.

• lineage_probability (bool) – Whether to show smoothed lineage probability as a dashed line. Note that this will require 1 additional model fit.

• lineage_probability_conf_int (Union[bool, float]) – Whether to compute and show the confidence interval of the smoothed lineage probability. If self is cellrank.ul.models.GAMR, it can also specify the confidence level; the default is 0.95. Only used when lineage_probability=True.

• lineage_probability_color (Optional[str]) – Color to use when plotting the smoothed lineage probability. If None, it’s the same as lineage_color. Only used when lineage_probability=True.

• obs_legend_loc (Optional[str]) – Location of the legend when cell_color corresponds to a categorical variable.

• dpi (Optional[int]) – Dots per inch.

• fig (Optional[Figure]) – Figure to use. If None, create a new one.

• ax (matplotlib.axes.Axes) – Axes to use. If None, create a new one.

• return_fig (bool) – If True, return the figure object.

• save (Optional[str]) – Filename where to save the plot. If None, just shows the plots.

• kwargs – Keyword arguments for matplotlib.axes.Axes.legend(), e.g. to disable the legend, specify loc=None. Only available when lineage_probability=True.

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

prepare(gene, lineage, backward=False, time_range=None, data_key='X', time_key='latent_time', use_raw=False, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200)

Prepare the model to be ready for fitting.

Parameters
• gene (str) – Gene in adata .var_names or in adata .raw.var_names.

• lineage (Optional[str]) – Name of a lineage in adata .obsm[lineage_key]. If None, all weights will be set to 1.

• backward (bool) – Direction of the process.

• time_range (Union[float, Tuple[float, float], None]) –

Specify start and end times:

• If a tuple, it specifies the minimum and maximum pseudotime. Both values can be None, in which case the minimum is the earliest pseudotime and the maximum is automatically determined.

• If a float, it specifies the maximum pseudotime.

• data_key (str) – Key in adata .layers or ‘X’ for adata .X. If use_raw=True, it’s always set to ‘X’.

• time_key (str) – Key in adata .obs where the pseudotime is stored.

• use_raw (bool) – Whether to access adata .raw or not.

• threshold (Optional[float]) – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.

• weight_threshold (Union[float, Tuple[float, float]]) – If a float, set all weights below weight_threshold to weight_threshold. If a tuple, set all weights below the first value to the second value.

• filter_cells (Optional[float]) – Filter out all cells with expression values lower than this threshold.

• n_test_points (int) – Number of test points. If None, use the original points based on threshold.

Returns

Nothing, but updates the following fields:

• x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

• y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

• w - Filtered weights of shape (n_filtered_cells,) used for fitting.

• x_all - Unfiltered independent variables of shape (n_cells, 1).

• y_all - Unfiltered dependent variables of shape (n_cells, 1).

• w_all - Unfiltered weights of shape (n_cells,).

• x_test - Independent variables of shape (n_samples, 1) used for prediction.

• prepared - Whether the model is prepared for fitting.

Return type

None

property prepared

Whether the model is prepared for fitting.

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property w

Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type

ndarray

property w_all

Unfiltered weights of shape (n_cells,).

Return type

ndarray

write(fname, ext='pickle')

Serialize self to a file.

Parameters
• fname (Union[str, Path]) – Filename where to save the object.

• ext (Optional[str]) – Filename extension to use. If None, don’t append any extension.

Returns

Nothing, just writes itself to a file using pickle.

Return type

None

property x

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property x_all

Unfiltered independent variables of shape (n_cells, 1).

Return type

ndarray

property x_hat

Filtered independent variables used when calculating the default confidence interval, usually the same as x.

Return type

ndarray

property x_test

Independent variables of shape (n_samples, 1) used for prediction.

Return type

ndarray

property y

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property y_all

Unfiltered dependent variables of shape (n_cells, 1).

Return type

ndarray

property y_hat

Filtered dependent variables used when calculating the default confidence interval, usually the same as y.

Return type

ndarray

property y_test

Prediction values of shape (n_samples,) for x_test.

Return type

ndarray

### GAMR¶

class cellrank.ul.models.GAMR(adata, n_knots=5, distribution='gaussian', basis='cr', knotlocs='auto', offset='default', smoothing_penalty=1.0, **kwargs)[source]

Wrapper around R’s mgcv package for fitting Generalized Additive Models (GAMs).

Parameters
• adata (anndata.AnnData) – Annotated data object.

• n_knots (int) – Number of knots.

• distribution (str) – Distribution family in rpy2.robjects.r, such as ‘gaussian’ or ‘nb’ for negative binomial. If ‘nb’, raw count data in adata .raw is always used.

• basis (str) – Basis for the smoothing term. See the mgcv documentation of smooth.terms for valid options.

• knotlocs (str) –

Position of the knots. Can be one of the following:

• ‘auto’ - let mgcv handle the knot positions.

• ‘density’ - position the knots based on the density of the pseudotime.

• offset (Union[str, ndarray, None]) – Offset term for the GAM. Only available when distribution='nb'. If ‘default’, it is calculated according to [Robinson10]. The values are saved in adata .obs['cellrank_offset']. If None, no offset is used.

• smoothing_penalty (float) – Penalty for the smoothing term. The larger the value, the smoother the fitted curve.

• kwargs – Keyword arguments for gam.control. See the mgcv documentation of gam.control for reference.
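One natural reading of knotlocs='density' is to place the knots at evenly spaced quantiles of the pseudotime, so that regions with many cells receive more knots. The sketch below assumes exactly that rule; the helper density_knots is hypothetical and the actual placement may differ. With knotlocs='auto', mgcv chooses the knots itself.

```python
import numpy as np


def density_knots(pseudotime, n_knots=5):
    """Assumed rule: place knots at evenly spaced quantiles of the pseudotime."""
    pseudotime = np.asarray(pseudotime, dtype=float).ravel()
    quantiles = np.linspace(0.0, 1.0, n_knots)
    return np.quantile(pseudotime, quantiles)


# Most cells sit early in pseudotime, so most knots land in that region.
t = np.concatenate([np.linspace(0.0, 0.2, 80), np.linspace(0.2, 1.0, 20)])
print(density_knots(t, n_knots=5))
```

Note that heavily tied pseudotime values can produce repeated knots under this rule, which a real implementation would need to handle.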

prepare(*args, **kwargs)[source]

Prepare the model to be ready for fitting. This also removes the zero and negative weights and prepares the design matrix.

Parameters

• lineage – Name of a lineage in adata .obsm[lineage_key]. If None, all weights will be set to 1.

• backward – Direction of the process.

• time_range

Specify start and end times:

• If a tuple, it specifies the minimum and maximum pseudotime. Both values can be None, in which case the minimum is the earliest pseudotime and the maximum is automatically determined.

• If a float, it specifies the maximum pseudotime.

• data_key – Key in adata .layers or ‘X’ for adata .X. If use_raw=True, it’s always set to ‘X’.

• time_key – Key in adata .obs where the pseudotime is stored.

• use_raw – Whether to access adata .raw or not.

• threshold – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.

• weight_threshold – If a float, set all weights below weight_threshold to weight_threshold. If a tuple, set all weights below the first value to the second value.

• filter_cells – Filter out all cells with expression values lower than this threshold.

• n_test_points – Number of test points. If None, use the original points based on threshold.

Returns

Nothing, but updates the following fields:

• x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

• y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

• w - Filtered weights of shape (n_filtered_cells,) used for fitting.

• x_all - Unfiltered independent variables of shape (n_cells, 1).

• y_all - Unfiltered dependent variables of shape (n_cells, 1).

• w_all - Unfiltered weights of shape (n_cells,).

• x_test - Independent variables of shape (n_samples, 1) used for prediction.

• prepared - Whether the model is prepared for fitting.

Return type

None

fit(x=None, y=None, w=None, **kwargs)[source]

Fit the model.

Parameters
• x (Optional[ndarray]) – Independent variables, array of shape (n_samples, 1). If None, use x.

• y (Optional[ndarray]) – Dependent variables, array of shape (n_samples, 1). If None, use y.

• w (Optional[ndarray]) – Optional weights of x, array of shape (n_samples,). If None, use w.

• kwargs – Keyword arguments for underlying model’s fitting function.

Returns

Fits the model and returns self. Updates the following fields by filtering out 0 weights w:

• x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

• y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

• w - Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type

cellrank.ul.models.GAMR

Run the prediction. This method can also compute the confidence interval.

Parameters
• x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.

• key_added (str) – Attribute name where to save the x_test for later use. If None, don’t save it.

• kwargs – Keyword arguments for underlying model’s prediction method.

• level (Optional[float]) – Confidence level for confidence interval calculation. If None, don’t compute the confidence interval. Must be in the interval [0, 1].

Returns

• y_test - Prediction values of shape (n_samples,) for x_test.

Return type

numpy.ndarray

confidence_interval(x_test=None, level=0.95, **kwargs)[source]

Calculate the confidence interval. Internally, this method calls cellrank.ul.models.GAMR.predict() to extract the confidence interval, if needed.

Parameters
• x_test (Optional[ndarray]) – Array of shape (n_samples,) used for confidence interval calculation. If None, use x_test.

• kwargs – Keyword arguments for underlying model’s confidence method or for default_confidence_interval().

• level (float) – Confidence level.

Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

numpy.ndarray
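The level argument sets the coverage of the interval. Assuming a normal approximation around the prediction, a common way to turn standard errors (such as those returned by mgcv's predict.gam with se.fit = TRUE) into a pointwise interval is sketched below. The helper band_from_se is hypothetical, and GAMR may construct its interval differently, for example on the link scale for non-Gaussian families.

```python
import numpy as np
from scipy.stats import norm


def band_from_se(y_test, se, level=0.95):
    """Pointwise normal-approximation interval with the given confidence level."""
    if not 0.0 <= level <= 1.0:
        raise ValueError(f"Expected level to be in [0, 1], found {level}.")
    z = norm.ppf(0.5 + level / 2.0)  # two-sided critical value, about 1.96 for 0.95
    y_test = np.asarray(y_test, dtype=float)
    se = np.asarray(se, dtype=float)
    # Shape (n_samples, 2): lower and upper bounds.
    return np.column_stack([y_test - z * se, y_test + z * se])


print(band_from_se(np.array([1.0, 2.0]), np.array([0.5, 0.5]), level=0.95))
```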

copy()[source]

Return a copy of self.

Return type

GAMR

Annotated data object.

Returns

Return type

anndata.AnnData

property conf_int

Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

ndarray

default_confidence_interval(x_test=None, **kwargs)

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from [DeSalvo70], eq. 5.

Parameters
Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

• x_hat - Filtered independent variables used when calculating the default confidence interval, usually the same as x.

• y_hat - Filtered dependent variables used when calculating the default confidence interval, usually the same as y.

Return type

numpy.ndarray

References

DeSalvo70

DeSalvo, J. S. (1970), Standard Error of Forecast in Multiple Regression: Proof of a Useful Result, RAND Corporation.

property model

The underlying model.

Return type

Any

plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)

Plot the smoothed gene expression.

Parameters
• figsize (Tuple[float, float]) – Size of the figure.

• same_plot (bool) – Whether to plot all trends in the same plot.

• hide_cells (bool) – Whether to hide the cells.

• perc (Optional[Tuple[float, float]]) – Percentile by which to clip the absorption probabilities.

• abs_prob_cmap (ListedColormap) – Colormap to use when coloring in the absorption probabilities.

• cell_color (Optional[str]) – Key in anndata.AnnData.obs or anndata.AnnData.var_names used for coloring the cells.

• lineage_color (str) – Color for the lineage.

• alpha (float) – Alpha channel for cells.

• lineage_alpha (float) – Alpha channel for lineage confidence intervals.

• title (Optional[str]) – Title of the plot.

• size (int) – Size of the points.

• lw (float) – Line width for the smoothed values.

• cbar (bool) – Whether to show colorbar.

• margins (float) – Margins around the plot.

• xlabel (str) – Label on the x-axis.

• ylabel (str) – Label on the y-axis.

• conf_int (bool) – Whether to show the confidence interval.

• lineage_probability (bool) – Whether to show smoothed lineage probability as a dashed line. Note that this will require 1 additional model fit.

• lineage_probability_conf_int (Union[bool, float]) – Whether to compute and show the confidence interval of the smoothed lineage probability. If self is cellrank.ul.models.GAMR, it can also specify the confidence level; the default is 0.95. Only used when lineage_probability=True.

• lineage_probability_color (Optional[str]) – Color to use when plotting the smoothed lineage probability. If None, it’s the same as lineage_color. Only used when lineage_probability=True.

• obs_legend_loc (Optional[str]) – Location of the legend when cell_color corresponds to a categorical variable.

• dpi (Optional[int]) – Dots per inch.

• fig (Optional[Figure]) – Figure to use. If None, create a new one.

• ax (matplotlib.axes.Axes) – Axes to use. If None, create a new one.

• return_fig (bool) – If True, return the figure object.

• save (Optional[str]) – Filename where to save the plot. If None, just shows the plots.

• kwargs – Keyword arguments for matplotlib.axes.Axes.legend(), e.g. to disable the legend, specify loc=None. Only available when lineage_probability=True.

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

property prepared

Whether the model is prepared for fitting.

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property w

Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type

ndarray

property w_all

Unfiltered weights of shape (n_cells,).

Return type

ndarray

write(fname, ext='pickle')

Serialize self to a file.

Parameters
• fname (Union[str, Path]) – Filename where to save the object.

• ext (Optional[str]) – Filename extension to use. If None, don’t append any extension.

Returns

Nothing, just writes itself to a file using pickle.

Return type

None

property x

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property x_all

Unfiltered independent variables of shape (n_cells, 1).

Return type

ndarray

property x_hat

Filtered independent variables used when calculating the default confidence interval, usually the same as x.

Return type

ndarray

property x_test

Independent variables of shape (n_samples, 1) used for prediction.

Return type

ndarray

property y

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property y_all

Unfiltered dependent variables of shape (n_cells, 1).

Return type

ndarray

property y_hat

Filtered dependent variables used when calculating the default confidence interval, usually the same as y.

Return type

ndarray

property y_test

Prediction values of shape (n_samples,) for x_test.

Return type

ndarray

## Base Classes¶

### BaseEstimator¶

Base class for all estimators.

Parameters
• obj (Union[KernelExpression, ~AnnData, spmatrix, ndarray]) – Either a cellrank.tl.kernels.Kernel object, an anndata.AnnData object which stores the transition matrix in .obsp attribute or numpy or scipy array.

• inplace (bool) – Whether to modify adata object inplace or make a copy.

• obsp_key (Optional[str]) – Key in obj.obsp when obj is an anndata.AnnData object.

• g2m_key (Optional[str]) – Key in adata .obs. Can be used to detect cell-cycle driven start- or endpoints.

• s_key (Optional[str]) – Key in adata .obs. Can be used to detect cell-cycle driven start- or endpoints.

• write_to_adata (bool) – Whether to write the transition matrix to adata .obsp and the parameters to adata .uns.

• key (Optional[str]) – Key used when writing transition matrix to adata. If None, the key is set to ‘T_bwd’ if backward is True, else ‘T_fwd’. Only used when write_to_adata=True.

set_terminal_states(labels, cluster_key=None, en_cutoff=None, p_thresh=None, add_to_existing=False, **kwargs)[source]

Set the approximate recurrent classes, if they are known a priori.

Parameters
• labels (Union[Series, Dict[str, Any]]) – Either a categorical pandas.Series with index as cell names, where NaN marks a cell belonging to a transient state, or a dict, where each key is the name of a recurrent class and the values are lists of cell names.

• cluster_key (Optional[str]) – If a key to cluster labels is given, terminal_states will be associated with these for naming and colors.

• en_cutoff (Optional[float]) – If cluster_key is given, this parameter determines when an approximate recurrent class will be labelled as ‘Unknown’, based on the entropy of the distribution of cells over transcriptomic clusters.

• p_thresh (Optional[float]) – If cell cycle scores were provided, a Wilcoxon rank-sum test is conducted to identify cell-cycle states. If the test returns a positive statistic and a p-value smaller than p_thresh, a warning will be issued.

• add_to_existing (bool) – Whether to add these categories to existing ones. Cells already belonging to recurrent classes will be updated if there’s an overlap. Throws an error if previous approximate recurrent classes have not been calculated.

Returns

Nothing, but updates the following fields:

• terminal_states_probabilities

• terminal_states

Return type

None
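labels accepts either a categorical pandas.Series or a dict of cell names. The pandas sketch below converts the dict form into the Series form, leaving transient cells as NaN. The helper labels_from_dict is hypothetical, and letting a later class overwrite an earlier one for a shared cell is an assumption made only for illustration.

```python
from typing import Dict, Iterable, Sequence

import numpy as np
import pandas as pd


def labels_from_dict(cells: Sequence[str], labels: Dict[str, Iterable[str]]) -> pd.Series:
    """Convert {class name: cell names} into a categorical Series indexed by cell."""
    series = pd.Series(np.nan, index=pd.Index(cells), dtype=object)
    for name, members in labels.items():
        # Unknown cell names raise a KeyError; later classes overwrite earlier ones.
        series.loc[list(members)] = name
    # Cells that were never assigned stay NaN, i.e. they are transient.
    return series.astype(pd.CategoricalDtype(categories=list(labels)))


cells = ["c1", "c2", "c3", "c4"]
ts = labels_from_dict(cells, {"Alpha": ["c1"], "Beta": ["c3", "c4"]})
print(ts)
```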

Rename the names of terminal_states.

Parameters
• new_names (Mapping[str, str]) – Mapping where keys are the old names and the values are the new names. New names must be unique.

• update_adata (bool) – Whether to update underlying adata object as well or not.

Returns

Nothing, just updates the names of terminal_states.

Return type

None
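Renaming terminal states amounts to relabelling categories while keeping each cell's assignment. A pandas sketch, assuming the states are stored as a categorical Series; the helper rename_states is hypothetical, and its uniqueness check mirrors the documented requirement that new names be unique.

```python
from typing import Mapping

import pandas as pd


def rename_states(states: pd.Series, new_names: Mapping[str, str]) -> pd.Series:
    """Rename categories of a categorical Series; unmapped names are kept."""
    old = list(states.cat.categories)
    unknown = set(new_names) - set(old)
    if unknown:
        raise ValueError(f"Unknown states: {sorted(unknown)}.")
    renamed = [new_names.get(name, name) for name in old]
    # New names must not collide with each other or with names that are kept.
    if len(set(renamed)) != len(renamed):
        raise ValueError("New names must be unique.")
    return states.cat.rename_categories(renamed)


states = pd.Series(["Alpha", None, "Beta"], index=["c1", "c2", "c3"], dtype="category")
print(rename_states(states, {"Alpha": "Alpha cells"}))
```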

compute_absorption_probabilities(keys=None, check_irred=False, solver=None, use_petsc=None, time_to_absorption=None, n_jobs=None, backend='loky', show_progress_bar=True, tol=1e-05, preconditioner=None)[source]

Compute absorption probabilities of a Markov chain.

For each cell, this computes the probability of it reaching any of the approximate recurrent classes defined by terminal_states. This also computes the entropy over absorption probabilities, which is a measure of cell plasticity, see [Setty19].

Parameters
• keys (Optional[Sequence[str]]) – Keys defining the recurrent classes.

• check_irred (bool) – Check whether the transition matrix is irreducible.

• solver (Optional[str]) –

Solver to use for the linear problem. Options are ‘direct’, ‘gmres’, ‘lgmres’, ‘bicgstab’ or ‘gcrotmk’ when use_petsc=False, or one of petsc4py.PETSc.KSP.Type otherwise.

Information on the scipy iterative solvers can be found in the scipy.sparse.linalg documentation; for the petsc4py solvers, see petsc4py.PETSc.KSP.Type.

If None, the solver is chosen automatically, depending on the problem size.

• use_petsc (Optional[bool]) – Whether to use solvers from petsc4py instead of scipy. Using petsc4py is recommended for large problems. If None, this is determined automatically. If no petsc4py installation is found, defaults to scipy.sparse.linalg.gmres().

• time_to_absorption (Union[str, Sequence[Union[str, Sequence[str]]], Dict[Union[str, Sequence[str]], str], None]) –

Whether to compute mean time to absorption and its variance to specific absorbing states.

If a dict, can be specified as {'Alpha': 'var', ...} to also compute variance. When states are given as a tuple, time to absorption is computed to the subset of these states, such as [('Alpha', 'Beta'), ...] or {('Alpha', 'Beta'): 'mean', ...}. Can be specified as 'all' to compute it to any absorbing state in keys, which is more efficient than listing all absorbing states.

Since many linear systems are solved, it may be beneficial to disable the progress bar with show_progress_bar=False.

• n_jobs (Optional[int]) – Number of parallel jobs to use when using an iterative solver. When use_petsc=True or for quickly solvable problems, we recommend a higher number of jobs (>= 8) to fully saturate the cores.

• backend (str) – Which backend to use for multiprocessing. See joblib.Parallel for valid options.

• show_progress_bar (bool) – Whether to show progress bar when the solver isn’t a direct one.

• tol (float) – Convergence tolerance for the iterative solver. The default is fine for most cases, only consider decreasing this for severely ill-conditioned matrices.

• preconditioner (Optional[str]) – Preconditioner to use, only available when use_petsc=True. For available values, see petsc4py.PETSc.PC.Type. We recommend the ‘ilu’ preconditioner for badly conditioned problems.

Returns

Nothing, but updates the following fields:

• absorption_probabilities - probabilities of being absorbed into the terminal states.

• diff_potential - differentiation potential of cells.

• lineage_absorption_times - mean times until absorption into subsets of absorbing states and, optionally, their variances, saved as '{lineage} mean' and '{lineage} var', respectively, for each subset of absorbing states specified in time_to_absorption.

Return type

None
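Absorption probabilities and mean times to absorption reduce to linear systems. Writing Q for the transitions among transient states and R for the transitions from transient into absorbing states, the absorption probabilities B solve (I - Q) B = R, and the expected number of steps t until absorption into any absorbing state solves (I - Q) t = 1. The toy example below uses a made-up 4-state chain, a direct solve for both systems, scipy.sparse.linalg.gmres for one column as an example of an iterative solver, and the row entropy of B as a simple plasticity score. It is the textbook construction, not CellRank's code path.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import gmres

# Made-up row-stochastic transition matrix: states 0 and 1 are transient,
# states 2 and 3 are absorbing (they only transition to themselves).
P = np.array([
    [0.5, 0.2, 0.3, 0.0],
    [0.2, 0.4, 0.0, 0.4],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
transient, absorbing = [0, 1], [2, 3]
Q = P[np.ix_(transient, transient)]
R = P[np.ix_(transient, absorbing)]
I = np.eye(len(transient))

# Direct solve: row i holds the probabilities of ending in each absorbing state.
B = np.linalg.solve(I - Q, R)

# Expected number of steps until absorption into any absorbing state.
t = np.linalg.solve(I - Q, np.ones(len(transient)))

# Iterative alternative for a single column, as used for large sparse problems.
A = identity(len(transient), format="csr") - csr_matrix(Q)
b0, info = gmres(A, R[:, 0])

# Entropy of each row of B: higher values mean a less committed cell.
entropy = -np.sum(B * np.log(np.clip(B, 1e-12, None)), axis=1)
print(B, t, b0, entropy, sep="\n")
```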

compute_lineage_drivers(lineages=None, method='fischer', cluster_key=None, clusters=None, layer='X', use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, return_drivers=True, **kwargs)[source]

Compute driver genes per lineage.

Correlates gene expression with lineage probabilities, for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters which are relevant for the specified lineages.

Parameters
• lineages (Union[str, Sequence, None]) – Either a set of lineage names from absorption_probabilities .names or None, in which case all lineages are considered.

• method (str) –

Mode to use when calculating p-values and confidence intervals. Can be one of:

• ‘fischer’ - use the Fisher transformation [Fischer21].

• ‘perm_test’ - use a permutation test.

• cluster_key (Optional[str]) – Key from adata .obs to obtain cluster annotations. These are considered for clusters.

• clusters (Union[str, Sequence, None]) – Restrict the correlations to these clusters.

• layer (str) – Key from adata .layers.

• use_raw (bool) – Whether or not to use adata .raw to correlate gene expression. If using a layer other than .X, this must be set to False.

• confidence_level (float) – Confidence level for the confidence interval calculation. Must be in [0, 1].

• n_perms (int) – Number of permutations to use when method='perm_test'.

• seed (Optional[int]) – Random seed when method='perm_test'.

• return_drivers (bool) – Whether to return the drivers. This also contains the lower and upper confidence_level confidence interval bounds.

• show_progress_bar – Whether to show a progress bar. Disabling it may slightly improve performance.

• n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.

• backend – Which backend to use for parallelization. See joblib.Parallel for valid options.

Return type

Optional[DataFrame]

Returns

• Dataframe of shape (n_genes, n_lineages * 5) containing the following columns, one set for each lineage:

• {lineage} corr - correlation between the gene expression and absorption probabilities.

• {lineage} pval - calculated p-values for a two-sided test.

• {lineage} qval - corrected p-values using Benjamini-Hochberg method at level 0.05.

• {lineage} ci low - lower bound of the confidence_level correlation confidence interval.

• {lineage} ci high - upper bound of the confidence_level correlation confidence interval.

Only if return_drivers=True.

• None - if return_drivers=False.

Also updates the following fields:

• '{direction} {lineage} corr' - the potential lineage drivers.

• '{direction} {lineage} qval' - the corrected p-values.

• lineage_drivers - same as the returned values.
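The corr, pval, ci low, ci high and qval columns can be approximated for a single gene and lineage with the textbook Fisher z-transformation and the Benjamini-Hochberg procedure. This is a sketch under stated assumptions (Pearson correlation, standard error 1/sqrt(n - 3), normal approximation for the p-value); the helper names are hypothetical and CellRank's exact computation may differ.

```python
import numpy as np
from scipy import stats


def fisher_corr(x, y, confidence_level=0.95):
    """Pearson r with a Fisher z-transform p-value and confidence interval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = np.corrcoef(x, y)[0, 1]
    z = np.arctanh(r)            # Fisher transformation
    se = 1.0 / np.sqrt(n - 3)    # approximate standard error of z
    # Two-sided p-value for H0: rho = 0, using the normal approximation of z.
    pval = 2 * stats.norm.sf(abs(z) / se)
    # Symmetric interval in z-space, mapped back to the correlation scale.
    z_crit = stats.norm.ppf(0.5 + confidence_level / 2)
    ci_low, ci_high = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return r, pval, ci_low, ci_high


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, going from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    qvals = np.empty(n)
    qvals[order] = np.minimum(scaled, 1.0)
    return qvals
```

In practice fisher_corr would be applied to every gene, and benjamini_hochberg to the resulting vector of p-values for each lineage.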

References

Fischer21

Fisher, R. A. (1921), On the “probable error” of a coefficient of correlation deduced from a small sample. Metron 1: 3–32.

plot_lineage_drivers(lineage, n_genes=8, use_raw=False, **kwargs)[source]

Plot lineage drivers discovered by compute_lineage_drivers().

Parameters
• lineage (str) – Lineage for which to plot the driver genes.

• n_genes (int) – Number of genes to plot.

• use_raw (bool) – Whether to look in adata .raw.var or adata .var.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

fit(keys=None, compute_absorption_probabilities=True, **kwargs)[source]

Run the pipeline.

Parameters
• keys (Optional[Sequence]) – States for which to compute absorption probabilities.

• compute_absorption_probabilities (bool) – Whether to compute absorption probabilities or just initial or terminal states.

• kwargs – Keyword arguments.

Returns

Nothing, just makes available the following fields:

• terminal_states_probabilities

• terminal_states

• absorption_probabilities

• diff_potential

Return type

None

copy()[source]

Return a copy of self, including the underlying adata object.

Return type

BaseEstimator

write(fname, ext='pickle')[source]

Serialize self to a file.

Parameters
• fname (Union[str, Path]) – Filename where to save the object.

• ext (Optional[str]) – Filename extension to use. If None, don’t append any extension.

Returns

Nothing, just writes itself to a file using pickle.

Return type

None
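The documented ext behaviour can be approximated with the standard library's pickle module. The helpers write_pickle and read_pickle below are hypothetical; unlike write(), this sketch returns the path it wrote, and it assumes an extension already present on fname should not be appended twice.

```python
import pickle
from pathlib import Path
from typing import Any, Optional, Union


def write_pickle(obj: Any, fname: Union[str, Path], ext: Optional[str] = "pickle") -> Path:
    """Pickle obj to fname, appending '.{ext}' unless ext is None."""
    path = Path(fname)
    if ext is not None:
        suffix = ext if ext.startswith(".") else f".{ext}"
        # Avoid writing e.g. 'model.pickle.pickle'.
        if path.suffix != suffix:
            path = path.with_name(path.name + suffix)
    with path.open("wb") as fout:
        pickle.dump(obj, fout)
    return path


def read_pickle(fname: Union[str, Path]) -> Any:
    """Load an object previously written with write_pickle."""
    with Path(fname).open("rb") as fin:
        return pickle.load(fin)
```

Because unpickling can execute arbitrary code, files written this way should only be read back from trusted sources.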

### Kernel¶

class cellrank.tl.kernels.Kernel(adata, backward=False, compute_cond_num=False, check_connectivity=False, **kwargs)[source]

A base class from which all kernels are derived.

These kernels read from a given AnnData object, usually the KNN graph and additional variables, to compute a weighted, directed graph. Every kernel object has a direction. The kernels defined in the derived classes are not strictly kernels in the mathematical sense, because they often take only one input argument; however, they build on other functions that have computed a similarity based on two input arguments. The role of the kernels defined here is to add directionality to these symmetric similarity relations or to transform them.
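As a toy illustration of adding directionality to a symmetric similarity, the sketch below biases a symmetric kNN similarity matrix W by a pseudotime, so that edges pointing backward in time are down-weighted, and then row-normalizes the result. It is not one of CellRank's kernels: the function name, the exponential down-weighting and the scale parameter are assumptions made for the example.

```python
import numpy as np


def directed_transition_matrix(W, pseudotime, backward=False, scale=1.0):
    """Toy example: bias a symmetric similarity matrix by pseudotime, then row-normalize."""
    W = np.asarray(W, dtype=float)
    t = np.asarray(pseudotime, dtype=float)
    if backward:
        t = -t  # reversing time reverses the preferred direction
    # dt[i, j] = t_j - t_i: positive when neighbor j lies "ahead" of cell i.
    dt = t[None, :] - t[:, None]
    # Forward edges keep weight 1; backward edges decay exponentially with |dt|.
    bias = np.exp(np.minimum(dt, 0.0) / scale)
    A = W * bias
    row_sums = A.sum(axis=1, keepdims=True)
    # Rows without any neighbor stay all-zero instead of dividing by zero.
    return np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)
```

Setting backward=True simply reverses the preferred direction, which mirrors the role of the backward parameter.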

Parameters
• adata (anndata.AnnData) – Annotated data object.

• backward (bool) – Direction of the process.

• compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. For large matrices, this can be very slow.

• check_connectivity (bool) – Check whether the underlying KNN graph is connected.

• kwargs – Keyword arguments which can specify keys to be read from the adata object.

property adata

Annotated data object.

Return type

anndata.AnnData

property backward

Direction of the process.

Return type

bool

abstract compute_transition_matrix(*args, **kwargs)

Compute a transition matrix.

Parameters
• *args – Positional arguments.

• kwargs – Keyword arguments.

Returns

Self.

Return type

cellrank.tl.kernels.KernelExpression

property condition_number

Condition number of the transition matrix.
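As a rough illustration of why compute_cond_num can be slow, a 2-norm condition number can be computed with numpy.linalg.cond, which derives it from the singular values of a dense copy of the matrix (cubic cost in the number of cells). Whether CellRank uses this exact norm is an assumption; the helper name is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def condition_number(T):
    """2-norm condition number of a dense or sparse transition matrix."""
    if sp.issparse(T):
        # Densifying plus a full SVD is what makes this expensive for large n.
        T = T.toarray()
    return np.linalg.cond(T)
```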

abstract copy()

Return a copy of itself. Note that the underlying adata object is not copied.

Return type

KernelExpression

property kernels

Get the kernels of the kernel expression, except for constants.

Return type

List[Kernel]

property params

Parameters which are used to compute the transition matrix.

Return type

Dict[str, Any]

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property transition_matrix

Return row-normalized transition matrix.

If not present, it is computed, provided that all the underlying kernels have been initialized.

Return type

Union[ndarray, spmatrix]
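Row-normalization of a sparse matrix can be sketched with SciPy by left-multiplying with a diagonal matrix of inverse row sums, which keeps the result sparse. The helper name row_normalize is hypothetical, and leaving all-zero rows empty is a choice made for this sketch.

```python
import numpy as np
import scipy.sparse as sp


def row_normalize(A):
    """Scale each row of a sparse matrix to sum to 1 (empty rows stay empty)."""
    A = sp.csr_matrix(A, dtype=float)
    row_sums = np.asarray(A.sum(axis=1)).ravel()
    inv = np.zeros_like(row_sums)
    nonzero = row_sums > 0
    inv[nonzero] = 1.0 / row_sums[nonzero]
    # Left-multiplying by a diagonal matrix rescales rows and preserves sparsity.
    return sp.diags(inv) @ A
```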

write(fname, ext='pickle')

Serialize self to a file.

Parameters
• fname (Union[str, Path]) – Filename where to save the object.

• ext (Optional[str]) – Filename extension to use. If None, don’t append any extension.

Returns

Nothing, just writes itself to a file using pickle.

Return type

None

Write the transition matrix and parameters used for computation to the underlying adata object.

Parameters

key (Optional[str]) – Key used when writing transition matrix to adata. If None, the key is set to ‘T_bwd’ if backward is True, else ‘T_fwd’.

Returns

• .obsp['{key}'] - the transition matrix.

• .uns['{key}_params'] - parameters used for calculation.

Return type

None

### Similarity scheme¶

class cellrank.tl.kernels.SimilaritySchemeABC[source]

Base class for all similarity schemes.

abstract __call__(v, D, softmax_scale=1.0)[source]

Compute transition probabilities of a cell to its nearest neighbors using RNA velocity.

Parameters
• v (ndarray) – Array of shape (n_genes,) or (n_neighbors, n_genes) containing the velocity vector(s). The second case is used for the backward process.

• D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.

• softmax_scale (float) – Scaling factor for the softmax function.

Returns

The probability and logits arrays of shape (n_neighbors,).

Return type

Tuple[ndarray, ndarray]
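One possible concrete scheme, sketched in NumPy under the assumption that the similarity is the cosine between velocity and displacement, scales the cosines by softmax_scale to obtain logits and applies a softmax over the neighbors. The function name is hypothetical and this is not necessarily the formula used by CellRank's own schemes; it only follows the documented shapes and return values.

```python
import numpy as np


def cosine_scheme(v, D, softmax_scale=1.0):
    """Sketch of a cosine-similarity scheme returning (probabilities, logits)."""
    # (1, n_genes) for the forward case, (n_neighbors, n_genes) for backward.
    v = np.atleast_2d(np.asarray(v, dtype=float))
    D = np.asarray(D, dtype=float)  # (n_neighbors, n_genes)
    # Row-wise cosine; a single velocity row is broadcast against all neighbors.
    num = np.sum(v * D, axis=1)
    denom = np.linalg.norm(v, axis=1) * np.linalg.norm(D, axis=1)
    cos = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    logits = softmax_scale * cos
    # Numerically stable softmax over the neighbors.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum(), logits
```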

### BaseModel¶

Base class for all model classes.

Parameters
• adata (anndata.AnnData) – Annotated data object.

• model (Any) – The underlying model that is used for fitting and prediction.

property prepared

Whether the model is prepared for fitting.

property adata

Annotated data object.

Return type

anndata.AnnData

property model

The underlying model.

Return type

Any

property x_all

Unfiltered independent variables of shape (n_cells, 1).

Return type

ndarray

property y_all

Unfiltered dependent variables of shape (n_cells, 1).

Return type

ndarray

property w_all

Unfiltered weights of shape (n_cells,).

Return type

ndarray

property x

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property y

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property w

Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type

ndarray

property x_test

Independent variables of shape (n_samples, 1) used for prediction.

Return type

ndarray

property y_test

Prediction values of shape (n_samples,) for x_test.

Return type

ndarray

property x_hat

Filtered independent variables used when calculating the default confidence interval, usually the same as x.

Return type

ndarray

property y_hat

Filtered dependent variables used when calculating the default confidence interval, usually the same as y.

Return type

ndarray

property conf_int

Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

ndarray

prepare(gene, lineage, backward=False, time_range=None, data_key='X', time_key='latent_time', use_raw=False, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200)[source]

Prepare the model to be ready for fitting.

Parameters
• gene (str) – Gene in adata .var_names or in adata .raw.var_names.

• lineage (Optional[str]) – Name of a lineage in adata .obsm[lineage_key]. If None, all weights will be set to 1.

• backward (bool) – Direction of the process.

• time_range (Union[float, Tuple[float, float], None]) –

Specify start and end times:

• If a tuple, it specifies the minimum and maximum pseudotime. Both values can be None, in which case the minimum is the earliest pseudotime and the maximum is automatically determined.

• If a float, it specifies the maximum pseudotime.

• data_key (str) – Key in adata .layers or ‘X’ for adata .X. If use_raw=True, it’s always set to ‘X’.

• time_key (str) – Key in adata .obs where the pseudotime is stored.

• use_raw (bool) – Whether to access adata .raw or not.

• threshold (Optional[float]) – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.

• weight_threshold (Union[float, Tuple[float, float]]) – If a float, set all weights below weight_threshold to weight_threshold. If a tuple, set all weights below the first value to the second value.

• filter_cells (Optional[float]) – Filter out all cells with expression values lower than this threshold.

• n_test_points (int) – Number of test points. If None, use the original points based on threshold.

Returns

Nothing, but updates the following fields:

• x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

• y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

• w - Filtered weights of shape (n_filtered_cells,) used for fitting.

• x_all - Unfiltered independent variables of shape (n_cells, 1).

• y_all - Unfiltered dependent variables of shape (n_cells, 1).

• w_all - Unfiltered weights of shape (n_cells,).

• x_test - Independent variables of shape (n_samples, 1) used for prediction.

• prepared - Whether the model is prepared for fitting.

Return type

None
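The filtering steps listed for prepare() can be sketched on plain arrays. Everything here is an assumption for illustration: the helper name prepare_arrays, the order of the steps, and in particular how the test endpoint is estimated from threshold (here, the largest pseudotime among cells whose weight exceeds threshold, falling back to the largest pseudotime overall).

```python
import numpy as np


def prepare_arrays(x_all, y_all, w_all, threshold=None,
                   weight_threshold=(0.01, 0.01), filter_cells=None,
                   n_test_points=200):
    """Hypothetical helper mirroring the documented filtering steps."""
    x_all = np.asarray(x_all, dtype=float).reshape(-1, 1)
    y_all = np.asarray(y_all, dtype=float).reshape(-1, 1)
    w_all = np.asarray(w_all, dtype=float).ravel().copy()

    # A float t behaves like the tuple (t, t):
    # weights below the first value are set to the second value.
    if np.isscalar(weight_threshold):
        weight_threshold = (weight_threshold, weight_threshold)
    low, fill = weight_threshold
    w_all[w_all < low] = fill

    # filter_cells: drop cells whose expression is below the threshold.
    mask = np.ones(w_all.size, dtype=bool)
    if filter_cells is not None:
        mask &= y_all.ravel() >= filter_cells
    x, y, w = x_all[mask], y_all[mask], w_all[mask]

    # Test endpoint (assumption): largest pseudotime among cells with
    # weight > threshold, where threshold defaults to the median weight.
    if threshold is None:
        threshold = np.median(w)
    confident = w > threshold
    x_max = x[confident].max() if confident.any() else x.max()
    x_test = np.linspace(x.min(), x_max, n_test_points).reshape(-1, 1)
    return x, y, w, x_test
```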

abstract fit(x=None, y=None, w=None, **kwargs)[source]

Fit the model.

Parameters
• x (Optional[ndarray]) – Independent variables, array of shape (n_samples, 1). If None, use x.

• y (Optional[ndarray]) – Dependent variables, array of shape (n_samples, 1). If None, use y.

• w (Optional[ndarray]) – Optional weights of x, array of shape (n_samples,). If None, use w.

• kwargs – Keyword arguments for underlying model’s fitting function.

Returns

Fits the model and returns self.

Return type

cellrank.ul.models.BaseModel

Run the prediction.

Parameters
• x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.

• key_added (Optional[str]) – Attribute name where to save the x_test for later use. If None, don’t save it.

• kwargs – Keyword arguments for underlying model’s prediction method.

Returns

• y_test - Prediction values of shape (n_samples,) for x_test.

Return type

numpy.ndarray

abstract confidence_interval(x_test=None, **kwargs)[source]

Calculate the confidence interval.

Use the default_confidence_interval() function if the underlying model has no method for confidence interval calculation.

Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

numpy.ndarray

default_confidence_interval(x_test=None, **kwargs)[source]

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from [DeSalvo70], eq. 5.

Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

• x_hat - Filtered independent variables used when calculating the default confidence interval, usually the same as x.

• y_hat - Filtered dependent variables used when calculating the default confidence interval, usually the same as y.

Return type

numpy.ndarray

References

DeSalvo70

DeSalvo, J. S. (1970), Standard Error of Forecast in Multiple Regression: Proof of a Useful Result. RAND Corporation.
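For reference, the textbook standard error of a forecast in simple linear regression yields a prediction interval that widens away from the mean of the independent variable. The sketch below implements that formula with NumPy and SciPy; treating it as equivalent to [DeSalvo70], eq. 5, or to CellRank's internal computation is an assumption, and the function name prediction_interval is hypothetical.

```python
import numpy as np
from scipy import stats


def prediction_interval(x_hat, y_obs, y_hat, x_test, y_test, level=0.95):
    """Textbook prediction interval for a 1D fit (sketch).

    se(x0) = sigma * sqrt(1 + 1/n + (x0 - mean(x))**2 / sum((x - mean(x))**2))
    """
    x_hat = np.asarray(x_hat, dtype=float).ravel()
    y_obs = np.asarray(y_obs, dtype=float).ravel()
    y_hat = np.asarray(y_hat, dtype=float).ravel()
    x_test = np.asarray(x_test, dtype=float).ravel()
    y_test = np.asarray(y_test, dtype=float).ravel()
    n = x_hat.size
    # Residual standard deviation with n - 2 degrees of freedom (line fit).
    sigma = np.sqrt(np.sum((y_obs - y_hat) ** 2) / (n - 2))
    x_mean = x_hat.mean()
    se = sigma * np.sqrt(
        1.0 + 1.0 / n + (x_test - x_mean) ** 2 / np.sum((x_hat - x_mean) ** 2)
    )
    # Two-sided critical value of Student's t distribution.
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 2)
    # Column 0: lower bound, column 1: upper bound, shape (n_samples, 2).
    return np.column_stack([y_test - t_crit * se, y_test + t_crit * se])
```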

plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)[source]

Plot the smoothed gene expression.

Parameters
• figsize (Tuple[float, float]) – Size of the figure.

• same_plot (bool) – Whether to plot all trends in the same plot.

• hide_cells (bool) – Whether to hide the cells.

• perc (Optional[Tuple[float, float]]) – Percentile by which to clip the absorption probabilities.

• abs_prob_cmap (ListedColormap) – Colormap to use when coloring in the absorption probabilities.

• cell_color (Optional[str]) – Key in anndata.AnnData.obs or anndata.AnnData.var_names used for coloring the cells.

• lineage_color (str) – Color for the lineage.

• alpha (float) – Alpha channel for cells.

• lineage_alpha (float) – Alpha channel for lineage confidence intervals.

• title (Optional[str]) – Title of the plot.

• size (int) – Size of the points.

• lw (float) – Line width for the smoothed values.

• cbar (bool) – Whether to show colorbar.

• margins (float) – Margins around the plot.

• xlabel (str) – Label on the x-axis.

• ylabel (str) – Label on the y-axis.

• conf_int (bool) – Whether to show the confidence interval.

• lineage_probability (bool) – Whether to show the smoothed lineage probability as a dashed line. Note that this requires one additional model fit.

• lineage_probability_conf_int (Union[bool, float]) – Whether to compute and show the smoothed lineage probability confidence interval. If self is cellrank.ul.models.GAMR, it can also specify the confidence level; the default is 0.95. Only used when lineage_probability=True.

• lineage_probability_color (Optional[str]) – Color to use when plotting the smoothed lineage_probability. If None, it’s the same as lineage_color. Only used when lineage_probability=True.

• obs_legend_loc (Optional[str]) – Location of the legend when cell_color corresponds to a categorical variable.

• dpi (Optional[int]) – Dots per inch.

• fig (Optional[Figure]) – Figure to use. If None, create a new one.

• ax (matplotlib.axes.Axes) – Axes to use. If None, create a new one.

• return_fig (bool) – If True, return the figure object.

• save (Optional[str]) – Filename where to save the plot. If None, just shows the plots.

• kwargs – Keyword arguments for matplotlib.axes.Axes.legend(), e.g. to disable the legend, specify loc=None. Only available when lineage_probability=True.

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

abstract copy()[source]

Return a copy of self.

Return type

BaseModel

### Lineage¶

class cellrank.tl.Lineage(input_array: numpy.ndarray, *, names: Iterable[str], colors: Optional[Iterable[ColorLike]] = None)[source]

Lightweight numpy.ndarray wrapper that adds names and colors.

Parameters
• input_array – Input array containing lineage probabilities, each lineage being stored in a column.

• names – Names of the lineages.

• colors – Colors of the lineages.

property names

Lineage names. Must be unique.

Return type

ndarray

property colors

Lineage colors.

Return type

ndarray

property X

Convert self to numpy array, losing names and colors.

Return type

ndarray

property T

Transpose of self.

view(dtype=None, type=None)[source]

Return a view of self.

Return type

LineageView

plot_pie(reduction, title=None, legend_loc='on data', legend_kwargs=mappingproxy({}), figsize=None, dpi=None, save=None, **kwargs)[source]

Plot a pie chart visualizing aggregated lineage probabilities.

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

reduce(*keys, mode='dist', dist_measure='mutual_info', normalize_weights='softmax', softmax_scale=1, return_weights=False)[source]

Subset states and normalize them so that they again sum to 1.

Parameters
• keys (str) – List of keys that define the states, to which this object will be reduced by projecting the values of the other states.

• mode (str) – Whether to use a distance measure to compute weights - ‘dist’, or just rescale - ‘scale’.

• dist_measure (str) –

Used to quantify similarity between query and reference states. Valid options are:

• 'cosine_sim' - cosine similarity.

• 'wasserstein_dist' - Wasserstein distance.

• 'kl_div' - Kullback–Leibler divergence.

• 'js_div' - Jensen–Shannon divergence.

• 'mutual_info' - mutual information.

• 'equal' - equally redistribute the mass among the rest.

• normalize_weights (str) –

How to row-normalize the weights. Valid options are:

• 'scale' - divide by the sum.

• 'softmax' - use a softmax.

• softmax_scale (float) – Scaling factor in the softmax, used for normalizing the weights to sum to 1.

• return_weights (bool) – If True, a pandas.DataFrame of the weights used for the projection is returned.

Returns

• Lineage object, reduced to the initial or terminal states. If a reduction is not possible, returns just a copy of self.

• The weights used for the projection, of shape (n_query, n_reference), if return_weights=True.

Return type

Union[Lineage, Tuple[Lineage, DataFrame]]

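The two reduction modes can be sketched on a plain probability matrix, with states addressed by column index instead of by name. Only mode='scale' and a 'dist' variant using cosine similarity with softmax-normalized weights are shown; the function name is hypothetical, and CellRank's other dist_measure options may weight the states differently.

```python
import numpy as np


def reduce_probs(P, keep, mode="scale", softmax_scale=1.0):
    """Reduce an (n_cells, n_states) probability matrix to the columns in `keep`."""
    P = np.asarray(P, dtype=float)
    keep = np.asarray(keep, dtype=int)
    drop = np.setdiff1d(np.arange(P.shape[1]), keep)
    P_keep = P[:, keep]
    if mode == "scale" or drop.size == 0:
        # Rescale the kept columns so that every row sums to 1 again.
        row_sums = P_keep.sum(axis=1, keepdims=True)
        return np.divide(P_keep, row_sums, out=np.zeros_like(P_keep), where=row_sums > 0)
    if mode != "dist":
        raise ValueError(f"Unknown mode: {mode!r}")
    # Cosine similarity between each dropped (query) column and each kept
    # (reference) column, computed across cells.
    Q = P[:, drop]
    denom = np.outer(np.linalg.norm(Q, axis=0), np.linalg.norm(P_keep, axis=0))
    sim = np.divide(Q.T @ P_keep, denom, out=np.zeros(denom.shape), where=denom > 0)
    # Softmax over reference states: weights of shape (n_query, n_reference)
    # whose rows sum to 1.
    logits = softmax_scale * sim
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # Hand the mass of each dropped state to the kept states, so every row
    # keeps its original total.
    return P_keep + Q @ weights
```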
entropy(qk=None, base=None, axis=0)

Calculate the entropy of a distribution for given probability values.

If only probabilities pk are given, the entropy is calculated as S = -sum(pk * log(pk), axis=axis).

If qk is not None, then compute the Kullback-Leibler divergence S = sum(pk * log(pk / qk), axis=axis).

This routine will normalize pk and qk if they don’t sum to 1.

Parameters
• pk (sequence) – Defines the (discrete) distribution. pk[i] is the (possibly unnormalized) probability of event i.

• qk (sequence, optional) – Sequence against which the relative entropy is computed. Should be in the same format as pk.

• base (float, optional) – The logarithmic base to use, defaults to e (natural logarithm).

• axis (int, optional) – The axis along which the entropy is calculated. Default is 0.

Returns

S – The calculated entropy.

Return type

float

Examples

>>> from scipy.stats import entropy

Bernoulli trial with different p. The outcome of a fair coin is the most uncertain:

>>> entropy([1/2, 1/2], base=2)
1.0

The outcome of a biased coin is less uncertain:

>>> entropy([9/10, 1/10], base=2)
0.46899559358928117

Relative entropy:

>>> entropy([1/2, 1/2], qk=[9/10, 1/10])
0.5108256237659907
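With axis=1, the same function yields one entropy per row of a lineage-probability matrix, a simple way to score how undecided each cell is between lineages. The sketch calls scipy.stats.entropy on a plain array; reading the result as a fate-uncertainty score is an illustrative interpretation.

```python
import numpy as np
from scipy.stats import entropy

# Rows are cells, columns are lineages (probabilities sum to 1 per row).
probs = np.array([
    [0.5, 0.5],  # undecided between the two lineages
    [0.9, 0.1],  # biased towards the first lineage
    [1.0, 0.0],  # fully committed
])
# One entropy value (in bits) per cell.
cell_entropy = entropy(probs, base=2, axis=1)
```

The three values match the fair-coin, biased-coin and fully committed cases: 1.0, about 0.469 and 0.0 bits.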