# Classes¶

## Estimators¶

### GPCCA¶

Generalized Perron Cluster Cluster Analysis as implemented in pyGPCCA.

Coarse-grains a discrete Markov chain into a set of macrostates and computes coarse-grained transition probabilities among them. Each macrostate corresponds to a region of the state space, i.e. to a subset of cells. The assignment is soft: each cell is assigned to every macrostate with a certain weight, and the weights sum to one per cell. Macrostates are computed by maximizing the ‘crispness’, which can be thought of as a measure of how little the macrostates overlap in a certain inner-product sense. Once the macrostates have been computed, the full transition matrix is projected onto a coarse-grained transition matrix among the macrostates via a Galerkin projection. This projection is based on invariant subspaces of the original transition matrix, which are obtained using the real Schur decomposition.
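The Galerkin projection can be illustrated on toy data. This is a minimal sketch, not the pyGPCCA implementation; `T` and `chi` are hypothetical stand-ins for the cell-level transition matrix and the soft macrostate memberships:

```python
import numpy as np

# Toy row-stochastic transition matrix over 4 microstates.
T = np.array([
    [0.7, 0.3, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.1, 0.2, 0.7],
])

# Soft membership of each microstate in 2 macrostates; rows sum to one.
chi = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
])

# Galerkin projection onto the macrostate subspace: T_c = pinv(chi) @ T @ chi.
T_coarse = np.linalg.pinv(chi) @ T @ chi
print(T_coarse.shape)  # (2, 2)
```

Because `T` is row-stochastic and the membership rows sum to one, the coarse-grained matrix is again row-stochastic.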

Parameters
compute_macrostates(n_states=None, n_cells=30, use_min_chi=False, cluster_key=None, en_cutoff=0.7, p_thresh=1e-15)[source]

Compute the macrostates.

Parameters
Returns

Nothing, but updates the following fields:

Return type

None

set_terminal_states_from_macrostates(names=None, n_cells=30)[source]

Manually select terminal states from macrostates.

Parameters
Returns

Nothing, just updates the following fields:

Return type

None

compute_terminal_states(method='stability', n_cells=30, alpha=1, stability_threshold=0.96, n_states=None)[source]

Automatically select terminal states from macrostates.

Parameters
• method (str) –

One of the following:

• ’eigengap’ - select the number of states based on the eigengap of the transition matrix.

• ’eigengap_coarse’ - select the number of states based on the eigengap of the diagonal of the coarse-grained transition matrix.

• ’top_n’ - select the top n_states states based on the diagonal probabilities of the coarse-grained transition matrix.

• ’stability’ - select states which have a stability index >= stability_threshold. The stability index is given by the diagonal elements of the coarse-grained transition matrix.

• n_cells (int) – Number of most likely cells from each macrostate to select.

• alpha (Optional[float]) – Weight given to the deviation of an eigenvalue from one. Used when method='eigengap' or method='eigengap_coarse'.

• stability_threshold (float) – Threshold used when method='stability'.

• n_states (Optional[int]) – Number of states used when method='top_n'.

Returns

Nothing, just updates the following fields:

Return type

None
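As a sketch of the method='stability' rule (toy coarse-grained matrix and hypothetical macrostate names):

```python
import numpy as np

# Toy coarse-grained transition matrix over 4 macrostates (rows sum to one).
coarse_T = np.array([
    [0.98, 0.01, 0.01, 0.00],
    [0.05, 0.90, 0.03, 0.02],
    [0.01, 0.01, 0.97, 0.01],
    [0.10, 0.10, 0.10, 0.70],
])
names = np.array(["Alpha", "Beta", "Delta", "Progenitor"])

# method='stability': keep macrostates whose self-transition probability
# (diagonal entry) is at least stability_threshold.
stability = np.diag(coarse_T)
terminal = names[stability >= 0.96]
print(list(terminal))  # ['Alpha', 'Delta']
```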

Compute generalized diffusion pseudotime using the real Schur decomposition.

Parameters
• n_components (int) – Number of real Schur vectors to consider.

• key_added (str) – Key in adata.obs where the pseudotime is saved.

• kwargs – Keyword arguments for cellrank.tl.GPCCA.compute_schur() if Schur decomposition is not found.

Returns

Nothing, just updates adata.obs[key_added] with the computed pseudotime.

Return type

None

plot_coarse_T(show_stationary_dist=True, show_initial_dist=False, cmap='viridis', xtick_rotation=45, annotate=True, show_cbar=True, title=None, figsize=(8, 8), dpi=80, save=None, text_kwargs=mappingproxy({}), **kwargs)[source]

Plot the coarse-grained transition matrix between macrostates.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_macrostate_composition(key, width=0.8, title=None, labelrot=45, legend_loc='upper right out', figsize=None, dpi=None, save=None, show=True)[source]

Plot stacked histogram of macrostates over categorical annotations.

Parameters
Return type

Optional[Axes]

Returns

• matplotlib.pyplot.Axes – The axis object if show=False.

• None – Nothing, just plots the figure. Optionally saves it based on save.

fit(n_lineages=None, cluster_key=None, keys=None, method='krylov', compute_absorption_probabilities=True, **kwargs)[source]

Run the pipeline, computing the macrostates, initial or terminal states and optionally the absorption probabilities.

It is equivalent to running:

```python
if n_lineages is None or n_lineages == 1:
    compute_eigendecomposition(...)  # get the stationary distribution
if n_lineages > 1:
    compute_schur(...)

compute_macrostates(...)

if n_lineages is None:
    compute_terminal_states(...)
else:
    set_terminal_states_from_macrostates(...)

if compute_absorption_probabilities:
    compute_absorption_probabilities(...)
```

Parameters
Returns

Nothing, just makes available the following fields:

Return type

None
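The dispatch logic above can be mirrored by a small self-contained function (step names are recorded instead of executed; an `is not None` guard is added so the `n_lineages > 1` comparison is well-defined):

```python
def run_pipeline(n_lineages, compute_absorption):
    """Mirror of the fit() dispatch; records step names instead of executing them."""
    calls = []

    if n_lineages is None or n_lineages == 1:
        calls.append("compute_eigendecomposition")  # stationary distribution
    if n_lineages is not None and n_lineages > 1:
        calls.append("compute_schur")

    calls.append("compute_macrostates")

    if n_lineages is None:
        calls.append("compute_terminal_states")
    else:
        calls.append("set_terminal_states_from_macrostates")

    if compute_absorption:
        calls.append("compute_absorption_probabilities")
    return calls

print(run_pipeline(3, True))
```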

property absorption_probabilities: cellrank.tl._lineage.Lineage

Absorption probabilities.

Return type

Lineage

property adata: anndata.AnnData

Annotated data object.

Returns

Annotated data object.

Return type

anndata.AnnData

property coarse_T: pandas.core.frame.DataFrame

Coarse-grained transition matrix.

Return type

DataFrame

property coarse_initial_distribution: pandas.core.series.Series

Coarse initial distribution.

Return type

Series

property coarse_stationary_distribution: pandas.core.series.Series

Coarse stationary distribution.

Return type

Series

compute_absorption_probabilities(keys=None, check_irreducibility=False, solver='gmres', use_petsc=True, time_to_absorption=None, n_jobs=None, backend='loky', show_progress_bar=True, tol=1e-06, preconditioner=None)

Compute absorption probabilities of a Markov chain.

For each cell, this computes the probability of it reaching any of the approximate recurrent classes defined by terminal_states.

Parameters
• keys (Optional[Sequence[str]]) – Keys defining the recurrent classes.

• check_irreducibility (bool) – Check whether the transition matrix is irreducible.

• solver (str) –

Solver to use for the linear problem. Options are ‘direct’, ‘gmres’, ‘lgmres’, ‘bicgstab’ or ‘gcrotmk’ when use_petsc=False, or one of petsc4py.PETSc.KSP.Type otherwise.

Information on the scipy iterative solvers can be found in the scipy.sparse.linalg documentation; for the petsc4py solvers, see the petsc4py documentation.

• use_petsc (bool) – Whether to use solvers from petsc4py or scipy. Recommended for large problems. If no installation is found, defaults to scipy.sparse.linalg.gmres().

• time_to_absorption – Whether to compute the mean time to absorption, and optionally its variance, to specific absorbing states.

If a dict, can be specified as {'Alpha': 'var', ...} to also compute variance. In case when states are a tuple, time to absorption will be computed to the subset of these states, such as [('Alpha', 'Beta'), ...] or {('Alpha', 'Beta'): 'mean', ...}. Can be specified as 'all' to compute it to any absorbing state in keys, which is more efficient than listing all absorbing states.

It might be beneficial to disable the progress bar with show_progress_bar=False, because many linear systems are solved.

• n_jobs (Optional[int]) – Number of parallel jobs to use with an iterative solver. When use_petsc=True or for quickly-solvable problems, we recommend a higher number of jobs (>= 8) in order to fully saturate the cores.

• backend (str) – Which backend to use for multiprocessing. See joblib.Parallel for valid options.

• show_progress_bar (bool) – Whether to show progress bar when the solver isn’t a direct one.

• tol (float) – Convergence tolerance for the iterative solver. The default is fine for most cases, only consider decreasing this for severely ill-conditioned matrices.

• preconditioner (Optional[str]) – Preconditioner to use; only available when use_petsc=True. For available values, see petsc4py.PETSc.PC.Type. We recommend the ‘ilu’ preconditioner for badly conditioned problems.

Returns

Nothing, but updates the following fields:

• absorption_probabilities - probabilities of being absorbed into the terminal states.

• lineage_absorption_times - mean times until absorption to subset absorbing states and optionally their variances saved as '{lineage} mean' and '{lineage} var', respectively, for each subset of absorbing states specified in time_to_absorption.

Return type

None
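The underlying linear problem can be sketched with a dense toy example: partition the transition matrix into a transient-to-transient block `Q` and a transient-to-absorbing block `R`, then solve $$(I - Q)A = R$$. This is a minimal dense stand-in; the estimator uses sparse iterative solvers such as GMRES:

```python
import numpy as np

# Row-stochastic transition matrix: states 0-1 transient, states 2-3 absorbing.
T = np.array([
    [0.5, 0.3, 0.2, 0.0],
    [0.2, 0.5, 0.0, 0.3],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
transient, absorbing = [0, 1], [2, 3]

Q = T[np.ix_(transient, transient)]   # transient -> transient block
R = T[np.ix_(transient, absorbing)]   # transient -> absorbing block

# Absorption probabilities A solve (I - Q) A = R.
A = np.linalg.solve(np.eye(len(transient)) - Q, R)
print(A.sum(axis=1))  # each row sums to one
```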

compute_eigendecomposition(k=20, which='LR', alpha=1, only_evals=False, ncv=None)

Compute eigendecomposition of transition matrix.

Uses a sparse implementation, if possible, and only computes the top $$k$$ eigenvectors to speed up the computation. Computes both left and right eigenvectors.

Parameters
Returns

Nothing, but updates the following field:

Return type

None

compute_lineage_drivers(lineages=None, method='fischer', cluster_key=None, clusters=None, layer='X', use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, return_drivers=True, **kwargs)

Compute driver genes per lineage.

Correlates gene expression with lineage probabilities, for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters which are relevant for the specified lineages.

Parameters
Return type

Optional[DataFrame]

Returns

• DataFrame of shape (n_genes, n_lineages * 5) containing the following five columns for each lineage –

• {lineage} corr - correlation between the gene expression and absorption probabilities.

• {lineage} pval - p-values of the two-sided test.

• {lineage} qval - corrected p-values using Benjamini-Hochberg method at level 0.05.

• {lineage} ci low - lower bound of the confidence_level correlation confidence interval.

• {lineage} ci high - upper bound of the confidence_level correlation confidence interval.

• Only if return_drivers=True.

• Otherwise, updates adata.var or adata.raw.var (depending on use_raw) with –

• '{direction} {lineage} corr' - the potential lineage drivers.

• '{direction} {lineage} qval' - the corrected p-values.

• Also updates the following fields
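The core computation per gene and lineage, a correlation plus a Fisher-z confidence interval, can be sketched on synthetic data (hypothetical toy vectors; the actual method also supports permutation testing and multiple-testing correction):

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 200
fate_prob = rng.uniform(size=n_cells)                          # absorption probabilities
expr = 2.0 * fate_prob + rng.normal(scale=0.5, size=n_cells)   # a toy "driver" gene

# Pearson correlation between gene expression and fate probabilities.
r = np.corrcoef(expr, fate_prob)[0, 1]

# Fisher z-transform to get an approximate confidence interval for r.
z = np.arctanh(r)
se = 1.0 / np.sqrt(n_cells - 3)
z_crit = 1.96  # ~95% confidence level
ci_low, ci_high = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
print(round(r, 2), round(ci_low, 2), round(ci_high, 2))
```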

compute_lineage_priming(method='kl_divergence', early_cells=None)

Compute the degree of lineage priming.

This method computes how naive vs. committed each individual cell is. It returns a score where 0 stands for naive and 1 stands for committed.

Parameters
Returns

The priming degree.

Return type

Series
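One reading of method='kl_divergence', sketched on synthetic fate probabilities (an assumption for illustration, not the exact implementation): score each cell by the KL divergence between its fate distribution and a reference profile, then rescale to [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(1)
# Fate probabilities per cell over 3 lineages; rows sum to one.
fate = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=100)

# Reference distribution: the average fate profile (e.g. over early cells).
ref = fate.mean(axis=0)

# KL divergence of each cell's fate distribution from the reference.
eps = 1e-12
kl = np.sum(fate * np.log((fate + eps) / (ref + eps)), axis=1)

# Rescale to [0, 1]: 0 ~ naive (close to reference), 1 ~ committed.
priming = (kl - kl.min()) / (kl.max() - kl.min())
print(priming.min(), priming.max())  # 0.0 1.0
```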

compute_partition()

Compute communication classes for the Markov chain.

Returns

Nothing, but updates the following fields:

Return type

None

compute_schur(n_components=10, initial_distribution=None, method='krylov', which='LR', alpha=1)

Compute the Schur decomposition.

Parameters
• n_components (int) – Number of vectors to compute.

• initial_distribution (Optional[ndarray]) – Input probability distribution over all cells. If None, a uniform distribution is used.

• method (str) –

Method for calculating the Schur vectors. Valid options are: ‘krylov’ or ‘brandts’. For benefits of each method, see pygpcca.GPCCA.

The former is an iterative procedure that computes a partial, sorted Schur decomposition for large, sparse matrices whereas the latter computes a full sorted Schur decomposition of a dense matrix.

• which (str) – Eigenvalues are in general complex. ‘LR’ - largest real part, ‘LM’ - largest magnitude.

• alpha (float) – Used to compute the eigengap. alpha is the weight given to the deviation of an eigenvalue from one.

Returns

Nothing, but updates the following fields:

Return type

None
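A minimal illustration of the decomposition itself, using scipy.linalg.schur on a toy matrix (this corresponds to a full dense decomposition as in ‘brandts’; the ‘krylov’ method instead computes a partial, sorted decomposition for large sparse matrices):

```python
import numpy as np
from scipy.linalg import schur

# Toy row-stochastic transition matrix.
T = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.3, 0.7],
])

# Real Schur decomposition T = Q R Q^T with Q orthogonal and R
# quasi-upper-triangular.
R, Q = schur(T, output="real")

print(np.allclose(Q @ R @ Q.T, T))  # True
```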

copy()

Return a copy of self, including the underlying adata object.

Return type

BaseEstimator

property eigendecomposition: Mapping[str, Any]

Eigendecomposition.

Return type

Mapping[str, Any]

property is_irreducible

Whether the Markov chain is irreducible or not.

property issparse: bool

Whether the transition matrix is sparse or not.

Return type

bool

property kernel: cellrank.tl.kernels._base_kernel.KernelExpression

Underlying kernel.

Return type

KernelExpression

property lineage_absorption_times: pandas.core.frame.DataFrame

Lineage absorption times.

Return type

DataFrame

property lineage_drivers: pandas.core.frame.DataFrame

Lineage drivers.

Return type

DataFrame

property macrostates: pandas.core.series.Series

Macrostates.

Return type

Series

property macrostates_memberships: cellrank.tl._lineage.Lineage

Macrostates memberships.

Return type

Lineage

plot_absorption_probabilities(data, prop, discrete=False, lineages=None, cluster_key=None, mode='embedding', time_key='latent_time', title=None, same_plot=False, cmap='viridis', **kwargs)

Plot discrete states or probabilities in an embedding.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_eigendecomposition(left=False, *args, **kwargs)

Plot eigenvectors in an embedding.

Parameters
• left (bool) – Whether to plot left or right eigenvectors.

• use – Which or how many vectors are to be plotted.

• abs_value – Whether to take the absolute value before plotting.

• cluster_key – Key in adata.obs for plotting categorical observations.

• basis – Basis to use when mode='embedding'. If None, use ‘umap’.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_lineage_drivers(lineage, n_genes=8, ncols=None, use_raw=False, title_fmt='{gene} qval={qval:.4e}', figsize=None, dpi=None, save=None, **kwargs)

Plot lineage drivers discovered by compute_lineage_drivers().

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_lineage_drivers_correlation(lineage_x, lineage_y, color=None, gene_sets=None, gene_sets_colors=None, use_raw=False, cmap='RdYlBu_r', fontsize=12, adjust_text=False, legend_loc='best', figsize=(4, 4), dpi=None, save=None, show=True, **kwargs)

Show scatter plot of gene-correlations between two lineages.

Optionally, you can pass a dict of gene names that will be annotated in the plot.

Parameters
Return type

Optional[Axes]

Returns

• matplotlib.pyplot.Axes – The axis object if show=False.

• None – Nothing, just plots the figure. Optionally saves it based on save.

Notes

This plot is based on a notebook by Maren Büttner.

plot_macrostates(data, prop, discrete=False, lineages=None, cluster_key=None, mode='embedding', time_key='latent_time', title=None, same_plot=False, cmap='viridis', **kwargs)

Plot discrete states or probabilities in an embedding.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_schur(vectors, prop, use=None, abs_value=False, cluster_key=None, **kwargs)

Plot vectors in an embedding.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_schur_matrix(title='schur matrix', cmap='viridis', figsize=None, dpi=80, save=None, **kwargs)

Plot the Schur matrix.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_spectrum(n=None, real_only=False, show_eigengap=True, show_all_xticks=True, legend_loc=None, title=None, figsize=(5, 5), dpi=100, save=None, marker='.', **kwargs)

Plot the top eigenvalues in real or complex plane.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_terminal_states(data, prop, discrete=False, lineages=None, cluster_key=None, mode='embedding', time_key='latent_time', title=None, same_plot=False, cmap='viridis', **kwargs)

Plot discrete states or probabilities in an embedding.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

property priming_degree: pandas.core.series.Series

Priming degree.

Return type

Series

read(fname)

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property recurrent_classes

Recurrent classes of the Markov chain.

rename_terminal_states(new_names)

Rename the categories of terminal_states.

Parameters
Returns

Nothing, just updates the names of terminal_states.

Return type

None

property schur: numpy.ndarray

Schur vectors.

Return type

ndarray

property schur_matrix: numpy.ndarray

Schur matrix.

Return type

ndarray

set_terminal_states(labels, cluster_key=None, en_cutoff=None, p_thresh=None, add_to_existing=False, **kwargs)

Manually define terminal states.

Parameters
• labels (Union[Series, Dict[str, Sequence[Any]]]) –

Defines the terminal states. Valid options are:

• categorical pandas.Series where each category corresponds to one terminal state. NaN entries denote cells that do not belong to any terminal state, i.e. these are either initial or transient cells.

• dict where keys are terminal states and values are lists of cell barcodes corresponding to annotations in adata.obs_names. If only one key is provided, the values may instead correspond to terminal-state clusters, provided a matching categorical pandas.Series can be found in adata.obs.

• cluster_key (Optional[str]) – Key from adata.obs where categorical cluster labels are stored. These are used to associate names and colors with each terminal state. Each terminal state will be given the name and color corresponding to the cluster it mostly overlaps with.

• en_cutoff (Optional[float]) – If cluster_key is given, this parameter determines when an approximate recurrent class will be labeled as ‘Unknown’, based on the entropy of the distribution of cells over transcriptomic clusters.

• p_thresh (Optional[float]) – If cell cycle scores were provided, a Wilcoxon rank-sum test is conducted to identify cell-cycle states. If the test returns a positive statistic and a p-value smaller than p_thresh, a warning will be issued.

• add_to_existing (bool) – Whether the new terminal states should be added to pre-existing ones. Cells already assigned to a terminal state will be re-assigned to the new terminal state if there’s a conflict between old and new annotations. This throws an error if no previous annotations corresponding to terminal states have been found.

Returns

Nothing, but updates the following fields:

Return type

None
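The two accepted formats for labels can be constructed as follows (toy cell barcodes; either object could then be passed to set_terminal_states):

```python
import numpy as np
import pandas as pd

cells = [f"cell_{i}" for i in range(6)]

# Option 1: categorical Series; NaN marks cells outside any terminal state.
labels = pd.Series(
    ["Alpha", "Alpha", np.nan, "Beta", "Beta", np.nan],
    index=cells,
    dtype="category",
)

# Option 2: dict mapping terminal-state names to cell barcodes.
labels_dict = {"Alpha": ["cell_0", "cell_1"], "Beta": ["cell_3", "cell_4"]}

print(labels.cat.categories.tolist())  # ['Alpha', 'Beta']
```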

property terminal_states: pandas.core.series.Series

Terminal states.

Return type

Series

property terminal_states_probabilities: pandas.core.series.Series

Terminal states probabilities.

Return type

Series

property transient_classes

Transient classes of the Markov chain.

property transition_matrix: Union[numpy.ndarray, scipy.sparse.base.spmatrix]

Transition matrix.

Return type

Union[ndarray, spmatrix]

write(fname, ext='pickle')

Serialize self to a file.

Parameters
Returns

Nothing, just writes itself to a file using pickle.

Return type

None

### CFLARE¶

Compute the initial/terminal states of a Markov chain via spectral heuristics.

This estimator uses the left eigenvectors of the transition matrix to filter to a set of recurrent cells and the right eigenvectors to cluster this set of cells into discrete groups.

Parameters
compute_terminal_states(use=None, percentile=98, method='kmeans', cluster_key=None, n_clusters_kmeans=None, n_neighbors=20, resolution=0.1, n_matches_min=0, n_neighbors_filtering=15, basis=None, n_comps=5, scale=False, en_cutoff=0.7, p_thresh=1e-15)[source]

Find approximate recurrent classes of the Markov chain.

Cells are first filtered using the left eigenvectors to obtain likely recurrent cells; these are then clustered using the right eigenvectors into approximate recurrent classes.

Parameters
• use (Union[int, Tuple[int], List[int], range, None]) – Which or how many first eigenvectors to use as features for clustering/filtering. If None, use the eigengap statistic.

• percentile (Optional[int]) – Threshold used for filtering out cells which are most likely transient states. Cells that fall below the given percentile in each eigenvector are removed from the data matrix.

• method (str) – Method to be used for clustering. Must be one of ‘louvain’, ‘leiden’ or ‘kmeans’.

• cluster_key (Optional[str]) – If a key to cluster labels is given, terminal_states will get associated with these for naming and colors.

• n_clusters_kmeans (Optional[int]) – If None, this is set to use + 1.

• n_neighbors (int) – If ‘louvain’ or ‘leiden’ is used for clustering cells, a KNN graph must be built. This is the $$K$$ parameter for that graph, i.e. the number of neighbors per cell.

• resolution (float) – Resolution parameter for ‘louvain’ or ‘leiden’ clustering. Should be chosen relatively small.

• n_matches_min (Optional[int]) – Filters out cells which don’t have at least n_matches_min neighbors from the same class. This filters out some cells which are transient but have been misassigned.

• n_neighbors_filtering (int) – Parameter for filtering cells. Cells are filtered out if they don’t have at least n_matches_min neighbors among their n_neighbors_filtering nearest cells.

• basis (Optional[str]) – Key from adata.obsm to be used as additional features for the clustering.

• n_comps (int) – Number of embedding components to be used when basis is not None.

• scale (bool) – Scale to z-scores. Consider using this if appending embedding to features.

• en_cutoff (Optional[float]) – If cluster_key is given, this parameter determines when an approximate recurrent class will be labeled as ‘Unknown’, based on the entropy of the distribution of cells over transcriptomic clusters.

• p_thresh (float) – If cell cycle scores were provided, a Wilcoxon rank-sum test is conducted to identify cell-cycle states. If the test returns a positive statistic and a p-value smaller than p_thresh, a warning will be issued.

Returns

Nothing, but updates the following fields:

Return type

None
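The filter-then-cluster idea can be sketched on synthetic eigenvector values (hypothetical data; a simple sign split stands in for the kmeans/louvain/leiden clustering step):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "left eigenvector" values: recurrent cells score high, transient low.
left_ev = np.concatenate([rng.normal(5, 0.5, 20), rng.normal(0, 0.5, 80)])

# Filtering step: keep cells at or above the given percentile of the eigenvector.
percentile = 75
keep = left_ev >= np.percentile(left_ev, percentile)

# Clustering step (stand-in for kmeans/louvain/leiden): split the kept cells
# by the sign of a toy "right eigenvector" into two recurrent classes.
right_ev = np.concatenate([np.ones(10), -np.ones(10), rng.normal(0, 0.1, 80)])
classes = np.where(right_ev[keep] > 0, "state_1", "state_2")

print(keep.sum(), len(classes))  # 25 25
```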

fit(n_lineages, keys=None, cluster_key=None, compute_absorption_probabilities=True, **kwargs)[source]

Run the pipeline, computing the initial or terminal states and optionally the absorption probabilities.

It is equivalent to running:

```python
compute_eigendecomposition(...)
compute_terminal_states(...)
compute_absorption_probabilities(...)
```

Parameters
Returns

Nothing, just makes available the following fields:

Return type

None

property absorption_probabilities: cellrank.tl._lineage.Lineage

Absorption probabilities.

Return type

Lineage

property adata: anndata.AnnData

Annotated data object.

Returns

Annotated data object.

Return type

anndata.AnnData

compute_absorption_probabilities(keys=None, check_irreducibility=False, solver='gmres', use_petsc=True, time_to_absorption=None, n_jobs=None, backend='loky', show_progress_bar=True, tol=1e-06, preconditioner=None)

Compute absorption probabilities of a Markov chain.

For each cell, this computes the probability of it reaching any of the approximate recurrent classes defined by terminal_states.

Parameters
• keys (Optional[Sequence[str]]) – Keys defining the recurrent classes.

• check_irreducibility (bool) – Check whether the transition matrix is irreducible.

• solver (str) –

Solver to use for the linear problem. Options are ‘direct’, ‘gmres’, ‘lgmres’, ‘bicgstab’ or ‘gcrotmk’ when use_petsc=False, or one of petsc4py.PETSc.KSP.Type otherwise.

Information on the scipy iterative solvers can be found in the scipy.sparse.linalg documentation; for the petsc4py solvers, see the petsc4py documentation.

• use_petsc (bool) – Whether to use solvers from petsc4py or scipy. Recommended for large problems. If no installation is found, defaults to scipy.sparse.linalg.gmres().

• time_to_absorption – Whether to compute the mean time to absorption, and optionally its variance, to specific absorbing states.

If a dict, can be specified as {'Alpha': 'var', ...} to also compute variance. In case when states are a tuple, time to absorption will be computed to the subset of these states, such as [('Alpha', 'Beta'), ...] or {('Alpha', 'Beta'): 'mean', ...}. Can be specified as 'all' to compute it to any absorbing state in keys, which is more efficient than listing all absorbing states.

It might be beneficial to disable the progress bar with show_progress_bar=False, because many linear systems are solved.

• n_jobs (Optional[int]) – Number of parallel jobs to use with an iterative solver. When use_petsc=True or for quickly-solvable problems, we recommend a higher number of jobs (>= 8) in order to fully saturate the cores.

• backend (str) – Which backend to use for multiprocessing. See joblib.Parallel for valid options.

• show_progress_bar (bool) – Whether to show progress bar when the solver isn’t a direct one.

• tol (float) – Convergence tolerance for the iterative solver. The default is fine for most cases, only consider decreasing this for severely ill-conditioned matrices.

• preconditioner (Optional[str]) – Preconditioner to use; only available when use_petsc=True. For available values, see petsc4py.PETSc.PC.Type. We recommend the ‘ilu’ preconditioner for badly conditioned problems.

Returns

Nothing, but updates the following fields:

• absorption_probabilities - probabilities of being absorbed into the terminal states.

• lineage_absorption_times - mean times until absorption to subset absorbing states and optionally their variances saved as '{lineage} mean' and '{lineage} var', respectively, for each subset of absorbing states specified in time_to_absorption.

Return type

None

compute_eigendecomposition(k=20, which='LR', alpha=1, only_evals=False, ncv=None)

Compute eigendecomposition of transition matrix.

Uses a sparse implementation, if possible, and only computes the top $$k$$ eigenvectors to speed up the computation. Computes both left and right eigenvectors.

Parameters
Returns

Nothing, but updates the following field:

Return type

None

compute_lineage_drivers(lineages=None, method='fischer', cluster_key=None, clusters=None, layer='X', use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, return_drivers=True, **kwargs)

Compute driver genes per lineage.

Correlates gene expression with lineage probabilities, for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters which are relevant for the specified lineages.

Parameters
Return type

Optional[DataFrame]

Returns

• DataFrame of shape (n_genes, n_lineages * 5) containing the following five columns for each lineage –

• {lineage} corr - correlation between the gene expression and absorption probabilities.

• {lineage} pval - p-values of the two-sided test.

• {lineage} qval - corrected p-values using Benjamini-Hochberg method at level 0.05.

• {lineage} ci low - lower bound of the confidence_level correlation confidence interval.

• {lineage} ci high - upper bound of the confidence_level correlation confidence interval.

• Only if return_drivers=True.

• Otherwise, updates adata.var or adata.raw.var (depending on use_raw) with –

• '{direction} {lineage} corr' - the potential lineage drivers.

• '{direction} {lineage} qval' - the corrected p-values.

• Also updates the following fields

compute_lineage_priming(method='kl_divergence', early_cells=None)

Compute the degree of lineage priming.

This method computes how naive vs. committed each individual cell is. It returns a score where 0 stands for naive and 1 stands for committed.

Parameters
Returns

The priming degree.

Return type

Series

compute_partition()

Compute communication classes for the Markov chain.

Returns

Nothing, but updates the following fields:

Return type

None

copy()

Return a copy of self, including the underlying adata object.

Return type

BaseEstimator

property eigendecomposition: Mapping[str, Any]

Eigendecomposition.

Return type

Mapping[str, Any]

property is_irreducible

Whether the Markov chain is irreducible or not.

property issparse: bool

Whether the transition matrix is sparse or not.

Return type

bool

property kernel: cellrank.tl.kernels._base_kernel.KernelExpression

Underlying kernel.

Return type

KernelExpression

property lineage_absorption_times: pandas.core.frame.DataFrame

Lineage absorption times.

Return type

DataFrame

property lineage_drivers: pandas.core.frame.DataFrame

Lineage drivers.

Return type

DataFrame

plot_absorption_probabilities(data, prop, discrete=False, lineages=None, cluster_key=None, mode='embedding', time_key='latent_time', title=None, same_plot=False, cmap='viridis', **kwargs)

Plot discrete states or probabilities in an embedding.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_eigendecomposition(left=False, *args, **kwargs)

Plot eigenvectors in an embedding.

Parameters
• left (bool) – Whether to plot left or right eigenvectors.

• use – Which or how many vectors are to be plotted.

• abs_value – Whether to take the absolute value before plotting.

• cluster_key – Key in adata.obs for plotting categorical observations.

• basis – Basis to use when mode='embedding'. If None, use ‘umap’.

• kwargs – Keyword arguments for scvelo.pl.scatter().

Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_lineage_drivers(lineage, n_genes=8, ncols=None, use_raw=False, title_fmt='{gene} qval={qval:.4e}', figsize=None, dpi=None, save=None, **kwargs)

Plot lineage drivers discovered by compute_lineage_drivers().

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_lineage_drivers_correlation(lineage_x, lineage_y, color=None, gene_sets=None, gene_sets_colors=None, use_raw=False, cmap='RdYlBu_r', fontsize=12, adjust_text=False, legend_loc='best', figsize=(4, 4), dpi=None, save=None, show=True, **kwargs)

Show scatter plot of gene-correlations between two lineages.

Optionally, you can pass a dict of gene names that will be annotated in the plot.

Parameters
Return type

Optional[Axes]

Returns

• matplotlib.pyplot.Axes – The axis object if show=False.

• None – Nothing, just plots the figure. Optionally saves it based on save.

Notes

This plot is based on a notebook by Maren Büttner.

plot_spectrum(n=None, real_only=False, show_eigengap=True, show_all_xticks=True, legend_loc=None, title=None, figsize=(5, 5), dpi=100, save=None, marker='.', **kwargs)

Plot the top eigenvalues in real or complex plane.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_terminal_states(data, prop, discrete=False, lineages=None, cluster_key=None, mode='embedding', time_key='latent_time', title=None, same_plot=False, cmap='viridis', **kwargs)

Plot discrete states or probabilities in an embedding.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

property priming_degree: pandas.core.series.Series

Priming degree.

Return type

Series

read(fname)

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property recurrent_classes

Recurrent classes of the Markov chain.

rename_terminal_states(new_names)

Rename the categories of terminal_states.

Parameters
Returns

Nothing, just updates the names of terminal_states.

Return type

None

set_terminal_states(labels, cluster_key=None, en_cutoff=None, p_thresh=None, add_to_existing=False, **kwargs)

Manually define terminal states.

Parameters
• labels (Union[Series, Dict[str, Sequence[Any]]]) –

Defines the terminal states. Valid options are:

• categorical pandas.Series where each category corresponds to one terminal state. NaN entries denote cells that do not belong to any terminal state, i.e. these are either initial or transient cells.

• dict where keys are terminal states and values are lists of cell barcodes corresponding to annotations in adata.obs_names. If only one key is provided, the values may instead correspond to terminal-state clusters, provided a matching categorical pandas.Series can be found in adata.obs.

• cluster_key (Optional[str]) – Key from adata.obs where categorical cluster labels are stored. These are used to associate names and colors with each terminal state. Each terminal state will be given the name and color corresponding to the cluster it mostly overlaps with.

• en_cutoff (Optional[float]) – If cluster_key is given, this parameter determines when an approximate recurrent class will be labeled as ‘Unknown’, based on the entropy of the distribution of cells over transcriptomic clusters.

• p_thresh (Optional[float]) – If cell cycle scores were provided, a Wilcoxon rank-sum test is conducted to identify cell-cycle states. If the test returns a positive statistic and a p-value smaller than p_thresh, a warning will be issued.

• add_to_existing (bool) – Whether the new terminal states should be added to pre-existing ones. Cells already assigned to a terminal state will be re-assigned to the new terminal state if there’s a conflict between old and new annotations. This throws an error if no previous annotations corresponding to terminal states have been found.

Returns

Nothing, but updates the following fields:

Return type

None

property terminal_states: pandas.core.series.Series

Terminal states.

Return type

Series

property terminal_states_probabilities: pandas.core.series.Series

Terminal states probabilities.

Return type

Series

property transient_classes

Transient classes of the Markov chain.

property transition_matrix: Union[numpy.ndarray, scipy.sparse.base.spmatrix]

Transition matrix.

Return type

Union[ndarray, spmatrix]

write(fname, ext='pickle')

Serialize self to a file.

Parameters
Returns

Nothing, just writes itself to a file using pickle.

Return type

None

## Kernels¶

### Velocity Kernel¶

class cellrank.tl.kernels.VelocityKernel(adata, backward=False, vkey='velocity', xkey='Ms', gene_subset=None, compute_cond_num=False, check_connectivity=False, **kwargs)[source]

Kernel which computes a transition matrix based on RNA velocity.

This borrows ideas from both and . In short, for each cell i, we compute transition probabilities $$p_{i, j}$$ to each cell j in the neighborhood of i. The transition probabilities are computed as a multinomial logistic regression where the weights $$w_j$$ (for all j) are given by the vector that connects cell i with cell j in gene expression space, and the features $$x_i$$ are given by the velocity vector $$v_i$$ of cell i.
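The per-cell computation described above can be sketched in plain numpy. This is an illustrative sketch, not cellrank's implementation; the correlation-based similarity and the softmax scaling mirror the schemes documented below, and the function name, array shapes, and default scale are assumptions:

```python
import numpy as np

def velocity_transition_probs(delta, v, softmax_scale=4.0):
    """Sketch of velocity-based transition probabilities for one cell.

    delta: (n_neighbors, n_genes) displacements to each neighbor.
    v:     (n_genes,) velocity vector of the current cell.
    Returns probabilities over the neighbors (summing to one).
    """
    # Pearson correlation between each displacement and the velocity vector
    d = delta - delta.mean(axis=1, keepdims=True)
    vc = v - v.mean()
    logits = (d @ vc) / (np.linalg.norm(d, axis=1) * np.linalg.norm(vc))
    # a scaled softmax turns similarities into transition probabilities
    e = np.exp(softmax_scale * (logits - logits.max()))
    return e / e.sum()

rng = np.random.default_rng(0)
delta = rng.normal(size=(5, 20))   # displacements to 5 neighbors
v = rng.normal(size=20)            # velocity vector of the current cell
p = velocity_transition_probs(delta, v)
```

Larger `softmax_scale` values concentrate the probability mass on the best-aligned neighbors.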

Parameters
compute_transition_matrix(mode='deterministic', backward_mode='transpose', scheme='correlation', softmax_scale=None, n_samples=1000, seed=None, check_irreducibility=False, **kwargs)[source]

Compute transition matrix based on velocity directions on the local manifold.

For each cell, infer transition probabilities based on the cell’s velocity-extrapolated cell state and the cell states of its K nearest neighbors.

Parameters
Returns

Makes available the following fields:

Return type

cellrank.tl.kernels.VelocityKernel

property logits: scipy.sparse.csr.csr_matrix

Array of shape (n_cells, n_cells) containing the logits.

Return type

csr_matrix

copy()[source]

Return a copy of self.

Return type

VelocityKernel

#### Cosine Similarity Scheme¶

class cellrank.tl.kernels.CosineScheme[source]

Cosine similarity scheme as defined in eq. (4.7) .

$$v(s_i, s_j) = g(cos(\delta_{i, j}, v_i))$$

where $$v_i$$ is the velocity vector of cell $$i$$, $$\delta_{i, j}$$ corresponds to the transcriptional displacement between cells $$i$$ and $$j$$ and $$g$$ is a softmax function with some scaling parameter.

__call__(v, D, softmax_scale=1.0)

Compute transition probability of a cell to its nearest neighbors using RNA velocity.

Parameters
• v (ndarray) – Array of shape (n_genes,) or (n_neighbors, n_genes) containing the velocity vector(s). The second case is used for the backward process.

• D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.

• softmax_scale (float) – Scaling factor for the softmax function.

Returns

The probability and logits arrays of shape (n_neighbors,).

Return type
hessian(v, D, softmax_scale=1.0)

Compute the Hessian.

Parameters
• v (ndarray) – Array of shape (n_genes,) containing the velocity vector.

• D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.

• softmax_scale (float) – Scaling factor for the softmax function.

Returns

The full Hessian of shape (n_neighbors, n_genes, n_genes) or only its diagonal of shape (n_neighbors, n_genes).

Return type

numpy.ndarray

#### Correlation Scheme¶

class cellrank.tl.kernels.CorrelationScheme[source]

Pearson correlation scheme as defined in eq. (4.8) .

$$v(s_i, s_j) = g(corr(\delta_{i, j}, v_i))$$

where $$v_i$$ is the velocity vector of cell $$i$$, $$\delta_{i, j}$$ corresponds to the transcriptional displacement between cells $$i$$ and $$j$$ and $$g$$ is a softmax function with some scaling parameter.

__call__(v, D, softmax_scale=1.0)

Compute transition probability of a cell to its nearest neighbors using RNA velocity.

Parameters
• v (ndarray) – Array of shape (n_genes,) or (n_neighbors, n_genes) containing the velocity vector(s). The second case is used for the backward process.

• D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.

• softmax_scale (float) – Scaling factor for the softmax function.

Returns

The probability and logits arrays of shape (n_neighbors,).

Return type
hessian(v, D, softmax_scale=1.0)

Compute the Hessian.

Parameters
• v (ndarray) – Array of shape (n_genes,) containing the velocity vector.

• D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.

• softmax_scale (float) – Scaling factor for the softmax function.

Returns

The full Hessian of shape (n_neighbors, n_genes, n_genes) or only its diagonal of shape (n_neighbors, n_genes).

Return type

numpy.ndarray

#### Dot Product Scheme¶

class cellrank.tl.kernels.DotProductScheme[source]

Dot product scheme as defined in eq. (4.9) .

$$v(s_i, s_j) = g(\delta_{i, j}^T v_i)$$

where $$v_i$$ is the velocity vector of cell $$i$$, $$\delta_{i, j}$$ corresponds to the transcriptional displacement between cells $$i$$ and $$j$$ and $$g$$ is a softmax function with some scaling parameter.

__call__(v, D, softmax_scale=1.0)

Compute transition probability of a cell to its nearest neighbors using RNA velocity.

Parameters
• v (ndarray) – Array of shape (n_genes,) or (n_neighbors, n_genes) containing the velocity vector(s). The second case is used for the backward process.

• D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.

• softmax_scale (float) – Scaling factor for the softmax function.

Returns

The probability and logits arrays of shape (n_neighbors,).

Return type
hessian(v, D, softmax_scale=1.0)

Compute the Hessian.

Parameters
• v (ndarray) – Array of shape (n_genes,) containing the velocity vector.

• D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.

• softmax_scale (float) – Scaling factor for the softmax function.

Returns

The full Hessian of shape (n_neighbors, n_genes, n_genes) or only its diagonal of shape (n_neighbors, n_genes).

Return type

numpy.ndarray
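The three schemes above differ only in the similarity function fed into the softmax. A minimal numpy sketch contrasting them side by side (illustrative only, not cellrank's implementation):

```python
import numpy as np

def cosine(delta, v):
    # eq. (4.7): cosine of the angle between each displacement and v
    return (delta @ v) / (np.linalg.norm(delta, axis=1) * np.linalg.norm(v))

def correlation(delta, v):
    # eq. (4.8): Pearson correlation, i.e. cosine of the centered vectors
    d = delta - delta.mean(axis=1, keepdims=True)
    vc = v - v.mean()
    return (d @ vc) / (np.linalg.norm(d, axis=1) * np.linalg.norm(vc))

def dot_product(delta, v):
    # eq. (4.9): raw inner product, sensitive to vector magnitudes
    return delta @ v

def softmax(x, scale=1.0):
    e = np.exp(scale * (x - x.max()))
    return e / e.sum()

rng = np.random.default_rng(1)
delta = rng.normal(size=(4, 10))   # displacements to 4 neighbors
v = rng.normal(size=10)            # velocity vector

probs = {name: softmax(f(delta, v))
         for name, f in [("cosine", cosine),
                         ("correlation", correlation),
                         ("dot", dot_product)]}
```

Each entry of `probs` is a valid probability vector over the neighbors; only the relative ranking of neighbors differs between schemes.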

### Connectivity Kernel¶

class cellrank.tl.kernels.ConnectivityKernel(adata, backward=False, conn_key='connectivities', compute_cond_num=False, check_connectivity=False)[source]

Kernel which computes transition probabilities based on similarities among cells.

As a measure of similarity, we currently support:

The resulting transition matrix is symmetric and thus cannot be used to learn about the direction of the biological process. To include this direction, consider combining with a velocity-derived transition matrix via cellrank.tl.kernels.VelocityKernel.
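At the matrix level, combining kernels amounts to a convex combination of row-stochastic transition matrices (in cellrank this is expressed on the kernel objects themselves). A minimal numpy sketch of the underlying operation; the function name and weight are assumptions:

```python
import numpy as np

def combine(T_vel, T_conn, w=0.8):
    """Convex combination of two row-stochastic transition matrices."""
    T = w * T_vel + (1.0 - w) * T_conn
    # re-normalize rows to guard against floating-point drift
    return T / T.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
A = rng.random((6, 6)); A /= A.sum(axis=1, keepdims=True)  # stand-in for a velocity kernel matrix
B = rng.random((6, 6)); B /= B.sum(axis=1, keepdims=True)  # stand-in for a connectivity kernel matrix
T = combine(A, B)
```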

Optionally, we apply a density correction as described in , where we use the implementation of .

Parameters
compute_transition_matrix(density_normalize=True)[source]

Compute transition matrix based on transcriptomic similarity.

Uses a symmetric, weighted KNN graph to compute a symmetric transition matrix. The connectivities are computed using scanpy.pp.neighbors(). Depending on the parameters used there, they can be UMAP connectivities or Gaussian-kernel-based connectivities with adaptive kernel width.

Parameters

density_normalize (bool) – Whether or not to use the underlying KNN graph for density normalization.

Returns

Makes transition_matrix available.

Return type

cellrank.tl.kernels.ConnectivityKernel
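The density normalization can be sketched as in diffusion maps: divide each affinity by the product of the endpoint densities (row sums) before row-normalizing. This is an illustrative sketch, not cellrank's implementation:

```python
import numpy as np

def density_normalized_transitions(K):
    """Density-corrected transition matrix from a symmetric affinity matrix K.

    Dividing by the outer product of the row sums removes the effect of
    non-uniform cell density before the final row normalization.
    """
    q = K.sum(axis=1)                       # per-cell density estimate
    K_corr = K / np.outer(q, q)             # density correction
    return K_corr / K_corr.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
W = rng.random((5, 5))
K = (W + W.T) / 2                           # symmetric KNN-like affinities
T = density_normalized_transitions(K)
```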

copy()[source]

Return a copy of self.

Return type

ConnectivityKernel

### Pseudotime Kernel¶

class cellrank.tl.kernels.PseudotimeKernel(adata, backward=False, time_key='dpt_pseudotime', compute_cond_num=False, check_connectivity=False, **kwargs)[source]

Kernel which computes directed transition probabilities based on a KNN graph and pseudotime.

The KNN graph contains information about the (undirected) connectivities among cells, reflecting their similarity. Pseudotime can be used to either remove edges that point against the direction of increasing pseudotime , or to downweight them .

Parameters
compute_transition_matrix(threshold_scheme='hard', frac_to_keep=0.3, b=10.0, nu=0.5, check_irreducibility=False, n_jobs=None, backend='loky', show_progress_bar=True, **kwargs)[source]

Compute transition matrix based on KNN graph and pseudotemporal ordering.

Depending on the choice of threshold_scheme, this is based on ideas from either Palantir or VIA .

When using the ‘hard’ thresholding scheme, this is based on ideas from Palantir, which removes some edges that point against the direction of increasing pseudotime. To avoid disconnecting the graph, it does not remove all such edges but keeps those that point to cells inside a close radius, chosen according to the local cell density.

When using the ‘soft’ thresholding scheme, this is based on ideas from VIA, which downweights edges that point against the direction of increasing pseudotime. Essentially, the further “behind” a query cell is in pseudotime with respect to the current reference cell, the more its graph connectivity is penalized.

Parameters
• frac_to_keep (float) – The frac_to_keep * n_neighbors closest neighbors (according to graph connectivities) are kept, no matter whether they lie in the pseudotemporal past or future. This is done to ensure that the graph remains connected. Only used when threshold_scheme=’hard’. frac_to_keep needs to fall within the interval [0, 1].

• b (float) – The growth rate of the generalized logistic function. Only used when threshold_scheme=’soft’.

• nu (float) – Affects near which asymptote maximum growth occurs. Only used when threshold_scheme=’soft’.

• check_irreducibility (bool) – Optional check for irreducibility of the final transition matrix.

• show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.

• n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.

• backend (str) – Which backend to use for parallelization. See joblib.Parallel for valid options.

• kwargs (Any) – Keyword arguments for threshold_scheme.

Returns

Makes transition_matrix available.

Return type

cellrank.tl.kernels.PseudotimeKernel

property pseudotime: numpy.array

Pseudotemporal ordering of cells.

Return type

array

copy()[source]

Return a copy of self.

Return type

PseudotimeKernel

#### Hard Threshold Scheme¶

class cellrank.tl.kernels.HardThresholdScheme[source]

Thresholding scheme inspired by Palantir .

Note that this won’t exactly reproduce the original Palantir results, for three reasons:

• Palantir computes the KNN graph in a scaled space of diffusion components.

• Palantir uses its own pseudotime to bias the KNN graph which is not implemented here.

• Palantir uses a slightly different mechanism to ensure the graph remains connected when removing edges that point into the “pseudotime past”.

__call__(cell_pseudotime, neigh_pseudotime, neigh_conn, n_neighs, frac_to_keep=0.3)[source]

Convert the undirected graph of cell-cell similarities into a directed one by removing “past” edges.

This uses a pseudotemporal measure to remove graph edges that point into the pseudotime past. For each cell, the closest neighbors are kept even if they lie in the pseudotime past, to make sure the graph remains connected.

Parameters
• cell_pseudotime (float) – Pseudotime of the current cell.

• neigh_pseudotime (ndarray) – Array of shape (n_neighbors,) containing pseudotimes of neighbors.

• neigh_conn (ndarray) – Array of shape (n_neighbors,) containing connectivities of the current cell and its neighbors.

• n_neighs (int) – Number of neighbors to keep.

• frac_to_keep (float) – The frac_to_keep * n_neighbors closest neighbors (according to graph connectivities) are kept, no matter whether they lie in the pseudotemporal past or future. frac_to_keep needs to fall within the interval [0, 1].

Returns

Return type

Array of shape (n_neighbors,) containing the biased connectivities.
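The hard scheme can be sketched in a few lines of numpy: zero out connectivities to past neighbors, but always keep the closest fraction of neighbors regardless of direction. This is an illustrative sketch, not cellrank's implementation; the tie-breaking and rounding of `frac_to_keep` are assumptions:

```python
import numpy as np

def hard_threshold(cell_pt, neigh_pt, neigh_conn, frac_to_keep=0.3):
    """Palantir-style hard scheme (sketch): remove past edges, keep the
    closest frac_to_keep fraction of neighbors to preserve connectivity."""
    n_keep = int(np.ceil(frac_to_keep * len(neigh_conn)))
    closest = np.argsort(neigh_conn)[::-1][:n_keep]     # highest connectivity
    biased = np.where(neigh_pt >= cell_pt, neigh_conn, 0.0)  # drop past edges
    biased[closest] = neigh_conn[closest]               # keep closest regardless
    return biased

conn = np.array([0.9, 0.2, 0.5, 0.1])   # connectivities to 4 neighbors
pt = np.array([0.1, 0.3, 0.05, 0.4])    # neighbor pseudotimes
out = hard_threshold(cell_pt=0.2, neigh_pt=pt, neigh_conn=conn, frac_to_keep=0.25)
```

Here neighbor 0 lies in the pseudotime past but is the closest neighbor, so its edge survives; neighbor 2 (also in the past) is removed.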

#### Soft Threshold Scheme¶

class cellrank.tl.kernels.SoftThresholdScheme[source]

Thresholding scheme inspired by VIA .

The idea is to downweight edges that point against the direction of increasing pseudotime. Essentially, the further “behind” a query cell is in pseudotime with respect to the current reference cell, the more its graph connectivity is penalized.

__call__(cell_pseudotime, neigh_pseudotime, neigh_conn, b=10.0, nu=0.5)[source]

Bias the connectivities by downweighting ones to past cells.

This function uses a generalized logistic function to downweight connectivities to cells in the pseudotime past.

Parameters
• cell_pseudotime (float) – Pseudotime of the current cell.

• neigh_pseudotime (ndarray) – Array of shape (n_neighbors,) containing pseudotimes of neighbors.

• neigh_conn (ndarray) – Array of shape (n_neighbors,) containing connectivities of the current cell and its neighbors.

• b (float) – The growth rate of the generalized logistic function.

• nu (float) – Affects near which asymptote maximum growth occurs.

Returns

Return type

Array of shape (n_neighbors,) containing the biased connectivities.
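One plausible form of this downweighting is a generalized logistic (Richards) curve of the pseudotime difference. This is an illustrative sketch only; the exact curve cellrank uses may differ, and the parametrization below is an assumption:

```python
import numpy as np

def soft_threshold(cell_pt, neigh_pt, neigh_conn, b=10.0, nu=0.5):
    """VIA-style soft scheme (sketch): downweight connectivities to
    neighbors in the pseudotime past via a generalized logistic curve."""
    dt = neigh_pt - cell_pt                                  # negative = past
    # Richards curve: ~1 for future neighbors, ~0 far in the past;
    # b controls the growth rate, nu the asymmetry of the curve
    weight = 1.0 / (1.0 + nu * np.exp(-b * dt)) ** (1.0 / nu)
    return neigh_conn * weight

conn = np.array([0.9, 0.2, 0.5])   # connectivities to 3 neighbors
pt = np.array([0.1, 0.5, 0.2])     # neighbor pseudotimes
out = soft_threshold(cell_pt=0.2, neigh_pt=pt, neigh_conn=conn)
```

Past edges are shrunk but never removed, so the graph stays connected by construction.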

### CytoTRACE Kernel¶

class cellrank.tl.kernels.CytoTRACEKernel(adata, backward=False, layer='Ms', aggregation='mean', use_raw=False, compute_cond_num=False, check_connectivity=False, **kwargs)[source]

Kernel which computes directed transition probabilities based on a KNN graph and the CytoTRACE score .

The KNN graph contains information about the (undirected) connectivities among cells, reflecting their similarity. CytoTRACE can be used to estimate cellular plasticity and in turn, a pseudotemporal ordering of cells from more plastic to less plastic states. This kernel internally uses the cellrank.tl.kernels.PseudotimeKernel to direct the KNN graph on the basis of the CytoTRACE-derived pseudotime.

Optionally, we apply a density correction as described in , where we use the implementation of .

Parameters

Example

Workflow:

# import packages and load data
import scvelo as scv
import cellrank as cr

adata = cr.datasets.pancreas()  # any AnnData object with raw counts works

# standard pre-processing
scv.pp.filter_and_normalize(adata)

# CytoTRACE by default uses imputed data - a simple way to compute KNN-imputed data is to use scVelo's moments
# function. However, note that this function expects spliced counts because it's designed for RNA velocity,
# so we're using a simple hack here:
if 'spliced' not in adata.layers:
    adata.layers['spliced'] = adata.X
if 'unspliced' not in adata.layers:
    adata.layers['unspliced'] = adata.X

# compute KNN-imputation using scVelo's moments function
scv.pp.moments(adata)

# import and initialize the CytoTRACE kernel, compute transition matrix - done!
from cellrank.tl.kernels import CytoTRACEKernel
ctk = CytoTRACEKernel(adata).compute_transition_matrix()

compute_cytotrace(layer='Ms', aggregation='mean', use_raw=False)[source]

Re-implementation of the CytoTRACE algorithm to estimate cellular plasticity.

Computes the number of genes expressed per cell and ranks genes according to their correlation with this measure. Next, it selects the top-correlating genes and aggregates their (imputed) expression to obtain the CytoTRACE score. A high score indicates high differentiation potential (naive, plastic cells) and a low score indicates low differentiation potential (mature, differentiated cells).

Parameters
Return type

None

Returns

• Nothing, just modifies anndata.AnnData.obs with the following keys –

• ‘ct_score’: the normalized CytoTRACE score.

• ’ct_pseudotime’: associated pseudotime, essentially 1 - CytoTRACE score.

• ’ct_num_exp_genes’: the number of genes expressed per cell, basis of the CytoTRACE score.

• It also modifies anndata.AnnData.var with the following keys –

• ‘ct_gene_corr’: the correlation as specified above.

• ’ct_correlates’: indication of the genes used to compute the CytoTRACE score, i.e. the ones that correlated best with ‘num_exp_genes’.

Notes

This will not exactly reproduce the results of the original CytoTRACE algorithm because we allow for any normalization and imputation techniques whereas CytoTRACE has built-in specific methods for that.
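The core idea can be sketched on a dense (cells × genes) expression matrix. This is an illustrative sketch, not the cellrank re-implementation; `n_top` and the min-max normalization are assumptions:

```python
import numpy as np

def cytotrace_score(X, n_top=3):
    """CytoTRACE sketch: correlate each gene with the per-cell number of
    expressed genes, then average the top-correlating genes per cell."""
    n_exp = (X > 0).sum(axis=1).astype(float)     # genes expressed per cell
    Xc = X - X.mean(axis=0)
    nc = n_exp - n_exp.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(nc) + 1e-12
    corr = (Xc.T @ nc) / denom                    # per-gene correlation
    top = np.argsort(corr)[::-1][:n_top]          # top-correlating genes
    score = X[:, top].mean(axis=1)                # aggregate their expression
    # normalize to [0, 1]; the associated pseudotime is 1 - score
    span = score.max() - score.min()
    score = (score - score.min()) / (span or 1.0)
    return score, 1.0 - score

rng = np.random.default_rng(4)
X = rng.poisson(1.0, size=(10, 8)).astype(float)  # toy counts: 10 cells x 8 genes
score, pseudotime = cytotrace_score(X)
```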

compute_transition_matrix(threshold_scheme='hard', frac_to_keep=0.3, b=10.0, nu=0.5, check_irreducibility=False, n_jobs=None, backend='loky', show_progress_bar=True, **kwargs)

Compute transition matrix based on KNN graph and pseudotemporal ordering.

Depending on the choice of threshold_scheme, this is based on ideas from either Palantir or VIA .

When using the ‘hard’ thresholding scheme, this is based on ideas from Palantir, which removes some edges that point against the direction of increasing pseudotime. To avoid disconnecting the graph, it does not remove all such edges but keeps those that point to cells inside a close radius, chosen according to the local cell density.

When using the ‘soft’ thresholding scheme, this is based on ideas from VIA, which downweights edges that point against the direction of increasing pseudotime. Essentially, the further “behind” a query cell is in pseudotime with respect to the current reference cell, the more its graph connectivity is penalized.

Parameters
• frac_to_keep (float) – The frac_to_keep * n_neighbors closest neighbors (according to graph connectivities) are kept, no matter whether they lie in the pseudotemporal past or future. This is done to ensure that the graph remains connected. Only used when threshold_scheme=’hard’. frac_to_keep needs to fall within the interval [0, 1].

• b (float) – The growth rate of the generalized logistic function. Only used when threshold_scheme=’soft’.

• nu (float) – Affects near which asymptote maximum growth occurs. Only used when threshold_scheme=’soft’.

• check_irreducibility (bool) – Optional check for irreducibility of the final transition matrix.

• show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.

• n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.

• backend (str) – Which backend to use for parallelization. See joblib.Parallel for valid options.

• kwargs (Any) – Keyword arguments for threshold_scheme.

Returns

Makes transition_matrix available.

Return type

cellrank.tl.kernels.PseudotimeKernel

### Precomputed Kernel¶

class cellrank.tl.kernels.PrecomputedKernel(transition_matrix=None, adata=None, backward=False, compute_cond_num=False, **kwargs)[source]

Kernel which contains a precomputed transition matrix.

Parameters
copy()[source]

Return a copy of self.

Return type

PrecomputedKernel

compute_transition_matrix(*args, **kwargs)[source]

Return self.

Return type

PrecomputedKernel

## Models¶

### GAM¶

Fit Generalized Additive Models (GAMs) using pygam.

Parameters
fit(x=None, y=None, w=None, **kwargs)[source]

Fit the model.

Parameters
Returns

Fits the model and returns self.

Return type

cellrank.ul.models.GAM

Run the prediction.

Parameters
Returns

Return type

numpy.ndarray

Annotated data object.

Returns

Return type

anndata.AnnData

property conf_int: numpy.ndarray

Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

ndarray

confidence_interval(x_test=None, **kwargs)[source]

Calculate the confidence interval.

Parameters
Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

numpy.ndarray

default_confidence_interval(x_test=None, **kwargs)

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from , eq. 5.

Parameters
Returns

Return type

numpy.ndarray

property model: Any

The underlying model.

Return type

Any

plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)

Plot the smoothed gene expression.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

prepare(gene, lineage, backward=False, time_range=None, data_key='X', time_key='latent_time', use_raw=False, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200)

Prepare the model to be ready for fitting.

Parameters
Returns

Nothing, but updates the following fields:

Return type

None

property prepared

Whether the model is prepared for fitting.

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property w: numpy.ndarray

Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type

ndarray

property w_all: numpy.ndarray

Unfiltered weights of shape (n_cells,).

Return type

ndarray

write(fname, ext='pickle')

Serialize self to a file.

Parameters
Returns

Nothing, just writes itself to a file using pickle.

Return type

None

property x: numpy.ndarray

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property x_all: numpy.ndarray

Unfiltered independent variables of shape (n_cells, 1).

Return type

ndarray

property x_hat: numpy.ndarray

Filtered independent variables used when calculating default confidence interval, usually same as x.

Return type

ndarray

property x_test: numpy.ndarray

Independent variables of shape (n_samples, 1) used for prediction.

Return type

ndarray

property y: numpy.ndarray

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property y_all: numpy.ndarray

Unfiltered dependent variables of shape (n_cells, 1).

Return type

ndarray

property y_hat: numpy.ndarray

Filtered dependent variables used when calculating default confidence interval, usually same as y.

Return type

ndarray

property y_test: numpy.ndarray

Prediction values of shape (n_samples,) for x_test.

Return type

ndarray

copy()[source]

Return a copy of self.

Return type

BaseModel

### SKLearnModel¶

Wrapper around sklearn.base.BaseEstimator.

Parameters
fit(x=None, y=None, w=None, **kwargs)[source]

Fit the model.

Parameters
Returns

Fits the model and returns self.

Return type

cellrank.ul.models.SKLearnModel

Run the prediction.

Parameters
Returns

Return type

numpy.ndarray

confidence_interval(x_test=None, **kwargs)[source]

Calculate the confidence interval.

Uses the default_confidence_interval() method if the underlying model has no method for confidence interval calculation.

Parameters
Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

numpy.ndarray

property model: sklearn.base.BaseEstimator

The underlying sklearn.base.BaseEstimator.

Return type

BaseEstimator

copy()[source]

Return a copy of self.

Return type

SKLearnModel

Annotated data object.

Returns

Return type

anndata.AnnData

property conf_int: numpy.ndarray

Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

ndarray

default_confidence_interval(x_test=None, **kwargs)

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from , eq. 5.

Parameters
Returns

Return type

numpy.ndarray

plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)

Plot the smoothed gene expression.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

prepare(gene, lineage, backward=False, time_range=None, data_key='X', time_key='latent_time', use_raw=False, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200)

Prepare the model to be ready for fitting.

Parameters
Returns

Nothing, but updates the following fields:

Return type

None

property prepared

Whether the model is prepared for fitting.

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property w: numpy.ndarray

Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type

ndarray

property w_all: numpy.ndarray

Unfiltered weights of shape (n_cells,).

Return type

ndarray

write(fname, ext='pickle')

Serialize self to a file.

Parameters
Returns

Nothing, just writes itself to a file using pickle.

Return type

None

property x: numpy.ndarray

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property x_all: numpy.ndarray

Unfiltered independent variables of shape (n_cells, 1).

Return type

ndarray

property x_hat: numpy.ndarray

Filtered independent variables used when calculating default confidence interval, usually same as x.

Return type

ndarray

property x_test: numpy.ndarray

Independent variables of shape (n_samples, 1) used for prediction.

Return type

ndarray

property y: numpy.ndarray

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property y_all: numpy.ndarray

Unfiltered dependent variables of shape (n_cells, 1).

Return type

ndarray

property y_hat: numpy.ndarray

Filtered dependent variables used when calculating default confidence interval, usually same as y.

Return type

ndarray

property y_test: numpy.ndarray

Prediction values of shape (n_samples,) for x_test.

Return type

ndarray

### GAMR¶

class cellrank.ul.models.GAMR(adata, n_knots=5, distribution='gaussian', basis='cr', knotlocs='auto', offset='default', smoothing_penalty=1.0, **kwargs)[source]

Wrapper around R’s mgcv package for fitting Generalized Additive Models (GAMs).

Parameters
prepare(*args, **kwargs)[source]

Prepare the model to be ready for fitting. This also removes the zero and negative weights and prepares the design matrix.

Parameters
• gene – Gene in adata.var_names or in adata.raw.var_names.

• lineage – Name of a lineage in adata.obsm['{lineage_key}']. If None, all weights will be set to 1.

• backward – Direction of the process.

• time_range

Specify start and end times:

• If a tuple, it specifies the minimum and maximum pseudotime. Both values can be None, in which case the minimum is the earliest pseudotime and the maximum is automatically determined.

• If a float, it specifies the maximum pseudotime.

• data_key – Key in adata.layers or ‘X’ for adata.X. If use_raw=True, it’s always set to ‘X’.

• time_key – Key in adata.obs where the pseudotime is stored.

• use_raw – Whether to access adata.raw or not.

• threshold – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.

• weight_threshold – Set all weights below weight_threshold to weight_threshold if a float, or to the second value, if a tuple.

• filter_cells – Filter out all cells with expression values lower than this threshold.

• n_test_points – Number of test points. If None, use the original points based on threshold.

Returns

Nothing, but updates the following fields:

Return type

None

fit(x=None, y=None, w=None, **kwargs)[source]

Fit the model.

Parameters
Returns

Fits the model and returns self. Updates the following fields by filtering out 0 weights w:

• x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

• y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

• w - Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type

cellrank.ul.models.GAMR

Run the prediction. This method can also compute the confidence interval.

Parameters
Returns

Return type

numpy.ndarray

confidence_interval(x_test=None, level=0.95, **kwargs)[source]

Calculate the confidence interval. Internally, this method calls cellrank.ul.models.GAMR.predict() to extract the confidence interval, if needed.

Parameters
Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

numpy.ndarray

copy()[source]

Return a copy of self.

Return type

GAMR

Annotated data object.

Returns

Return type

anndata.AnnData

property conf_int: numpy.ndarray

Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

ndarray

default_confidence_interval(x_test=None, **kwargs)

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from , eq. 5.

Parameters
Returns

Return type

numpy.ndarray

property model: Any

The underlying model.

Return type

Any

plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)

Plot the smoothed gene expression.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

property prepared

Whether the model is prepared for fitting.

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property w: numpy.ndarray

Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type

ndarray

property w_all: numpy.ndarray

Unfiltered weights of shape (n_cells,).

Return type

ndarray

write(fname, ext='pickle')

Serialize self to a file.

Parameters
Returns

Nothing, just writes itself to a file using pickle.

Return type

None

property x: numpy.ndarray

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property x_all: numpy.ndarray

Unfiltered independent variables of shape (n_cells, 1).

Return type

ndarray

property x_hat: numpy.ndarray

Filtered independent variables used when calculating default confidence interval, usually same as x.

Return type

ndarray

property x_test: numpy.ndarray

Independent variables of shape (n_samples, 1) used for prediction.

Return type

ndarray

property y: numpy.ndarray

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property y_all: numpy.ndarray

Unfiltered dependent variables of shape (n_cells, 1).

Return type

ndarray

property y_hat: numpy.ndarray

Filtered dependent variables used when calculating the default confidence interval, usually the same as y.

Return type

ndarray

property y_test: numpy.ndarray

Prediction values of shape (n_samples,) for x_test.

Return type

ndarray

## Base Classes¶

### BaseEstimator¶

Base class for all estimators.

Parameters
set_terminal_states(labels, cluster_key=None, en_cutoff=None, p_thresh=None, add_to_existing=False, **kwargs)[source]

Manually define terminal states.

Parameters
• labels (Union[Series, Dict[str, Sequence[Any]]]) –

Defines the terminal states. Valid options are:

• categorical pandas.Series where each category corresponds to one terminal state. NaN entries denote cells that do not belong to any terminal state, i.e. these are either initial or transient cells.

• dict where keys are terminal states and values are lists of cell barcodes corresponding to annotations in adata.obs_names. If only one key is provided, values should correspond to terminal state clusters if a categorical pandas.Series can be found in adata.obs.

• cluster_key (Optional[str]) – Key from adata.obs where categorical cluster labels are stored. These are used to associate names and colors with each terminal state. Each terminal state will be given the name and color corresponding to the cluster it mostly overlaps with.

• en_cutoff (Optional[float]) – If cluster_key is given, this parameter determines when an approximate recurrent class will be labeled as ‘Unknown’, based on the entropy of the distribution of cells over transcriptomic clusters.

• p_thresh (Optional[float]) – If cell cycle scores were provided, a Wilcoxon rank-sum test is conducted to identify cell-cycle states. If the test returns a positive statistic and a p-value smaller than p_thresh, a warning will be issued.

• add_to_existing (bool) – Whether the new terminal states should be added to pre-existing ones. Cells already assigned to a terminal state will be re-assigned to the new terminal state if there’s a conflict between old and new annotations. This throws an error if no previous annotations corresponding to terminal states have been found.

Returns

Nothing, but updates the following fields:

• terminal_states_probabilities

• terminal_states

Return type

None
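For illustration, the labels argument can be built as a categorical pandas.Series. This is a minimal sketch with hypothetical barcodes and state names; in practice, the index comes from adata.obs_names:

```python
import numpy as np
import pandas as pd

# Hypothetical cell barcodes; in practice these come from adata.obs_names.
barcodes = [f"cell_{i}" for i in range(6)]

# Each category is one terminal state; NaN marks initial or transient cells.
labels = pd.Series(
    ["Alpha", "Alpha", np.nan, "Beta", np.nan, "Beta"],
    index=barcodes,
    dtype="category",
)

print(labels.cat.categories.tolist())  # ['Alpha', 'Beta']
```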

Rename the names of terminal_states.

Parameters
Returns

Nothing, just updates the names of terminal_states.

Return type

None

compute_absorption_probabilities(keys=None, check_irreducibility=False, solver='gmres', use_petsc=True, time_to_absorption=None, n_jobs=None, backend='loky', show_progress_bar=True, tol=1e-06, preconditioner=None)[source]

Compute absorption probabilities of a Markov chain.

For each cell, this computes the probability of it reaching any of the approximate recurrent classes defined by terminal_states.

Parameters
• keys (Optional[Sequence[str]]) – Keys defining the recurrent classes.

• check_irreducibility (bool) – Check whether the transition matrix is irreducible.

• solver (str) –

Solver to use for the linear problem. Options are ‘direct’, ‘gmres’, ‘lgmres’, ‘bicgstab’ or ‘gcrotmk’ when use_petsc=False, or one of petsc4py.PETSc.KSP.Type otherwise.

Information on the scipy iterative solvers can be found in scipy.sparse.linalg; for the petsc4py solvers, see the PETSc documentation.

• use_petsc (bool) – Whether to use solvers from petsc4py or scipy. Recommended for large problems. If no installation is found, defaults to scipy.sparse.linalg.gmres().

• time_to_absorption – Whether to compute the mean time to absorption and its variance to specific absorbing states.

If a dict, can be specified as {'Alpha': 'var', ...} to also compute variance. When states are a tuple, time to absorption will be computed to the subset of these states, such as [('Alpha', 'Beta'), ...] or {('Alpha', 'Beta'): 'mean', ...}. Can be specified as 'all' to compute it to any absorbing state in keys, which is more efficient than listing all absorbing states.

It might be beneficial to disable the progress bar with show_progress_bar=False, because many linear systems are being solved.

• n_jobs (Optional[int]) – Number of parallel jobs to use when using an iterative solver. When use_petsc=True or for quickly-solvable problems, we recommend a higher number of jobs (>=8) in order to fully saturate the cores.

• backend (str) – Which backend to use for multiprocessing. See joblib.Parallel for valid options.

• show_progress_bar (bool) – Whether to show progress bar when the solver isn’t a direct one.

• tol (float) – Convergence tolerance for the iterative solver. The default is fine for most cases, only consider decreasing this for severely ill-conditioned matrices.

• preconditioner (Optional[str]) – Preconditioner to use, only available when use_petsc=True. For available values, see petsc4py.PETSc.PC.Type. We recommend the ‘ilu’ preconditioner for badly conditioned problems.

Returns

Nothing, but updates the following fields:

• absorption_probabilities - probabilities of being absorbed into the terminal states.

• lineage_absorption_times - mean times until absorption to subset absorbing states and optionally their variances saved as '{lineage} mean' and '{lineage} var', respectively, for each subset of absorbing states specified in time_to_absorption.

Return type

None
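The quantities computed here follow the classical absorbing Markov chain equations: with Q the transient-to-transient and R the transient-to-absorbing block of the transition matrix, the absorption probabilities A solve (I - Q) A = R. A minimal numpy sketch on a toy chain (cellrank solves the same system at scale with sparse, optionally PETSc-based, iterative solvers):

```python
import numpy as np

# Toy row-stochastic transition matrix; states 2 and 3 are absorbing,
# standing in for the terminal states.
P = np.array([
    [0.5, 0.2, 0.3, 0.0],
    [0.1, 0.5, 0.0, 0.4],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
transient, absorbing = [0, 1], [2, 3]

Q = P[np.ix_(transient, transient)]  # transient -> transient block
R = P[np.ix_(transient, absorbing)]  # transient -> absorbing block

# Absorption probabilities solve (I - Q) A = R, one column per absorbing state.
A = np.linalg.solve(np.eye(len(transient)) - Q, R)
print(np.allclose(A.sum(axis=1), 1.0))  # each transient cell is absorbed somewhere
```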

compute_lineage_priming(method='kl_divergence', early_cells=None)[source]

Compute the degree of lineage priming.

This method computes how naive vs. committed each individual cell is. It returns a score where 0 stands for naive and 1 stands for committed.

Parameters
Returns

The priming degree.

Return type

pandas.Series

compute_lineage_drivers(lineages=None, method='fischer', cluster_key=None, clusters=None, layer='X', use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, return_drivers=True, **kwargs)[source]

Compute driver genes per lineage.

Correlates gene expression with lineage probabilities, for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters which are relevant for the specified lineages.

Parameters
• lineages (Union[str, Sequence, None]) – Either a set of lineage names from absorption_probabilities.names or None, in which case all lineages are considered.

• method (str) –

Mode to use when calculating p-values and confidence intervals. Valid options are:

• ’fischer’ - use the Fisher transformation.

• ’perm_test’ - use permutation test.

• cluster_key (Optional[str]) – Key from adata.obs to obtain cluster annotations. These are considered for clusters.

• clusters (Union[str, Sequence, None]) – Restrict the correlations to these clusters.

• layer (str) – Key from adata.layers.

• use_raw (bool) – Whether to use adata.raw to correlate gene expression. If using a layer other than .X, this must be set to False.

• confidence_level (float) – Confidence level for the confidence interval calculation. Must be in [0, 1].

• n_perms (int) – Number of permutations to use when method='perm_test'.

• seed (Optional[int]) – Random seed when method='perm_test'.

• return_drivers (bool) – Whether to return the drivers. This also contains the lower and upper confidence_level confidence interval bounds.

• show_progress_bar – Whether to show a progress bar. Disabling it may slightly improve performance.

• n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.

• backend – Which backend to use for parallelization. See joblib.Parallel for valid options.

Return type

Optional[DataFrame]

Returns

• Dataframe of shape (n_genes, n_lineages * 5) containing the following columns, 1 for each lineage –

• {lineage} corr - correlation between the gene expression and absorption probabilities.

• {lineage} pval - calculated p-values for double-sided test.

• {lineage} qval - corrected p-values using Benjamini-Hochberg method at level 0.05.

• {lineage} ci low - lower bound of the confidence_level correlation confidence interval.

• {lineage} ci high - upper bound of the confidence_level correlation confidence interval.

• Only if return_drivers=True.

• Otherwise, updates adata.var or adata.raw.var, depending on use_raw, with –

• '{direction} {lineage} corr' - the potential lineage drivers.

• '{direction} {lineage} qval' - the corrected p-values.

• Also updates the following fields

• lineage_drivers - same as the returned values.
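For method='fischer', the confidence interval follows the standard Fisher transformation of the Pearson correlation coefficient. A self-contained sketch on synthetic data (variable names are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
expr = rng.normal(size=n)                # gene expression (toy)
fate = 0.5 * expr + rng.normal(size=n)   # stand-in for absorption probabilities

r = np.corrcoef(expr, fate)[0, 1]

# Fisher z-transform: arctanh(r) is approximately normal with sd 1/sqrt(n - 3);
# the interval is transformed back with tanh.
z = np.arctanh(r)
se = 1.0 / np.sqrt(n - 3)
z_crit = stats.norm.ppf(0.975)           # two-sided, confidence_level=0.95
ci_low, ci_high = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
print(ci_low < r < ci_high)  # True
```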

plot_lineage_drivers(lineage, n_genes=8, ncols=None, use_raw=False, title_fmt='{gene} qval={qval:.4e}', figsize=None, dpi=None, save=None, **kwargs)[source]

Plot lineage drivers discovered by compute_lineage_drivers().

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

plot_lineage_drivers_correlation(lineage_x, lineage_y, color=None, gene_sets=None, gene_sets_colors=None, use_raw=False, cmap='RdYlBu_r', fontsize=12, adjust_text=False, legend_loc='best', figsize=(4, 4), dpi=None, save=None, show=True, **kwargs)[source]

Show a scatter plot of gene correlations between two lineages.

Optionally, you can pass a dict of gene names that will be annotated in the plot.

Parameters
Return type

Optional[Axes]

Returns

• matplotlib.pyplot.Axes – The axis object if show=False.

• None – Nothing, just plots the figure. Optionally saves it based on save.

Notes

This plot is based on the following notebook by Maren Büttner.

fit(keys=None, compute_absorption_probabilities=True, **kwargs)[source]

Run the pipeline.

Parameters
Returns

Nothing, just makes available the following fields:

• terminal_states_probabilities

• terminal_states

• absorption_probabilities

• priming_degree

Return type

None

copy()[source]

Return a copy of self, including the underlying adata object.

Return type

BaseEstimator

write(fname, ext='pickle')[source]

Serialize self to a file.

Parameters
Returns

Nothing, just writes itself to a file using pickle.

Return type

None

### Kernel¶

class cellrank.tl.kernels.Kernel(adata, backward=False, compute_cond_num=False, check_connectivity=False, **kwargs)[source]

A base class from which all kernels are derived.

These kernels read from a given AnnData object, usually the KNN graph and additional variables, to compute a weighted, directed graph. Every kernel object has a direction. The kernels defined in the derived classes are not strictly kernels in the mathematical sense because they often only take one input argument - however, they build on other functions which have computed a similarity based on two input arguments. The role of the kernels defined here is to add directionality to these symmetric similarity relations or to transform them.

Parameters

Annotated data object.

Returns

Annotated data object.

Return type

anndata.AnnData

property backward: bool

Direction of the process.

Return type

bool

Compute a projection of the transition matrix in the embedding.

Projections can only be calculated for kNN-based kernels. The projected matrix can then be visualized as:

scvelo.pl.velocity_embedding(adata, vkey='T_fwd', basis='umap')

Parameters
Return type
Returns

• If copy=True, the projection array of shape (n_cells, n_components).

• Otherwise, it modifies anndata.AnnData.obsm with a key based on key_added.

abstract compute_transition_matrix(*args, **kwargs)

Compute a transition matrix.

Parameters
Returns

Self.

Return type

cellrank.tl.kernels.KernelExpression
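Kernels can be combined into kernel expressions with arithmetic operators, e.g. 0.8 * vk + 0.2 * ck. Conceptually, the resulting transition matrix is the corresponding convex combination of the row-stochastic matrices, as this numpy sketch illustrates (the matrices are random stand-ins, not actual kernel outputs):

```python
import numpy as np

def row_normalize(m):
    """Normalize each row of a non-negative matrix to sum to one."""
    return m / m.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
T_vel = row_normalize(rng.random((5, 5)))   # e.g. a velocity-based kernel
T_conn = row_normalize(rng.random((5, 5)))  # e.g. a connectivity-based kernel

# A weighted kernel expression such as `0.8 * vk + 0.2 * ck` combines the
# underlying transition matrices as a convex combination:
T = 0.8 * T_vel + 0.2 * T_conn
print(np.allclose(T.sum(axis=1), 1.0))  # still row-stochastic
```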

property condition_number: Optional[int]

Condition number of the transition matrix.

Return type

Optional[int]

abstract copy()

Return a copy of itself. Note that the underlying adata object is not copied.

Return type

KernelExpression

property kernels: List[cellrank.tl.kernels._base_kernel.Kernel]

Get the kernels of the kernel expression, except for constants.

Return type

List[Kernel]

property params: Dict[str, Any]

Parameters which are used to compute the transition matrix.

Return type

Dict[str, Any]

plot_random_walks(n_sims, max_iter=0.25, seed=None, successive_hits=0, start_ixs=None, stop_ixs=None, basis='umap', cmap='gnuplot', linewidth=1.0, linealpha=0.3, ixs_legend_loc=None, n_jobs=None, backend='loky', show_progress_bar=True, figsize=None, dpi=None, save=None, **kwargs)

Plot random walks in an embedding.

This method simulates random walks on the Markov chain defined through the corresponding transition matrix. The method is intended to give qualitative rather than quantitative insights into the transition matrix. Random walks are simulated by iteratively choosing the next cell based on the current cell’s transition probabilities.

Parameters
Return type

None

Returns

• None – Nothing, just plots the figure. Optionally saves it based on save.

• For each random walk, the first/last cell is marked by the start/end colors of cmap.
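The simulation loop described above reduces to repeatedly sampling the next cell from the current cell's row of the transition matrix. A minimal sketch on a toy 3-cell matrix (the helper name is illustrative):

```python
import numpy as np

def simulate_random_walk(T, start, n_steps, rng):
    """Iteratively sample the next cell from the current cell's transition row."""
    path = [start]
    for _ in range(n_steps):
        path.append(rng.choice(T.shape[0], p=T[path[-1]]))
    return path

rng = np.random.default_rng(42)
T = np.array([
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],  # cell 2 is nearly absorbing
])
walk = simulate_random_walk(T, start=0, n_steps=50, rng=rng)
print(len(walk))  # 51
```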

plot_single_flow(cluster, cluster_key, time_key, clusters=None, time_points=None, min_flow=0, remove_empty_clusters=True, ascending=False, legend_loc='upper right out', alpha=0.8, xticks_step_size=1, figsize=None, dpi=None, save=None, show=True)

Visualize outgoing flow from a cluster of cells.

Parameters
Return type

Optional[Axes]

Returns

• matplotlib.pyplot.Axes – The axis object if show=False.

• None – Nothing, just plots the figure. Optionally saves it based on save.

Notes

This function is a Python reimplementation of the original R function, with some minor stylistic differences. It will not recreate the results of the original publication, because there the Metacell model was used to compute the flow, whereas here the transition matrix is used.

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property transition_matrix: Union[numpy.ndarray, scipy.sparse.base.spmatrix]

Return row-normalized transition matrix.

If not present, it is computed iff all underlying kernels have been initialized.

Return type

Union[ndarray, spmatrix]

write(fname, ext='pickle')

Serialize self to a file.

Parameters
Returns

Nothing, just writes itself to a file using pickle.

Return type

None

Write the transition matrix and parameters used for computation to the underlying adata object.

Parameters

key (Optional[str]) – Key used when writing transition matrix to adata. If None, the key is set to ‘T_bwd’ if backward is True, else ‘T_fwd’.

Returns

Updates the adata with the following fields:

• .obsp['{key}'] - the transition matrix.

• .uns['{key}_params'] - parameters used for calculation.

Return type

None

### ExperimentalTime Kernel¶

Kernel base class which computes directed transition probabilities based on experimental time.

Optionally, we apply a density correction as described in the referenced work.

Parameters
• adata (anndata.AnnData) – Annotated data object.

• backward (bool) – Direction of the process.

• time_key (str) – Key in adata.obs where experimental time is stored. The experimental time can be either of a numeric or an ordered categorical type.

• compute_cond_num (bool) – Whether to compute condition number of the transition matrix. Note that this might be costly, since it does not use sparse implementation.

plot_single_flow(cluster, cluster_key, time_key=None, *args, **kwargs)[source]

Visualize outgoing flow from a cluster of cells.

Parameters
• cluster (str) – Cluster for which to visualize the outgoing flow.

• cluster_key (str) – Key in adata.obs where clustering is stored.

• time_key (Optional[str]) – Key in adata.obs where experimental time is stored.

• clusters – Visualize flow only for these clusters. If None, use all clusters.

• time_points – Visualize flow only for these time points. If None, use all time points.

• min_flow – Only show flow edges with flow greater than this value. Flow values are always in [0, 1].

• remove_empty_clusters – Whether to remove clusters with no incoming flow edges.

• ascending – Whether to sort the clusters by ascending or descending incoming flow. If None, use the order as defined by clusters.

• alpha – Alpha value for cell proportions.

• xticks_step_size – Show only every n-th tick on the x-axis. If None, don’t show any ticks.

• legend_loc – Position of the legend. If None, do not show the legend.

• figsize – Size of the figure.

• dpi – Dots per inch.

• save – Filename where to save the plot.

• show – If False, return matplotlib.pyplot.Axes.

Return type

None

Returns

• matplotlib.pyplot.Axes – The axis object if show=False.

• None – Nothing, just plots the figure. Optionally saves it based on save.

property experimental_time: pandas.core.series.Series

Experimental time.

Return type

Series

copy()[source]

Return a copy of self.

Return type

ExperimentalTimeKernel

Annotated data object.

Returns

Annotated data object.

Return type

anndata.AnnData

property backward: bool

Direction of the process.

Return type

bool

Compute a projection of the transition matrix in the embedding.

Projections can only be calculated for kNN-based kernels. The projected matrix can then be visualized as:

scvelo.pl.velocity_embedding(adata, vkey='T_fwd', basis='umap')

Parameters
Return type
Returns

• If copy=True, the projection array of shape (n_cells, n_components).

• Otherwise, it modifies anndata.AnnData.obsm with a key based on key_added.

abstract compute_transition_matrix(*args, **kwargs)

Compute a transition matrix.

Parameters
Returns

Self.

Return type

cellrank.tl.kernels.KernelExpression

property condition_number: Optional[int]

Condition number of the transition matrix.

Return type

Optional[int]

property kernels: List[cellrank.tl.kernels._base_kernel.Kernel]

Get the kernels of the kernel expression, except for constants.

Return type

List[Kernel]

property params: Dict[str, Any]

Parameters which are used to compute the transition matrix.

Return type

Dict[str, Any]

plot_random_walks(n_sims, max_iter=0.25, seed=None, successive_hits=0, start_ixs=None, stop_ixs=None, basis='umap', cmap='gnuplot', linewidth=1.0, linealpha=0.3, ixs_legend_loc=None, n_jobs=None, backend='loky', show_progress_bar=True, figsize=None, dpi=None, save=None, **kwargs)

Plot random walks in an embedding.

This method simulates random walks on the Markov chain defined through the corresponding transition matrix. The method is intended to give qualitative rather than quantitative insights into the transition matrix. Random walks are simulated by iteratively choosing the next cell based on the current cell’s transition probabilities.

Parameters
Return type

None

Returns

• None – Nothing, just plots the figure. Optionally saves it based on save.

• For each random walk, the first/last cell is marked by the start/end colors of cmap.

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property transition_matrix: Union[numpy.ndarray, scipy.sparse.base.spmatrix]

Return row-normalized transition matrix.

If not present, it is computed iff all underlying kernels have been initialized.

Return type

Union[ndarray, spmatrix]

write(fname, ext='pickle')

Serialize self to a file.

Parameters
Returns

Nothing, just writes itself to a file using pickle.

Return type

None

Write the transition matrix and parameters used for computation to the underlying adata object.

Parameters

key (Optional[str]) – Key used when writing transition matrix to adata. If None, the key is set to ‘T_bwd’ if backward is True, else ‘T_fwd’.

Returns

Updates the adata with the following fields:

• .obsp['{key}'] - the transition matrix.

• .uns['{key}_params'] - parameters used for calculation.

Return type

None

### TransportMap Kernel¶

class cellrank.tl.kernels.TransportMapKernel(*args, **kwargs)[source]

Kernel base class which computes a transition matrix based on transport maps for consecutive time pairs.

property transport_maps: Optional[Dict[Tuple[float, float], anndata._core.anndata.AnnData]]

Transport maps for consecutive time pairs.

Return type

Optional[Dict[Tuple[float, float], AnnData]]

Annotated data object.

Returns

Annotated data object.

Return type

anndata.AnnData

property backward: bool

Direction of the process.

Return type

bool

Compute a projection of the transition matrix in the embedding.

Projections can only be calculated for kNN-based kernels. The projected matrix can then be visualized as:

scvelo.pl.velocity_embedding(adata, vkey='T_fwd', basis='umap')

Parameters
Return type
Returns

• If copy=True, the projection array of shape (n_cells, n_components).

• Otherwise, it modifies anndata.AnnData.obsm with a key based on key_added.

abstract compute_transition_matrix(*args, **kwargs)

Compute a transition matrix.

Parameters
Returns

Self.

Return type

cellrank.tl.kernels.KernelExpression

property condition_number: Optional[int]

Condition number of the transition matrix.

Return type

Optional[int]

copy()

Return a copy of self.

Return type

ExperimentalTimeKernel

property experimental_time: pandas.core.series.Series

Experimental time.

Return type

Series

property kernels: List[cellrank.tl.kernels._base_kernel.Kernel]

Get the kernels of the kernel expression, except for constants.

Return type

List[Kernel]

property params: Dict[str, Any]

Parameters which are used to compute the transition matrix.

Return type

Dict[str, Any]

plot_random_walks(n_sims, max_iter=0.25, seed=None, successive_hits=0, start_ixs=None, stop_ixs=None, basis='umap', cmap='gnuplot', linewidth=1.0, linealpha=0.3, ixs_legend_loc=None, n_jobs=None, backend='loky', show_progress_bar=True, figsize=None, dpi=None, save=None, **kwargs)

Plot random walks in an embedding.

This method simulates random walks on the Markov chain defined through the corresponding transition matrix. The method is intended to give qualitative rather than quantitative insights into the transition matrix. Random walks are simulated by iteratively choosing the next cell based on the current cell’s transition probabilities.

Parameters
Return type

None

Returns

• None – Nothing, just plots the figure. Optionally saves it based on save.

• For each random walk, the first/last cell is marked by the start/end colors of cmap.

plot_single_flow(cluster, cluster_key, time_key=None, *args, **kwargs)

Visualize outgoing flow from a cluster of cells.

Parameters
• cluster (str) – Cluster for which to visualize the outgoing flow.

• cluster_key (str) – Key in adata.obs where clustering is stored.

• time_key (Optional[str]) – Key in adata.obs where experimental time is stored.

• clusters – Visualize flow only for these clusters. If None, use all clusters.

• time_points – Visualize flow only for these time points. If None, use all time points.

• min_flow – Only show flow edges with flow greater than this value. Flow values are always in [0, 1].

• remove_empty_clusters – Whether to remove clusters with no incoming flow edges.

• ascending – Whether to sort the clusters by ascending or descending incoming flow. If None, use the order as defined by clusters.

• alpha – Alpha value for cell proportions.

• xticks_step_size – Show only every n-th tick on the x-axis. If None, don’t show any ticks.

• legend_loc – Position of the legend. If None, do not show the legend.

• figsize – Size of the figure.

• dpi – Dots per inch.

• save – Filename where to save the plot.

• show – If False, return matplotlib.pyplot.Axes.

Return type

None

Returns

• matplotlib.pyplot.Axes – The axis object if show=False.

• None – Nothing, just plots the figure. Optionally saves it based on save.

Deserialize self from a file.

Parameters

fname (Union[str, Path]) – Filename from which to read the object.

Returns

The deserialized object.

Return type

typing.Any

property transition_matrix: Union[numpy.ndarray, scipy.sparse.base.spmatrix]

Return row-normalized transition matrix.

If not present, it is computed iff all underlying kernels have been initialized.

Return type

Union[ndarray, spmatrix]

write(fname, ext='pickle')

Serialize self to a file.

Parameters
Returns

Nothing, just writes itself to a file using pickle.

Return type

None

Write the transition matrix and parameters used for computation to the underlying adata object.

Parameters

key (Optional[str]) – Key used when writing transition matrix to adata. If None, the key is set to ‘T_bwd’ if backward is True, else ‘T_fwd’.

Returns

Updates the adata with the following fields:

• .obsp['{key}'] - the transition matrix.

• .uns['{key}_params'] - parameters used for calculation.

Return type

None

### Similarity Scheme¶

class cellrank.tl.kernels.SimilaritySchemeABC[source]

Base class for all similarity schemes.

abstract __call__(v, D, softmax_scale=1.0)[source]

Compute transition probability of a cell to its nearest neighbors using RNA velocity.

Parameters
• v (ndarray) – Array of shape (n_genes,) or (n_neighbors, n_genes) containing the velocity vector(s). The second case is used for the backward process.

• D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.

• softmax_scale (float) – Scaling factor for the softmax function.

Returns

The probability and logits arrays of shape (n_neighbors,).

Return type

Tuple[ndarray, ndarray]

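A concrete scheme might correlate the velocity vector with each displacement and pass the scaled similarities through a softmax. A self-contained sketch using cosine similarity (the function name and the exact similarity measure are illustrative, not the library's implementation):

```python
import numpy as np

def cosine_softmax(v, D, softmax_scale=1.0):
    """Correlate velocity `v` with neighbor displacements `D`, then softmax."""
    # Cosine similarity between v and each row of D.
    sim = (D @ v) / (np.linalg.norm(D, axis=1) * np.linalg.norm(v) + 1e-12)
    logits = softmax_scale * sim
    shifted = logits - logits.max()          # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return probs, logits

rng = np.random.default_rng(0)
v = rng.normal(size=10)                      # velocity vector, (n_genes,)
D = rng.normal(size=(5, 10))                 # displacements, (n_neighbors, n_genes)
probs, logits = cosine_softmax(v, D, softmax_scale=4.0)
print(np.isclose(probs.sum(), 1.0))  # True
```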
### Threshold Scheme¶

class cellrank.tl.kernels.ThresholdSchemeABC[source]

Base class for all connectivity biasing schemes.

abstract __call__(cell_pseudotime, neigh_pseudotime, neigh_conn, **kwargs)[source]

Calculate biased connections for a given cell.

Parameters
• cell_pseudotime (float) – Pseudotime of the current cell.

• neigh_pseudotime (ndarray) – Array of shape (n_neighbors,) containing pseudotimes of neighbors.

• neigh_conn (ndarray) – Array of shape (n_neighbors,) containing connectivities of the current cell and its neighbors.

Returns

Array of shape (n_neighbors,) containing the biased connectivities.

Return type

ndarray

bias_knn(conn, pseudotime, n_jobs=None, backend='loky', show_progress_bar=True, **kwargs)[source]

Bias cell-cell connectivities of a KNN graph.

Parameters
Returns

The biased connectivities.

Return type

csr_matrix
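A concrete scheme could, for example, suppress connections to neighbors that lie behind the current cell in pseudotime, optionally keeping a fraction of them. The sketch below is illustrative only; the function name and the frac_to_keep behavior are assumptions, not the library's implementation:

```python
import numpy as np

def hard_threshold(cell_pseudotime, neigh_pseudotime, neigh_conn, frac_to_keep=0.3):
    """Zero out connectivities to neighbors behind in pseudotime, keeping a
    fraction of those with the largest pseudotime as a softening."""
    biased = neigh_conn.copy()
    behind = neigh_pseudotime < cell_pseudotime
    n_keep = int(frac_to_keep * behind.sum())
    if behind.any():
        order = np.argsort(neigh_pseudotime)                 # ascending pseudotime
        drop = [i for i in order if behind[i]][: behind.sum() - n_keep]
        biased[drop] = 0.0
    return biased

conn = np.array([0.9, 0.8, 0.7, 0.6])        # connectivities to 4 neighbors
pt_neigh = np.array([0.1, 0.4, 0.6, 0.9])    # their pseudotimes
biased = hard_threshold(cell_pseudotime=0.5, neigh_pseudotime=pt_neigh, neigh_conn=conn)
print(biased)  # [0.  0.  0.7 0.6]
```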

### BaseModel¶

Base class for all model classes.

Parameters
property prepared

Whether the model is prepared for fitting.

Annotated data object.

Returns

Annotated data object.

Return type

anndata.AnnData

property model: Any

The underlying model.

Return type

Any

property x_all: numpy.ndarray

Unfiltered independent variables of shape (n_cells, 1).

Return type

ndarray

property y_all: numpy.ndarray

Unfiltered dependent variables of shape (n_cells, 1).

Return type

ndarray

property w_all: numpy.ndarray

Unfiltered weights of shape (n_cells,).

Return type

ndarray

property x: numpy.ndarray

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property y: numpy.ndarray

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

Return type

ndarray

property w: numpy.ndarray

Filtered weights of shape (n_filtered_cells,) used for fitting.

Return type

ndarray

property x_test: numpy.ndarray

Independent variables of shape (n_samples, 1) used for prediction.

Return type

ndarray

property y_test: numpy.ndarray

Prediction values of shape (n_samples,) for x_test.

Return type

ndarray

property x_hat: numpy.ndarray

Filtered independent variables used when calculating the default confidence interval, usually the same as x.

Return type

ndarray

property y_hat: numpy.ndarray

Filtered dependent variables used when calculating the default confidence interval, usually the same as y.

Return type

ndarray

property conf_int: numpy.ndarray

Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

ndarray

prepare(gene, lineage, backward=False, time_range=None, data_key='X', time_key='latent_time', use_raw=False, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200)[source]

Prepare the model to be ready for fitting.

Parameters
Returns

Nothing, but updates the following fields:

Return type

None

abstract fit(x=None, y=None, w=None, **kwargs)[source]

Fit the model.

Parameters
Returns

Fits the model and returns self.

Return type

cellrank.ul.models.BaseModel

Run the prediction.

Parameters
Returns

Return type

numpy.ndarray
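The fit/predict cycle can be mimicked with a toy model. The sketch below stands in for a BaseModel subclass using a simple polynomial fit; PolyModel is hypothetical, not part of the API:

```python
import numpy as np

class PolyModel:
    """Toy stand-in for BaseModel: fit on (x, y, w), predict on x_test."""

    def __init__(self, degree=3):
        self.degree = degree

    def fit(self, x, y, w=None):
        # Weighted least-squares polynomial fit, mirroring fit(x, y, w).
        self._coef = np.polyfit(x, y, deg=self.degree, w=w)
        return self  # fit returns self, as in BaseModel

    def predict(self, x_test):
        self.x_test = x_test
        self.y_test = np.polyval(self._coef, x_test)
        return self.y_test

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))                           # pseudotime
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=100)   # expression

model = PolyModel(degree=5).fit(x, y)
y_test = model.predict(np.linspace(0, 1, 200))
print(y_test.shape)  # (200,)
```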

abstract confidence_interval(x_test=None, **kwargs)[source]

Calculate the confidence interval.

Use the default_confidence_interval() function if the underlying model has no method for confidence interval calculation.

Parameters
Returns

• conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.

Return type

numpy.ndarray

default_confidence_interval(x_test=None, **kwargs)[source]

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from the referenced work, eq. 5.

Parameters
Returns

Return type

numpy.ndarray

plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)[source]

Plot the smoothed gene expression.

Parameters
Returns

Nothing, just plots the figure. Optionally saves it based on save.

Return type

None

abstract copy()[source]

Return a copy of self.

Return type

BaseModel

### Lineage¶

class cellrank.tl.Lineage(input_array: numpy.ndarray, *, names: Iterable[str], colors: Optional[Iterable[cellrank.tl._lineage.ColorLike]] = None)[source]

Lightweight numpy.ndarray wrapper that adds names and colors.

Parameters
• input_array – Input array containing lineage probabilities, each lineage being stored in a column.

• names – Names of the lineages.

• colors – Colors of the lineages.

property names: numpy.ndarray

Lineage names. Must be unique.

Return type

ndarray

property colors: numpy.ndarray

Lineage colors.

Return type

ndarray

property X: numpy.ndarray

Convert self to numpy array, losing names and colors.

Return type

ndarray

property T

Transpose of self.

view(dtype=None, type=None)[source]

Return a view of self.

Return type

LineageView

priming_degree(method='kl_divergence', early_cells=None)[source]

Compute the degree of lineage priming.

This method computes how naive vs. committed each individual cell is. It returns a score where 0 stands for naive and 1 stands for committed.

Parameters
• method (Literal[‘kl_divergence’, ‘entropy’]) –

The method used to compute the degree of lineage priming. Valid options are:

• ’kl_divergence’: computes the KL-divergence between the fate probabilities of a cell and the average fate probabilities, as in the referenced work. Computation of the average fate probabilities can be restricted to a set of user-defined early_cells.

• ’entropy’: computes the entropy over a cell’s fate probabilities, as in the referenced work.

• early_cells (Optional[ndarray]) – Cell ids or a mask marking early cells. If None, use all cells. Only used when method='kl_divergence'.

Returns

The priming degree.

Return type

ndarray
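Both scores can be sketched directly from the fate probabilities. The helper below is illustrative, not the library's implementation; note how early_mask restricts the reference distribution for the KL-divergence variant:

```python
import numpy as np

def priming_degree(fate_probs, method="kl_divergence", early_mask=None):
    """Per-cell priming score, min-max scaled so 0 ~ naive and 1 ~ committed."""
    p = fate_probs + 1e-12  # avoid log(0)
    if method == "kl_divergence":
        # KL divergence of each cell's fate probabilities from the average
        # fate probabilities, optionally restricted to early cells.
        ref = p[early_mask].mean(axis=0) if early_mask is not None else p.mean(axis=0)
        score = (p * np.log(p / ref)).sum(axis=1)
    else:  # 'entropy': committed cells concentrate mass, hence low entropy
        score = (p * np.log(p)).sum(axis=1)  # negative entropy
    return (score - score.min()) / (score.max() - score.min())

fates = np.array([
    [0.50, 0.50],   # naive: undecided between both fates
    [0.90, 0.10],   # partially committed
    [0.99, 0.01],   # committed
])
early = np.array([True, False, False])  # treat the first cell as 'early'
deg = priming_degree(fates, early_mask=early)
print(deg[0] < deg[1] < deg[2])  # True
```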

plot_pie(reduction, title=None, legend_loc='on data', legend_kwargs=mappingproxy({}), figsize=None, dpi=None, save=None, **kwargs)[source]

Plot a pie chart visualizing aggregated lineage probabilities.

Parameters
• reduction (