Classes
Estimators
GPCCA
- class cellrank.tl.estimators.GPCCA(obj, obsp_key=None, **kwargs)
Generalized Perron Cluster Cluster Analysis [Reuter et al., 2018] as implemented in pyGPCCA.
Coarse-grains a discrete Markov chain into a set of macrostates and computes coarse-grained transition probabilities among the macrostates. Each macrostate corresponds to an area of the state space, i.e. to a subset of cells. The assignment is soft, i.e. each cell is assigned to every macrostate with a certain weight, where the weights sum to one per cell. Macrostates are computed by maximizing the ‘crispness’, which can be thought of as a measure of minimal overlap between macrostates in a certain inner-product sense. Once the macrostates have been computed, we project the large transition matrix onto a coarse-grained transition matrix among the macrostates via a Galerkin projection. This projection is based on invariant subspaces of the original transition matrix, which are obtained using the real Schur decomposition [Reuter et al., 2018].
- Parameters
obj (Union[AnnData, ndarray, spmatrix, KernelExpression]) – Can be one of the following:
cellrank.tl.kernels.Kernel - kernel object.
anndata.AnnData - annotated data object containing the transition matrix in anndata.AnnData.obsp.
numpy.ndarray - row-normalized dense transition matrix.
scipy.sparse.spmatrix - row-normalized sparse transition matrix.
obsp_key (Optional[str]) – Key in anndata.AnnData.obsp where the transition matrix is stored. Only used when obj is an anndata.AnnData object.
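The Galerkin projection described above can be illustrated on a toy chain. The sketch below is not pyGPCCA's implementation. It makes three assumptions:
- the membership matrix `chi` is already given (GPCCA obtains it by optimizing crispness over the leading Schur vectors);
- it uses the commonly stated GPCCA formula \(T_c = (\chi^T D \chi)^{-1} \chi^T D T \chi\), with \(D\) the diagonal matrix of the initial distribution \(\eta\);
- it takes the coarse initial distribution to be \(\chi^T \eta\).

The helper name `coarse_grain` is hypothetical. The first checks mirror the requirement that `obj` be a square, row-normalized matrix.

```python
import numpy as np


def coarse_grain(T: np.ndarray, chi: np.ndarray, eta: np.ndarray):
    """Hypothetical helper: project T onto macrostates given soft memberships chi.

    T   -- (n, n) row-stochastic transition matrix
    chi -- (n, m) soft memberships, each row sums to one
    eta -- (n,) initial distribution over cells
    """
    # The transition matrix must be square and row-normalized.
    assert T.shape[0] == T.shape[1]
    assert np.allclose(T.sum(axis=1), 1.0)
    # Soft assignment: the weights of every cell sum to one.
    assert np.allclose(chi.sum(axis=1), 1.0)

    D = np.diag(eta)
    # Galerkin projection (assumed form): T_c = (chi^T D chi)^{-1} (chi^T D T chi).
    # pinv is used here only for numerical robustness in this sketch.
    T_coarse = np.linalg.pinv(chi.T @ D @ chi) @ (chi.T @ D @ T @ chi)
    # Coarse initial distribution (assumed form): weight of eta per macrostate.
    eta_coarse = chi.T @ eta
    return T_coarse, eta_coarse


# Toy chain: cells 0-1 and cells 2-3 form two weakly coupled groups.
T = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0],
              [0.0, 0.1, 0.8, 0.1],
              [0.0, 0.0, 0.1, 0.9]])
chi = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
T_c, eta_c = coarse_grain(T, chi, np.full(4, 0.25))
```

Because each row of `chi` sums to one, the rows of the coarse matrix also sum to one whenever \(\chi^T D \chi\) is invertible.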
- property macrostates: Optional[pandas.core.series.Series]
Macrostates of the transition matrix.
- property macrostates_memberships: Optional[cellrank.tl._lineage.Lineage]
Macrostate membership matrix.
Soft assignment of microstates (cells) to macrostates.
- property terminal_states_memberships: Optional[cellrank.tl._lineage.Lineage]
Terminal state membership matrix.
Soft assignment of cells to terminal states.
- property coarse_initial_distribution: Optional[pandas.core.series.Series]
Coarse-grained initial distribution.
- property coarse_stationary_distribution: Optional[pandas.core.series.Series]
Coarse-grained stationary distribution.
- property coarse_T: Optional[pandas.core.frame.DataFrame]
Coarse-grained transition matrix.
- compute_macrostates(n_states=None, n_cells=30, cluster_key=None, **kwargs)
Compute the macrostates.
- Parameters
n_states (Union[int, Sequence[int], None]) – Number of macrostates. If a typing.Sequence, use the minChi criterion [Reuter et al., 2018]. If None, use the eigengap heuristic.
n_cells (Optional[int]) – Number of most likely cells from each macrostate to select.
cluster_key (Optional[str]) – If a key to cluster labels is given, names and colors of the states will be associated with the clusters.
kwargs (Any) – Keyword arguments for compute_schur().
- Return type
- Returns
Nothing, just updates the following fields:
macrostates - Macrostates of the transition matrix.
macrostates_memberships - Macrostate membership matrix.
coarse_T - Coarse-grained transition matrix.
coarse_initial_distribution - Coarse-grained initial distribution.
coarse_stationary_distribution - Coarse-grained stationary distribution.
schur_vectors - Real Schur vectors of the transition matrix.
schur_matrix - Schur matrix.
eigendecomposition - Eigendecomposition of transition_matrix.
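To make n_cells concrete, here is a hypothetical helper. It labels the n_cells cells with the largest membership in each macrostate and leaves every other cell unlabelled (None, playing the role of NaN in macrostates). This is a sketch under the assumption that cells are ranked independently within each macrostate column. A cell that ranks in the top n_cells of several macrostates simply receives the last one here, which may not match CellRank's own conflict handling.

```python
import numpy as np


def top_cells_per_state(memberships: np.ndarray, names: list, n_cells: int) -> list:
    """Hypothetical helper: label the n_cells most likely cells of each macrostate.

    memberships -- (n_cells_total, n_states) soft membership matrix
    names       -- one name per macrostate (column)
    Returns one label per cell; None marks cells outside every macrostate.
    """
    n_obs, n_states = memberships.shape
    labels = [None] * n_obs
    for j in range(n_states):
        # Indices of the n_cells largest memberships in column j, highest first.
        top = np.argsort(memberships[:, j])[::-1][:n_cells]
        for i in top:
            labels[i] = names[j]  # later macrostates overwrite earlier ones
    return labels


chi = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])
labels = top_cells_per_state(chi, ["A", "B"], n_cells=2)
```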
- predict(method=TermStatesMethod.STABILITY, n_cells=30, alpha=1, stability_threshold=0.96, n_states=None)
Automatically select terminal states from macrostates.
- Parameters
method (Literal['stability', 'top_n', 'eigengap', 'eigengap_coarse']) – How to select the terminal states. Valid options are:
'eigengap' - select the number of states based on the eigengap of transition_matrix.
'eigengap_coarse' - select the number of states based on the eigengap of the diagonal of coarse_T.
'top_n' - select the top n_states macrostates based on the diagonal values (self-transition probabilities) of coarse_T.
'stability' - select states which have a stability >= stability_threshold. The stability is given by the diagonal elements of coarse_T.
n_cells (int) – Number of most likely cells from each macrostate to select.
alpha (Optional[float]) – Weight given to the deviation of an eigenvalue from one. Only used when method = 'eigengap' or method = 'eigengap_coarse'.
stability_threshold (float) – Threshold used when method = 'stability'.
n_states (Optional[int]) – Number of states used when method = 'top_n'.
- Return type
- Returns
Nothing, just updates the following fields:
terminal_states - Categorical annotation of terminal states.
terminal_states_memberships - Terminal state membership matrix.
terminal_states_probabilities - Aggregated probability of cells to be in terminal states.
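The 'stability' and 'top_n' rules above only look at the diagonal of coarse_T. The following is a minimal sketch of both. The function name is hypothetical, and a plain nested list stands in for the coarse_T data frame.

```python
import numpy as np


def select_terminal_states(coarse_T, names, method="stability",
                           stability_threshold=0.96, n_states=None):
    """Hypothetical helper: pick macrostates from the diagonal of coarse_T."""
    stability = np.diag(np.asarray(coarse_T, dtype=float))
    if method == "stability":
        # Keep every macrostate whose self-transition probability reaches the threshold.
        return [name for name, s in zip(names, stability) if s >= stability_threshold]
    if method == "top_n":
        if n_states is None:
            raise ValueError("`n_states` is required for method='top_n'")
        # Rank macrostates by stability, most stable first.
        order = np.argsort(stability)[::-1][:n_states]
        return [names[i] for i in order]
    raise ValueError(f"unsupported method: {method!r}")


coarse_T = [[0.98, 0.02, 0.00],
            [0.10, 0.85, 0.05],
            [0.00, 0.03, 0.97]]
names = ["Alpha", "Beta", "Gamma"]
```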
- set_terminal_states_from_macrostates(names=None, n_cells=30, **kwargs)
Manually select terminal states from macrostates.
- Parameters
names (Union[str, Sequence[str], Mapping[str, str], None]) – Names of the macrostates to be marked as terminal. Multiple states can be combined using ',', such as ["Alpha, Beta", "Epsilon"]. If a dict, keys correspond to the names of the macrostates and the values to the new names. If None, select all macrostates.
n_cells (int) – Number of most likely cells from each macrostate to select.
- Return type
- Returns
Nothing, just updates the following fields:
terminal_states - Categorical annotation of terminal states.
terminal_states_probabilities - Aggregated probability of cells to be in terminal states.
terminal_states_memberships - Terminal state membership matrix.
- rename_terminal_states(new_names)
Rename categories in terminal_states.
- Parameters
new_names (Mapping[str, str]) – Mapping where keys correspond to the old names and the values to the new names. The new names must be unique.
- Return type
- Returns
Nothing, just updates the names of:
terminal_states - Categorical annotation of terminal states.
terminal_states_memberships - Terminal state membership matrix.
- fit(n_states=None, n_cells=30, cluster_key=None, **kwargs)
Prepare self for terminal states prediction.
- Parameters
n_states (Union[int, Sequence[int], None]) – Number of macrostates. If a typing.Sequence, use the minChi criterion [Reuter et al., 2018]. If None, use the eigengap heuristic.
n_cells (Optional[int]) – Number of most likely cells from each macrostate to select.
cluster_key (Optional[str]) – If a key to cluster labels is given, names and colors of the states will be associated with the clusters.
kwargs (Any) – Keyword arguments for compute_schur().
- Return type
- Returns
Nothing, just updates the following fields:
macrostates - Macrostates of the transition matrix.
macrostates_memberships - Macrostate membership matrix.
coarse_T - Coarse-grained transition matrix.
coarse_initial_distribution - Coarse-grained initial distribution.
coarse_stationary_distribution - Coarse-grained stationary distribution.
schur_vectors - Real Schur vectors of the transition matrix.
schur_matrix - Schur matrix.
eigendecomposition - Eigendecomposition of transition_matrix.
- plot_coarse_T(show_stationary_dist=True, show_initial_dist=False, cmap='viridis', xtick_rotation=45, annotate=True, show_cbar=True, title=None, figsize=(8, 8), dpi=80, save=None, text_kwargs=mappingproxy({}), **kwargs)
Plot the coarse-grained transition matrix between macrostates.
- Parameters
show_stationary_dist (bool) – Whether to show coarse_stationary_distribution, if present.
show_initial_dist (bool) – Whether to show coarse_initial_distribution.
cmap (Union[str, ListedColormap]) – Colormap to use.
xtick_rotation (float) – Rotation of ticks on the x-axis.
annotate (bool) – Whether to display the text on each cell.
show_cbar (bool) – Whether to show the colorbar.
dpi (int) – Dots per inch.
save (Union[str, Path, None]) – Filename where to save the plot.
text_kwargs (Mapping[str, Any]) – Keyword arguments for matplotlib.pyplot.text().
kwargs (Any) – Keyword arguments for matplotlib.pyplot.imshow().
- Return type
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- plot_macrostate_composition(key, width=0.8, title=None, labelrot=45, legend_loc='upper right out', figsize=None, dpi=None, save=None, show=True)
Plot stacked histogram of macrostates over categorical annotations.
- Parameters
adata (anndata.AnnData) – Annotated data object.
key (str) – Key from anndata.AnnData.obs containing categorical annotations.
width (float) – Bar width in [0, 1].
title (Optional[str]) – Title of the figure. If None, create one automatically.
labelrot (float) – Rotation of labels on x-axis.
legend_loc (Optional[str]) – Position of the legend. If None, don’t show legend.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
save (Union[str, Path, None]) – Filename where to save the plot.
show (bool) – If False, return matplotlib.pyplot.Axes.
- Return type
- Returns
The axes object if show = False; otherwise nothing, just plots the figure. Optionally saves it based on save.
- plot_macrostates(states=None, color=None, discrete=False, mode=PlotMode.EMBEDDING, time_key='latent_time', same_plot=True, title=None, cmap='viridis', **kwargs)
Plot continuous or categorical observations in an embedding or along pseudotime.
- Parameters
color (Optional[str]) – Key in anndata.AnnData.obs.
discrete (bool) – Whether to plot the data as continuous or discrete observations. If the data cannot be plotted as continuous observations, it will be plotted as discrete.
mode (Literal['embedding', 'time']) – Valid options are:
'embedding' - plot the embedding while coloring in continuous or categorical observations.
'time' - plot the pseudotime on the x-axis and the probabilities/memberships on the y-axis.
time_key (str) – Key in anndata.AnnData.obs where pseudotime is stored. Only used when mode = 'time'.
title (Union[str, Sequence[str], None]) – Title of the plot(s).
same_plot (bool) – Whether to plot the data on the same plot or not. Only used when mode = 'embedding'. If True and discrete = False, color is ignored.
cmap (str) – Colormap for continuous data.
kwargs (Any) – Keyword arguments for scvelo.pl.scatter().
- Return type
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- plot_terminal_states(states=None, color=None, discrete=False, mode=PlotMode.EMBEDDING, time_key='latent_time', same_plot=True, title=None, cmap='viridis', **kwargs)
Plot continuous or categorical observations in an embedding or along pseudotime.
- Parameters
color (Optional[str]) – Key in anndata.AnnData.obs.
discrete (bool) – Whether to plot the data as continuous or discrete observations. If the data cannot be plotted as continuous observations, it will be plotted as discrete.
mode (Literal['embedding', 'time']) – Valid options are:
'embedding' - plot the embedding while coloring in continuous or categorical observations.
'time' - plot the pseudotime on the x-axis and the probabilities/memberships on the y-axis.
time_key (str) – Key in anndata.AnnData.obs where pseudotime is stored. Only used when mode = 'time'.
title (Union[str, Sequence[str], None]) – Title of the plot(s).
same_plot (bool) – Whether to plot the data on the same plot or not. Only used when mode = 'embedding'. If True and discrete = False, color is ignored.
cmap (str) – Colormap for continuous data.
kwargs (Any) – Keyword arguments for scvelo.pl.scatter().
- Return type
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- property absorption_probabilities: Optional[cellrank.tl._lineage.Lineage]
Absorption probabilities.
Informally, given a (finite, discrete) Markov chain with a set of transient states \(T\) and a set of absorbing states \(A\), the absorption probability for cell \(i\) from \(T\) to reach cell \(j\) from \(A\) is the probability that a random walk initialized in \(i\) will reach absorbing state \(j\).
In our context, states correspond to cells; in particular, absorbing states correspond to cells in terminal states.
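For a small dense chain, the definition above can be checked directly. With the transient-to-transient block \(Q\) and the transient-to-absorbing block \(R\) of the transition matrix, the absorption probabilities \(B\) solve \((I - Q)B = R\). The sketch below aggregates the absorbing cells of each terminal state by summing the corresponding columns of \(R\). The helper name, the variable `P` for the transition matrix and the dense numpy.linalg.solve call are illustrative assumptions. compute_absorption_probabilities itself offers direct and iterative (scipy or PETSc) solvers.

```python
import numpy as np


def absorption_probabilities(P: np.ndarray, absorbing: dict) -> dict:
    """Hypothetical helper.

    P         -- (n, n) row-stochastic transition matrix
    absorbing -- terminal state name -> list of cell indices
    Returns terminal state name -> (n,) array of absorption probabilities.
    """
    n = P.shape[0]
    absorbing_idx = sorted(i for idx in absorbing.values() for i in idx)
    transient = np.setdiff1d(np.arange(n), absorbing_idx)
    Q = P[np.ix_(transient, transient)]  # transient -> transient
    # Transient -> absorbing transitions, summed per terminal state (columns of R).
    R = np.column_stack([P[np.ix_(transient, idx)].sum(axis=1)
                         for idx in absorbing.values()])
    # Solve (I - Q) B = R rather than inverting (I - Q) explicitly.
    B = np.linalg.solve(np.eye(len(transient)) - Q, R)
    out = {}
    for k, (name, idx) in enumerate(absorbing.items()):
        p = np.zeros(n)
        p[transient] = B[:, k]
        p[idx] = 1.0  # cells of a terminal state are absorbed there with probability one
        out[name] = p
    return out


# Symmetric random walk on cells 0..4 with absorbing ends ("gambler's ruin").
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in range(1, 4):
    P[i, i - 1] = P[i, i + 1] = 0.5
probs = absorption_probabilities(P, {"A": [0], "B": [4]})
```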
- property absorption_times: Optional[pandas.core.frame.DataFrame]
Mean and variance of the time until absorption.
Related to conditional mean first passage times. Corresponds to the expected time until absorption, which depends on the initial state, and to its variance.
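Both quantities can be written with the fundamental matrix \(N = (I - Q)^{-1}\) of an absorbing chain, where \(Q\) is the transient-to-transient block. The expected number of steps until absorption is \(t = N\mathbf{1}\), and its variance is \((2N - I)t - t \circ t\). The sketch below applies these textbook formulas to a small dense matrix `P`. The helper name is hypothetical. An explicit inverse is only reasonable at toy scale; for large problems, solving linear systems (as compute_absorption_probabilities does with its direct or iterative solvers) is preferable.

```python
import numpy as np


def absorption_time_moments(P: np.ndarray, absorbing: list):
    """Hypothetical helper: mean and variance of the number of steps until absorption."""
    n = P.shape[0]
    transient = np.setdiff1d(np.arange(n), absorbing)
    Q = P[np.ix_(transient, transient)]
    I = np.eye(len(transient))
    N = np.linalg.inv(I - Q)           # fundamental matrix (toy scale only)
    t = N @ np.ones(len(transient))    # expected steps until absorption
    var = (2 * N - I) @ t - t ** 2     # variance of the absorption time
    return transient, t, var


# One transient cell that stays with probability 0.5: a geometric waiting time.
P_geo = np.array([[0.5, 0.5],
                  [0.0, 1.0]])
transient, mean_t, var_t = absorption_time_moments(P_geo, [1])
```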
- property adata: anndata._core.anndata.AnnData
Annotated data object.
- Return type
- compute_absorption_probabilities(keys=None, solver='gmres', use_petsc=True, time_to_absorption=None, n_jobs=None, backend='loky', show_progress_bar=True, tol=1e-06, preconditioner=None)
Compute absorption probabilities.
For each cell, this computes the probability of being absorbed in any of the terminal_states. In particular, this corresponds to the probability that a random walk initialized in transient cell \(i\) will reach any cell from a fixed terminal state before reaching a cell from any other terminal state.
- Parameters
keys (Optional[Sequence[str]]) – Terminal states for which to compute the absorption probabilities. If None, use all states defined in terminal_states.
solver (Union[str, Literal['direct', 'gmres', 'lgmres', 'bicgstab', 'gcrotmk']]) – Solver to use for the linear problem. Options are 'direct', 'gmres', 'lgmres', 'bicgstab' or 'gcrotmk' when use_petsc = False, or one of petsc4py.PETSc.KSP.Type otherwise. Information on the scipy iterative solvers can be found in scipy.sparse.linalg, or for petsc4py solvers here.
use_petsc (bool) – Whether to use solvers from petsc4py or scipy. Using petsc4py is recommended for large problems. If no installation is found, defaults to scipy.sparse.linalg.gmres().
time_to_absorption (Union[Literal['all'], Sequence[Union[str, Sequence[str]]], Dict[Union[str, Sequence[str]], Literal['mean', 'var']], None]) – Whether to compute mean time to absorption and its variance to specific absorbing states. If a dict, can be specified as {'Alpha': 'var', ...} to also compute the variance. If states are given as a tuple, time to absorption will be computed to the subset of these states, such as [('Alpha', 'Beta'), ...] or {('Alpha', 'Beta'): 'mean', ...}. Can be specified as 'all' to compute it to any absorbing state in keys, which is more efficient than listing all absorbing states explicitly. Because this requires many linear solves, it might be beneficial to disable the progress bar by setting show_progress_bar = False.
n_jobs (Optional[int]) – Number of parallel jobs to use when using an iterative solver.
backend (Literal['loky', 'multiprocessing', 'threading']) – Which backend to use for multiprocessing. See joblib.Parallel for valid options.
show_progress_bar (bool) – Whether to show a progress bar. Only used when solver != 'direct'.
tol (float) – Convergence tolerance for the iterative solver. The default is fine for most cases; only consider decreasing it for severely ill-conditioned matrices.
preconditioner (Optional[str]) – Preconditioner to use, only available when use_petsc = True. For valid options, see here. We recommend the 'ilu' preconditioner for badly conditioned problems.
- Return type
- Returns
Nothing, just updates the following fields:
absorption_probabilities - Absorption probabilities.
absorption_times - Mean and variance of the time until absorption. Only if time_to_absorption is specified.
- compute_eigendecomposition(k=20, which='LR', alpha=1.0, only_evals=False, ncv=None)
Compute eigendecomposition of transition_matrix.
Uses a sparse implementation, if possible, and only computes the top \(k\) eigenvectors to speed up the computation. Computes both left and right eigenvectors.
- Parameters
k (int) – Number of eigenvectors or eigenvalues to compute.
which (Literal['LR', 'LM']) – How to sort the eigenvalues. Valid options are:
'LR' - the largest real part.
'LM' - the largest magnitude.
alpha (float) – Used to compute the eigengap. alpha is the weight given to the deviation of an eigenvalue from one.
only_evals (bool) – Whether to compute only eigenvalues.
- Return type
- Returns
Nothing, just updates the following field:
eigendecomposition - Eigendecomposition of transition_matrix.
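The following sketch shows one plausible reading of the alpha description. For real eigenvalues sorted in descending order, it scores each gap \(\lambda_k - \lambda_{k+1}\), subtracts alpha times the deviation \(1 - \lambda_k\), and takes the arg max. Both the exact scoring and the convention that the number of states is the returned index plus one are assumptions here, not taken from the source. The function name is hypothetical.

```python
import numpy as np


def eigengap(evals, alpha: float = 1.0) -> int:
    """Hypothetical helper: index of the eigengap among descending real eigenvalues.

    Each gap lambda_k - lambda_{k+1} is penalized by alpha * (1 - lambda_k),
    i.e. by how far lambda_k deviates from one.
    """
    evals = np.asarray(evals, dtype=float)
    gaps = evals[:-1] - evals[1:]
    deviation = 1.0 - evals[:-1]
    return int(np.argmax(gaps - alpha * deviation))


evals = [1.0, 0.99, 0.98, 0.6, 0.55]
idx = eigengap(evals)  # the number of states would then be idx + 1 (assumption)
```

A larger alpha penalizes gaps that occur after eigenvalues far from one more strongly, pushing the selection towards fewer states.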
- compute_lineage_drivers(lineages=None, method=TestMethod.FISCHER, cluster_key=None, clusters=None, layer=None, use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, **kwargs)
Compute driver genes per lineage.
Correlates gene expression with lineage probabilities, for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters which are relevant for the specified lineages.
- Parameters
lineages (Union[str, Sequence, None]) – Lineage names from absorption_probabilities. If None, use all lineages.
method (Literal['fischer', 'perm_test']) – Mode to use when calculating p-values and confidence intervals. Valid options are:
'fischer' - use Fisher transformation [Fisher, 1921].
'perm_test' - use permutation test.
cluster_key (Optional[str]) – Key from anndata.AnnData.obs to obtain cluster annotations. These are considered for clusters.
clusters (Union[str, Sequence, None]) – Restrict the correlations to these clusters.
layer (Optional[str]) – Key from anndata.AnnData.layers from which to get the expression. If None or 'X', use anndata.AnnData.X.
use_raw (bool) – Whether or not to use anndata.AnnData.raw to correlate gene expression.
confidence_level (float) – Confidence level for the confidence interval calculation. Must be in interval [0, 1].
n_perms (int) – Number of permutations to use when method = 'perm_test'.
seed (Optional[int]) – Random seed when method = 'perm_test'.
show_progress_bar – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend – Which backend to use for parallelization. See joblib.Parallel for valid options.
- Return type
- Returns
Dataframe of shape (n_genes, n_lineages * 5) containing the following columns, one for each lineage:
{lineage}_corr - correlation between the gene expression and absorption probabilities.
{lineage}_pval - calculated p-values for a two-sided test.
{lineage}_qval - corrected p-values using the Benjamini-Hochberg method at level 0.05.
{lineage}_ci_low - lower bound of the confidence_level correlation confidence interval.
{lineage}_ci_high - upper bound of the confidence_level correlation confidence interval.
Also updates the following field:
lineage_drivers - the same pandas.DataFrame as described above.
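For the 'fischer' option, the usual recipe has four steps:
- compute the Pearson correlation \(r\);
- transform it with \(z = \operatorname{artanh}(r)\);
- treat \(z\) as approximately normal with standard error \(1/\sqrt{n-3}\);
- map the interval back with \(\tanh\).

The standard-library sketch below does this for a single gene and lineage. It assumes \(|r| < 1\) and \(n > 3\). The function name is hypothetical, and CellRank's vectorized implementation may differ in details.

```python
import math
from statistics import NormalDist


def pearson_with_fisher_ci(x, y, confidence_level=0.95):
    """Hypothetical helper: Pearson r, two-sided p-value and CI via Fisher's z.

    Assumes len(x) == len(y) > 3 and that the correlation is strictly between -1 and 1.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)

    z = math.atanh(r)            # Fisher transformation
    se = 1.0 / math.sqrt(n - 3)  # approximate standard error of z
    normal = NormalDist()        # standard normal distribution
    pval = 2.0 * (1.0 - normal.cdf(abs(z) / se))  # two-sided test of r = 0
    z_crit = normal.inv_cdf(0.5 + confidence_level / 2.0)
    ci_low, ci_high = math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
    return r, pval, ci_low, ci_high


expression = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fate_probs = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r, pval, ci_low, ci_high = pearson_with_fisher_ci(expression, fate_probs)
```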
- compute_lineage_priming(method='kl_divergence', early_cells=None)
Compute the degree of lineage priming.
It returns a score in [0, 1] where 0 stands for naive and 1 stands for committed.
- Parameters
method (Literal['kl_divergence', 'entropy']) – The method used to compute the degree of lineage priming. Valid options are:
'kl_divergence' - as in [Velten et al., 2017], computes KL-divergence between the fate probabilities of a cell and the average fate probabilities. Computation of average fate probabilities can be restricted to a set of user-defined early_cells.
'entropy' - as in [Setty et al., 2019], computes entropy over a cell’s fate probabilities.
early_cells (Union[Mapping[str, Sequence[str]], Sequence[str], None]) – Cell IDs or a mask marking early cells. If None, use all cells. Only used when method = 'kl_divergence'. If a dict, the key specifies a cluster key in anndata.AnnData.obs and the values specify cluster labels containing early cells.
- Return type
- Returns
The priming degree.
Also updates the following field:
priming_degree - Priming degree.
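The two methods above yield unbounded or differently scaled raw quantities, while the score is stated to lie in [0, 1]. The sketch below uses two mappings chosen so that 0 means naive and 1 means committed: \(1 - e^{-\mathrm{KL}}\) for the KL-divergence and \(1 - H/\log k\) for the entropy over \(k\) lineages. These mappings are assumptions, not necessarily the normalization CellRank applies. The function names are hypothetical.

```python
import math


def priming_entropy(probs):
    """Hypothetical: 1 - normalized entropy; 0 for uniform (naive), 1 for one-hot (committed)."""
    k = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - h / math.log(k)


def priming_kl(probs, mean_probs):
    """Hypothetical: KL(probs || mean_probs) mapped to [0, 1) via 1 - exp(-KL).

    Assumes every entry of mean_probs is positive.
    """
    kl = sum(p * math.log(p / q) for p, q in zip(probs, mean_probs) if p > 0)
    return 1.0 - math.exp(-kl)


# Fate probabilities of three cells over two lineages; the first cell is undecided.
cells = [[0.5, 0.5], [0.9, 0.1], [0.1, 0.9]]
mean_probs = [sum(c[j] for c in cells) / len(cells) for j in range(2)]
```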
- compute_schur(n_components=10, initial_distribution=None, method='krylov', which='LR', alpha=1.0)
Compute Schur decomposition.
- Parameters
n_components (int) – Number of Schur vectors to compute.
initial_distribution (Optional[ndarray]) – Input distribution over all cells. If None, uniform distribution is used.
method (Literal['krylov', 'brandts']) – Method for calculating the Schur vectors. Valid options are:
'krylov' - an iterative procedure that computes a partial, sorted Schur decomposition for large, sparse matrices.
'brandts' - full sorted Schur decomposition of a dense matrix.
For benefits of each method, see pygpcca.GPCCA.
which (Literal['LR', 'LM']) – How to sort the eigenvalues. Valid options are:
'LR' - the largest real part.
'LM' - the largest magnitude.
alpha (float) – Used to compute the eigengap. alpha is the weight given to the deviation of an eigenvalue from one.
- Returns
Nothing, just updates the following fields:
schur_vectors - Real Schur vectors of the transition matrix.
schur_matrix - Schur matrix.
eigendecomposition - Eigendecomposition of transition_matrix.
- compute_terminal_states(*args, **kwargs)
Compute terminal states of the process.
This is an alias for predict().
- Parameters
- Return type
- Returns
Nothing, just updates the following fields:
terminal_states - Categorical annotation of terminal states.
terminal_states_probabilities - Aggregated probability of cells to be in terminal states.
- copy(*, deep=False)
Return a copy of self.
- property eigendecomposition: Optional[Dict[str, Any]]
Eigendecomposition of transition_matrix.
For non-symmetric real matrices, left and right eigenvectors will in general be different and complex. We compute both left and right eigenvectors.
- Return type
- Returns
A dictionary with the following keys:
'D' - the eigenvalues.
'eigengap' - the eigengap.
'params' - parameters used for the computation.
'V_l' - left eigenvectors (optional).
'V_r' - right eigenvectors (optional).
'stationary_dist' - stationary distribution of transition_matrix, if present.
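The 'stationary_dist' entry corresponds to the left eigenvector for eigenvalue one, normalized to sum to one. The following is a dense NumPy sketch for a small irreducible chain. The function name is hypothetical. CellRank itself only computes the top \(k\) eigenpairs, sparsely where possible.

```python
import numpy as np


def stationary_distribution(T: np.ndarray) -> np.ndarray:
    """Hypothetical helper: stationary distribution from the leading left eigenvector."""
    # Left eigenvectors of T are the right eigenvectors of T.T.
    evals, V_l = np.linalg.eig(T.T)
    k = int(np.argmin(np.abs(evals - 1.0)))  # eigenvalue closest to one
    pi = np.real(V_l[:, k])
    return pi / pi.sum()  # normalizing also removes the arbitrary sign


T = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = stationary_distribution(T)
```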
- classmethod from_adata(adata, obsp_key)
Deserialize self from anndata.AnnData.
- Parameters
adata (anndata.AnnData) – Annotated data object.
obsp_key (str) – Key in anndata.AnnData.obsp where the transition matrix is stored.
- Return type
- Returns
The deserialized object.
- property kernel: cellrank.tl._mixins._kernel.KernelExpression
Underlying kernel expression.
- Return type
KernelExpression
- property lineage_drivers: Optional[pandas.core.frame.DataFrame]
Potential lineage drivers.
Computes Pearson correlation of each gene with fate probabilities for every terminal state. High Pearson correlation indicates potential lineage drivers. Also computes p-values and confidence intervals.
- Return type
- Returns
Dataframe of shape (n_genes, n_lineages * 5) containing the following columns, one for each lineage:
{lineage}_corr - correlation between the gene expression and absorption probabilities.
{lineage}_pval - calculated p-values for a two-sided test.
{lineage}_qval - corrected p-values using the Benjamini-Hochberg method at level 0.05.
{lineage}_ci_low - lower bound of the confidence_level correlation confidence interval.
{lineage}_ci_high - upper bound of the confidence_level correlation confidence interval.
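The {lineage}_qval column is described as Benjamini-Hochberg corrected p-values. The short standard-library sketch below shows what the step-up procedure computes. It is a generic implementation, not CellRank's code, and the function name is hypothetical.

```python
def benjamini_hochberg(pvals):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes whose q-value falls below 0.05 would then be considered significant at that false discovery rate.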
- plot_absorption_probabilities(states=None, color=None, discrete=False, mode=PlotMode.EMBEDDING, time_key='latent_time', same_plot=True, title=None, cmap='viridis', **kwargs)
Plot continuous or categorical observations in an embedding or along pseudotime.
- Parameters
color (Optional[str]) – Key in anndata.AnnData.obs.
discrete (bool) – Whether to plot the data as continuous or discrete observations. If the data cannot be plotted as continuous observations, it will be plotted as discrete.
mode (Literal['embedding', 'time']) – Valid options are:
'embedding' - plot the embedding while coloring in continuous or categorical observations.
'time' - plot the pseudotime on the x-axis and the probabilities/memberships on the y-axis.
time_key (str) – Key in anndata.AnnData.obs where pseudotime is stored. Only used when mode = 'time'.
title (Union[str, Sequence[str], None]) – Title of the plot(s).
same_plot (bool) – Whether to plot the data on the same plot or not. Only used when mode = 'embedding'. If True and discrete = False, color is ignored.
cmap (str) – Colormap for continuous data.
kwargs (Any) – Keyword arguments for scvelo.pl.scatter().
- Return type
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- plot_lineage_drivers(lineage, n_genes=8, use_raw=False, ascending=False, ncols=None, title_fmt='{gene} qval={qval:.4e}', figsize=None, dpi=None, save=None, **kwargs)
Plot lineage drivers discovered by compute_lineage_drivers().
- Parameters
lineage (str) – Lineage for which to plot the driver genes.
n_genes (int) – Number of top most correlated genes to plot.
use_raw (bool) – Whether to access anndata.AnnData.raw or not.
ascending (bool) – Whether to sort the genes in ascending order.
title_fmt (str) – Title format. Can include {gene}, {pval}, {qval} or {corr}, which will be substituted with the actual values.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
save (Union[str, Path, None]) – Filename where to save the plot.
kwargs (Any) – Keyword arguments for scvelo.pl.scatter().
- Return type
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- plot_lineage_drivers_correlation(lineage_x, lineage_y, color=None, gene_sets=None, gene_sets_colors=None, use_raw=False, cmap='RdYlBu_r', fontsize=12, adjust_text=False, legend_loc='best', figsize=(4, 4), dpi=None, save=None, show=True, **kwargs)
Show scatter plot of gene correlations between two lineages.
Optionally, you can pass a dict of gene names that will be annotated in the plot.
- Parameters
lineage_x (str) – Name of the lineage on the x-axis.
lineage_y (str) – Name of the lineage on the y-axis.
color (Optional[str]) – Key in anndata.AnnData.var or anndata.AnnData.varm, preferring the former.
gene_sets (Optional[Dict[str, Sequence[str]]]) – Gene set annotations of the form {'gene_set_name': ['gene_1', 'gene_2'], ...}.
gene_sets_colors (Optional[Sequence[str]]) – List of colors where each entry corresponds to a gene set from gene_sets. If None and keys in gene_sets correspond to lineage names, use the lineage colors. Otherwise, use default colors.
use_raw (bool) – Whether to access anndata.AnnData.raw or not.
cmap (str) – Colormap to use.
fontsize (int) – Size of the text when plotting gene_sets.
adjust_text (bool) – Whether to automatically adjust text in order to reduce overlap.
legend_loc (Optional[str]) – Position of the legend. If None, don’t show the legend. Only used when gene_sets != None.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
save (Union[str, Path, None]) – Filename where to save the plot.
show (bool) – If False, return matplotlib.pyplot.Axes.
kwargs (Any) – Keyword arguments for scanpy.pl.scatter().
- Return type
- Returns
The axes object if show = False; otherwise nothing, just plots the figure. Optionally saves it based on save.
Notes
This plot is based on the following notebook by Maren Büttner.
- plot_schur_matrix(title='schur matrix', cmap='viridis', figsize=None, dpi=80, save=None, **kwargs)
Plot the Schur matrix.
- Parameters
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- plot_spectrum(n=None, real_only=False, show_eigengap=True, show_all_xticks=True, legend_loc=None, title=None, marker='.', figsize=(5, 5), dpi=100, save=None, **kwargs)
Plot the top eigenvalues in the real or complex plane.
- Parameters
n (Optional[int]) – Number of eigenvalues to show. If None, show all that have been computed.
real_only (bool) – Whether to plot only the real part of the spectrum.
show_eigengap (bool) – When real_only = True, this determines whether to show the inferred eigengap as a dotted line.
show_all_xticks (bool) – When real_only = True, this determines whether to show the indices of all eigenvalues on the x-axis.
legend_loc (Optional[str]) – Location parameter for the legend.
marker (str) – Marker symbol used; valid options can be found in matplotlib.markers.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
dpi (int) – Dots per inch.
save (Union[str, Path, None]) – Filename where to save the plot.
kwargs (Any) – Keyword arguments for matplotlib.pyplot.scatter().
- Return type
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- property priming_degree: Optional[pandas.core.series.Series]
Priming degree.
Given a cell \(i\) and a set of terminal states, this quantifies how committed vs. naive cell \(i\) is, i.e. its degree of pluripotency. Low values correspond to naive cells (high degree of pluripotency), high values correspond to committed cells (low degree of pluripotency).
- static read(fname, adata=None, copy=False)
Deserialize self from a file.
- Parameters
fname (Union[str, Path]) – Filename from which to read the object.
adata (Optional[AnnData]) – anndata.AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.
copy (bool) – Whether to copy adata before assigning it or not. If adata is a view, it is always copied.
- Return type
IOMixin
- Returns
The deserialized object.
- property schur_matrix: Optional[numpy.ndarray]
Schur matrix.
The real Schur decomposition is a generalization of the Eigendecomposition and can be computed for any real-valued, square matrix \(A\). It is given by \(A = Q R Q^T\), where \(Q\) contains the real Schur vectors and \(R\) is the Schur matrix. \(Q\) is orthogonal and \(R\) is quasi-upper triangular with 1x1 and 2x2 blocks on the diagonal. If PETSc and SLEPc are installed, only the leading Schur vectors are computed.
- property schur_vectors: Optional[numpy.ndarray]
Real Schur vectors of the transition matrix.
The real Schur decomposition is a generalization of the Eigendecomposition and can be computed for any real-valued, square matrix \(A\). It is given by \(A = Q R Q^T\), where \(Q\) contains the real Schur vectors and \(R\) is the Schur matrix. \(Q\) is orthogonal and \(R\) is quasi-upper triangular with 1x1 and 2x2 blocks on the diagonal. If PETSc and SLEPc are installed, only the leading Schur vectors are computed.
- set_terminal_states(labels, cluster_key=None, add_to_existing=False, **kwargs)
Manually define terminal states.
- Parameters
labels (Union[Series, Dict[str, Sequence[Any]]]) – Defines the terminal states. Valid options are:
categorical pandas.Series where each category corresponds to a terminal state. NaN entries denote cells that do not belong to any terminal state, i.e. these are either initial or transient cells.
dict where keys are terminal states and values are lists of cell barcodes corresponding to annotations in anndata.AnnData.obs_names. If only 1 key is provided, values should correspond to terminal state clusters if a categorical pandas.Series can be found in anndata.AnnData.obs.
cluster_key (Optional[str]) – Key in anndata.AnnData.obs in order to associate names and colors with terminal_states. Each terminal state will be given the name and color of the cluster it overlaps with most.
add_to_existing (bool) – Whether the new terminal states should be added to pre-existing ones. Cells already assigned to a terminal state will be re-assigned to the new terminal state if there’s a conflict between old and new annotations. This throws an error if no previous annotations corresponding to terminal states have been found.
- Return type
- Returns
Nothing, just updates the following fields:
terminal_states - Categorical annotation of terminal states.
terminal_states_probabilities - Aggregated probability of cells to be in terminal states.
- property terminal_states: Optional[pandas.core.series.Series]
Categorical annotation of terminal states.
By default, all transient cells are labelled as NaN.
- property terminal_states_probabilities: Optional[pandas.core.series.Series]
Aggregated probability of cells to be in terminal states.
- to_adata(keep=('X', 'raw'), *, copy=True)
Serialize self to anndata.AnnData.
- Parameters
keep (Union[Literal['all'], Sequence[Literal['X', 'raw', 'layers', 'obs', 'var', 'obsm', 'varm', 'obsp', 'varp', 'uns']]]) – Which attributes to keep from the underlying adata. Valid options are:
'all' - keep all attributes specified in the signature.
typing.Sequence - keep only a subset of these attributes.
dict - the keys correspond to the attribute names and the values to a subset of keys to keep from this attribute. If a value is specified as either True or 'all', everything from this attribute will be kept.
copy (Union[bool, Sequence[Literal['X', 'raw', 'layers', 'obs', 'var', 'obsm', 'varm', 'obsp', 'varp', 'uns']]]) – Whether to copy the data. Can be specified on a per-attribute basis, which is useful for attributes that store arrays. Attributes not specified here will not be copied.
- Return type
anndata.AnnData
- Returns
adata : anndata.AnnData
Annotated data object.
- property transition_matrix: Union[numpy.ndarray, scipy.sparse.base.spmatrix]
Transition matrix of kernel.
- write(fname, write_adata=True, ext='pickle')
Serialize self to a file.
- Parameters
- Return type
None
- Returns
Nothing, just writes itself to a file using pickle.
CFLARE
- class cellrank.tl.estimators.CFLARE(obj, obsp_key=None, **kwargs)[source]
Compute the initial/terminal states of a Markov chain via spectral heuristics.
This estimator uses the left eigenvectors of the transition matrix to filter to a set of recurrent cells and the right eigenvectors to cluster this set of cells into discrete groups.
- Parameters
obj (Union[AnnData, ndarray, spmatrix, KernelExpression]) – Can be one of the following:
cellrank.tl.kernels.Kernel - kernel object.
anndata.AnnData - annotated data object containing the transition matrix in anndata.AnnData.obsp.
numpy.ndarray - row-normalized dense transition matrix.
scipy.sparse.spmatrix - row-normalized sparse transition matrix.
obsp_key (Optional[str]) – Key in anndata.AnnData.obsp where the transition matrix is stored. Only used when obj is an anndata.AnnData object.
- fit(k=20, **kwargs)[source]
Prepare self for terminal states prediction.
- Parameters
k (int) – Number of eigenvectors to compute.
kwargs (Any) – Keyword arguments for compute_eigendecomposition().
- Return type
TermStatesEstimator
- Returns
Self and modifies the following field:
eigendecomposition - Eigendecomposition of transition_matrix.
- predict(use=None, percentile=98, method='leiden', cluster_key=None, n_clusters_kmeans=None, n_neighbors=20, resolution=0.1, n_matches_min=0, n_neighbors_filtering=15, basis=None, n_comps=5, scale=None)[source]
Find approximate recurrent classes of the Markov chain.
Filter the left eigenvectors to obtain recurrent states, then cluster the right eigenvectors to obtain approximate recurrent classes.
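The filtering step can be pictured with the numpy sketch below. It is a sketch built on two assumptions that are not confirmed by this page. First, eigenvector entries are compared in absolute value. Second, a cell survives if it reaches the per-eigenvector percentile cutoff in at least one eigenvector, which is one reading of "removed if in the lower percentile of each eigenvector". The surviving cells would then be clustered on the right eigenvectors, for example with sklearn.cluster.KMeans or scanpy.tl.leiden().

```python
import numpy as np


def filter_recurrent(left_evecs: np.ndarray, percentile: float = 98) -> np.ndarray:
    """Boolean mask of cells kept as candidate recurrent cells.

    left_evecs: array of shape (n_cells, n_eigenvectors).
    Assumption: keep a cell if |value| >= the percentile cutoff in at least
    one eigenvector, i.e. drop it only if it is low in every eigenvector.
    """
    X = np.abs(left_evecs)
    cutoffs = np.percentile(X, percentile, axis=0)  # one cutoff per eigenvector
    return (X >= cutoffs).any(axis=1)
```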
- Parameters
use (Union[int, Sequence[int], None]) – Which eigenvectors to use as features for filtering and clustering: either a sequence of eigenvector indices or the number of leading eigenvectors. If None, use the eigengap statistic.
percentile (Optional[int]) – Threshold used for filtering out cells which are most likely transient states. Cells which are in the lower percentile percent of each eigenvector will be removed from the data matrix.
method (Literal['kmeans', 'leiden']) – Method to be used for clustering. Valid options are:
'kmeans' - sklearn.cluster.KMeans.
'leiden' - scanpy.tl.leiden().
cluster_key (Optional[str]) – Key in anndata.AnnData.obs used to associate names and colors with terminal_states.
n_clusters_kmeans (Optional[int]) – Number of clusters when method = 'kmeans'. If None, this is set to use + 1.
n_neighbors (int) – Number of neighbors for each cell in the KNN graph, i.e. its \(K\) parameter. Only used when method = 'leiden'.
resolution (float) – Resolution parameter for scanpy.tl.leiden(). Should be chosen relatively small.
n_matches_min (int) – Filters out cells which don't have at least n_matches_min neighbors from the same category. This removes some transient cells that have been misassigned.
n_neighbors_filtering (int) – Parameter for filtering cells. Cells are filtered out if they don't have at least n_matches_min neighbors among their n_neighbors_filtering nearest cells.
basis (Optional[str]) – Key from anndata.AnnData.obsm to use as additional features for clustering. If None, use only the right eigenvectors.
n_comps (int) – Number of embedding components to use when basis != None.
scale (Optional[bool]) – Whether to scale the values to z-scores. If None, scale the values if basis != None.
- Return type
None
- Returns
Nothing, just updates the following fields:
terminal_states - Categorical annotation of terminal states.
terminal_states_probabilities - Aggregated probability of cells to be in terminal states.
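The n_matches_min / n_neighbors_filtering rule keeps a cell only if enough of its nearest cells share its cluster label. The numpy sketch below implements that rule. It assumes Euclidean distances in the clustering feature space and excludes each cell from its own neighborhood. CellRank's actual neighbor computation may differ, and the function name is hypothetical.

```python
import numpy as np


def filter_by_neighbor_agreement(
    X: np.ndarray, labels, n_matches_min: int = 1, n_neighbors_filtering: int = 15
) -> np.ndarray:
    """Keep cells with >= n_matches_min same-label cells among their nearest neighbors.

    X: (n_cells, n_features) clustering features; labels: one label per cell.
    """
    labels = np.asarray(labels)
    # Pairwise Euclidean distances; a cell is not its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    nn = np.argsort(dist, axis=1)[:, :n_neighbors_filtering]
    matches = (labels[nn] == labels[:, None]).sum(axis=1)
    return matches >= n_matches_min
```

A cell whose label disagrees with all of its close neighbors, for instance a cell labelled "b" sitting inside an "a" cluster, gets zero matches and is dropped.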
- property absorption_probabilities: Optional[cellrank.tl._lineage.Lineage]
Absorption probabilities.
Informally, given a (finite, discrete) Markov chain with a set of transient states \(T\) and a set of absorbing states \(A\), the absorption probability for cell \(i\) from \(T\) to reach cell \(j\) from \(A\) is the probability that a random walk initialized in \(i\) will reach absorbing state \(j\).
In our context, states correspond to cells, in particular, absorbing states correspond to cells in terminal states.
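For a small dense chain, this definition can be computed with the standard fundamental-matrix identity \(B = (I - Q)^{-1} R\). Here \(Q\) holds the transient-to-transient and \(R\) the transient-to-absorbing transition probabilities. The sketch below uses numpy's dense np.linalg.solve as a stand-in for the sparse direct and iterative solvers offered by compute_absorption_probabilities(). The function name is illustrative.

```python
import numpy as np


def absorption_probabilities(P: np.ndarray, absorbing: np.ndarray) -> np.ndarray:
    """Probability that a walk started in each transient state ends in each absorbing state.

    P: row-stochastic transition matrix; absorbing: indices of absorbing states.
    Returns an array of shape (n_transient, n_absorbing) whose rows sum to one.
    """
    n = P.shape[0]
    transient = np.setdiff1d(np.arange(n), absorbing)
    Q = P[np.ix_(transient, transient)]
    R = P[np.ix_(transient, absorbing)]
    # Solve (I - Q) B = R instead of forming the inverse explicitly.
    return np.linalg.solve(np.eye(len(transient)) - Q, R)
```

In the cell-level setting, the probability of reaching a terminal state is obtained by summing the columns of \(B\) belonging to that state's cells. In a symmetric random walk on four states with both ends absorbing, for example, the walk started next to the left end is absorbed there with probability 2/3.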
- property absorption_times: Optional[pandas.core.frame.DataFrame]
Mean and variance of the time until absorption.
Related to conditional mean first passage times. For each initial cell, this is the expected time until absorption together with its variance.
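For time to any absorbing state (the 'all' case of time_to_absorption), the textbook formulas are \(t = N\mathbf{1}\) and \(\operatorname{Var} = (2N - I)\,t - t \circ t\), with fundamental matrix \(N = (I - Q)^{-1}\). This sketch assumes time is counted in steps of the discrete chain and uses a dense inverse. It is illustrative rather than CellRank's solver code.

```python
import numpy as np


def absorption_time_moments(P: np.ndarray, absorbing: np.ndarray):
    """Mean and variance of the number of steps until absorption, per transient state."""
    n = P.shape[0]
    transient = np.setdiff1d(np.arange(n), absorbing)
    Q = P[np.ix_(transient, transient)]
    I = np.eye(len(transient))
    N = np.linalg.inv(I - Q)                 # fundamental matrix
    mean = N @ np.ones(len(transient))       # expected number of steps
    var = (2 * N - I) @ mean - mean ** 2     # variance of the number of steps
    return mean, var
```

In the symmetric four-state walk with absorbing ends, every step from a transient state is absorbed with probability 1/2. The absorption time is therefore geometric with mean 2 and variance 2, which is what these formulas return.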
- property adata: anndata._core.anndata.AnnData
Annotated data object.
- Return type
- compute_absorption_probabilities(keys=None, solver='gmres', use_petsc=True, time_to_absorption=None, n_jobs=None, backend='loky', show_progress_bar=True, tol=1e-06, preconditioner=None)
Compute absorption probabilities.
For each cell, this computes the probability of being absorbed in any of the terminal_states. In particular, this corresponds to the probability that a random walk initialized in transient cell \(i\) will reach any cell from a fixed terminal state before reaching a cell from any other terminal state.
- Parameters
keys (Optional[Sequence[str]]) – Terminal states for which to compute the absorption probabilities. If None, use all states defined in terminal_states.
solver (Union[str, Literal['direct', 'gmres', 'lgmres', 'bicgstab', 'gcrotmk']]) – Solver to use for the linear problem. Options are 'direct', 'gmres', 'lgmres', 'bicgstab' or 'gcrotmk' when use_petsc = False, or one of petsc4py.PETSc.KSP.Type otherwise.
Information on the scipy iterative solvers can be found in scipy.sparse.linalg, and on the petsc4py solvers here.
use_petsc (bool) – Whether to use solvers from petsc4py or scipy. Using petsc4py is recommended for large problems. If no installation is found, defaults to scipy.sparse.linalg.gmres().
time_to_absorption (Union[Literal['all'], Sequence[Union[str, Sequence[str]]], Dict[Union[str, Sequence[str]], Literal['mean', 'var']], None]) – Whether to compute the mean time to absorption and its variance to specific absorbing states.
If a dict, can be specified as {'Alpha': 'var', ...} to also compute the variance. If states are given as a tuple, time to absorption will be computed to the subset of these states, such as [('Alpha', 'Beta'), ...] or {('Alpha', 'Beta'): 'mean', ...}. Can be specified as 'all' to compute it to any absorbing state in keys, which is more efficient than listing all absorbing states explicitly.
It might be beneficial to disable the progress bar (show_progress_bar = False) because of the many solves required.
n_jobs (Optional[int]) – Number of parallel jobs to use when using an iterative solver.
backend (Literal['loky', 'multiprocessing', 'threading']) – Which backend to use for multiprocessing. See joblib.Parallel for valid options.
show_progress_bar (bool) – Whether to show a progress bar. Only used when solver != 'direct'.
tol (float) – Convergence tolerance for the iterative solver. The default is fine for most cases; only consider decreasing it for severely ill-conditioned matrices.
preconditioner (Optional[str]) – Preconditioner to use, only available when use_petsc = True. For valid options, see here. We recommend the 'ilu' preconditioner for badly conditioned problems.
- Return type
None
- Returns
Nothing, just updates the following fields:
absorption_probabilities - Absorption probabilities.
absorption_times - Mean and variance of the time until absorption. Only if time_to_absorption is specified.
- compute_eigendecomposition(k=20, which='LR', alpha=1.0, only_evals=False, ncv=None)
Compute eigendecomposition of transition_matrix.
Uses a sparse implementation, if possible, and only computes the top \(k\) eigenvectors to speed up the computation. Computes both left and right eigenvectors.
- Parameters
k (int) – Number of eigenvectors or eigenvalues to compute.
which (Literal['LR', 'LM']) – How to sort the eigenvalues. Valid options are:
'LR' - the largest real part.
'LM' - the largest magnitude.
alpha (float) – Used to compute the eigengap. alpha is the weight given to the deviation of an eigenvalue from one.
only_evals (bool) – Whether to compute only eigenvalues.
- Return type
None
- Returns
Nothing, just updates the following field:
eigendecomposition - Eigendecomposition of transition_matrix.
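The dense numpy sketch below orders eigenvalues the way which = 'LR' or 'LM' describes and locates a gap in the sorted spectrum. It differs from the method above in two ways. It uses np.linalg.eig rather than a sparse solver. It also uses the plain "largest gap between consecutive real parts" heuristic, so it does not reproduce the alpha-weighted eigengap. Left eigenvectors would be obtained the same way from P.T.

```python
import numpy as np


def top_k_eigs(P: np.ndarray, k: int, which: str = "LR"):
    """Top-k eigenvalues and right eigenvectors of P, sorted by real part ('LR') or magnitude ('LM')."""
    evals, V_r = np.linalg.eig(P)
    key = evals.real if which == "LR" else np.abs(evals)
    order = np.argsort(-key)[:k]
    return evals[order], V_r[:, order]


def simple_eigengap(evals: np.ndarray) -> int:
    """Index i of the largest gap between sorted real parts i and i + 1 (plain heuristic)."""
    re = np.sort(evals.real)[::-1]
    return int(np.argmax(re[:-1] - re[1:]))
```

For a chain made of two weakly coupled blocks, the spectrum is approximately 1, 0.98, 0.80 and 0.78. The largest gap follows the second eigenvalue, which suggests two metastable groups.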
- compute_lineage_drivers(lineages=None, method=TestMethod.FISCHER, cluster_key=None, clusters=None, layer=None, use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, **kwargs)
Compute driver genes per lineage.
Correlates gene expression with lineage probabilities, for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters which are relevant for the specified lineages.
- Parameters
lineages (Union[str, Sequence, None]) – Lineage names from absorption_probabilities. If None, use all lineages.
method (Literal['fischer', 'perm_test']) – Mode to use when calculating p-values and confidence intervals. Valid options are:
'fischer' - use the Fisher transformation [Fisher, 1921].
'perm_test' - use a permutation test.
cluster_key (Optional[str]) – Key from anndata.AnnData.obs to obtain cluster annotations. These are considered for clusters.
clusters (Union[str, Sequence, None]) – Restrict the correlations to these clusters.
layer (Optional[str]) – Key from anndata.AnnData.layers from which to get the expression. If None or 'X', use anndata.AnnData.X.
use_raw (bool) – Whether or not to use anndata.AnnData.raw to correlate gene expression.
confidence_level (float) – Confidence level for the confidence interval calculation. Must be in the interval [0, 1].
n_perms (int) – Number of permutations to use when method = 'perm_test'.
seed (Optional[int]) – Random seed when method = 'perm_test'.
show_progress_bar – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend – Which backend to use for parallelization. See joblib.Parallel for valid options.
- Return type
pandas.DataFrame
- Returns
Dataframe of shape (n_genes, n_lineages * 5) containing the following columns, one for each lineage:
{lineage}_corr - correlation between the gene expression and absorption probabilities.
{lineage}_pval - calculated p-values for a two-sided test.
{lineage}_qval - p-values corrected using the Benjamini-Hochberg method at level 0.05.
{lineage}_ci_low - lower bound of the confidence_level correlation confidence interval.
{lineage}_ci_high - upper bound of the confidence_level correlation confidence interval.
Also updates the following field:
lineage_drivers - the same pandas.DataFrame as described above.
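The statistics behind these columns can be sketched with numpy and the standard library. The sketch computes, for one lineage, a per-gene Pearson correlation with the absorption probabilities, a confidence interval via the Fisher transformation \(z = \operatorname{arctanh}(r)\) with standard error \(1/\sqrt{n - 3}\), a normal-approximation two-sided p-value, and Benjamini-Hochberg q-values. It illustrates the 'fischer' approach under these textbook assumptions and is not CellRank's code. The function name is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def lineage_drivers(X: np.ndarray, probs: np.ndarray, confidence_level: float = 0.95):
    """Per-gene correlation with one lineage's fate probabilities.

    X: (n_cells, n_genes) expression; probs: (n_cells,) absorption probabilities.
    Returns corr, two-sided p-values, BH q-values, CI lower and upper bounds.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    pc = probs - probs.mean()
    corr = (Xc.T @ pc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(pc))
    # Fisher transformation: arctanh(r) is approximately normal with sd 1 / sqrt(n - 3).
    z = np.arctanh(np.clip(corr, -1 + 1e-12, 1 - 1e-12))
    se = 1.0 / np.sqrt(n - 3)
    norm = NormalDist()
    pval = np.array([2 * (1 - norm.cdf(abs(zi) / se)) for zi in z])
    crit = norm.inv_cdf(0.5 + confidence_level / 2)
    ci_low, ci_high = np.tanh(z - crit * se), np.tanh(z + crit * se)
    # Benjamini-Hochberg: q_(i) = min over j >= i of p_(j) * m / j, on sorted p-values.
    m = len(pval)
    order = np.argsort(pval)
    scaled = pval[order] * m / np.arange(1, m + 1)
    q = np.minimum.accumulate(scaled[::-1])[::-1]
    qval = np.empty(m)
    qval[order] = np.minimum(q, 1.0)
    return corr, pval, qval, ci_low, ci_high
```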
- compute_lineage_priming(method='kl_divergence', early_cells=None)
Compute the degree of lineage priming.
It returns a score in [0, 1] where 0 stands for naive and 1 stands for committed.
- Parameters
method (Literal['kl_divergence', 'entropy']) – The method used to compute the degree of lineage priming. Valid options are:
'kl_divergence' - as in [Velten et al., 2017], computes the KL divergence between the fate probabilities of a cell and the average fate probabilities. Computation of average fate probabilities can be restricted to a set of user-defined early_cells.
'entropy' - as in [Setty et al., 2019], computes the entropy over a cell's fate probabilities.
early_cells (Union[Mapping[str, Sequence[str]], Sequence[str], None]) – Cell IDs or a mask marking early cells. If None, use all cells. Only used when method = 'kl_divergence'. If a dict, the key specifies a cluster key in anndata.AnnData.obs and the values specify cluster labels containing early cells.
- Return type
- Returns
The priming degree.
Also updates the following field:
priming_degree - Priming degree.
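Both priming methods reduce a cell's fate-probability vector to one number. The numpy sketch below uses two normalizations that are assumptions chosen to map onto [0, 1] with 1 = committed, and CellRank's exact scaling may differ. The entropy score is one minus the entropy divided by log(n_lineages). The KL divergence to the (optionally early-cell) average is divided by its maximum. Function names are hypothetical.

```python
import numpy as np


def priming_entropy(probs: np.ndarray) -> np.ndarray:
    """1 - normalized entropy: 0 for uniform fate probabilities, 1 for a single fate."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    H = -(p * np.log(p)).sum(axis=1)
    return 1.0 - H / np.log(probs.shape[1])


def priming_kl(probs: np.ndarray, early_mask=None) -> np.ndarray:
    """KL divergence to the average fate distribution, rescaled by its maximum."""
    ref = probs[early_mask] if early_mask is not None else probs
    q = np.clip(ref.mean(axis=0), 1e-12, 1.0)
    p = np.clip(probs, 1e-12, 1.0)
    kl = (p * np.log(p / q)).sum(axis=1)
    return kl / kl.max()
```

A cell with uniform fate probabilities over three lineages scores 0 under the entropy score, and a cell fully committed to one lineage scores 1.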
- compute_terminal_states(*args, **kwargs)
Compute terminal states of the process.
This is an alias for predict().
- Parameters
- Return type
None
- Returns
Nothing, just updates the following fields:
terminal_states - Categorical annotation of terminal states.
terminal_states_probabilities - Aggregated probability of cells to be in terminal states.
- copy(*, deep=False)
Return a copy of self.
- property eigendecomposition: Optional[Dict[str, Any]]
Eigendecomposition of transition_matrix.
For non-symmetric real matrices, left and right eigenvectors will in general be different and complex. We compute both left and right eigenvectors.
- Return type
Optional[Dict[str, Any]]
- Returns
A dictionary with the following keys:
'D' - the eigenvalues.
'eigengap' - the eigengap.
'params' - parameters used for the computation.
'V_l' - left eigenvectors (optional).
'V_r' - right eigenvectors (optional).
'stationary_dist' - stationary distribution of transition_matrix, if present.
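The stationary distribution stored under 'stationary_dist' is the left eigenvector for eigenvalue 1, normalized to sum to one. Here is a dense numpy sketch. It assumes an irreducible chain, so that this eigenvector is unique up to scaling.

```python
import numpy as np


def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Left eigenvector of P for the eigenvalue closest to 1, normalized to sum to 1."""
    evals, V_l = np.linalg.eig(P.T)  # right eigenvectors of P.T are left eigenvectors of P
    i = np.argmin(np.abs(evals - 1.0))
    pi = np.real(V_l[:, i])
    return pi / pi.sum()
```

For P = [[0.9, 0.1], [0.5, 0.5]], balancing the flows gives pi = [5/6, 1/6], which satisfies pi P = pi.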
- classmethod from_adata(adata, obsp_key)
Deserialize self from anndata.AnnData.
- Parameters
adata (anndata.AnnData) – Annotated data object.
obsp_key (str) – Key in anndata.AnnData.obsp where the transition matrix is stored.
- Return type
- Returns
The deserialized object.
- property kernel: cellrank.tl._mixins._kernel.KernelExpression
Underlying kernel expression.
- Return type
KernelExpression
- property lineage_drivers: Optional[pandas.core.frame.DataFrame]
Potential lineage drivers.
Computes Pearson correlation of each gene with fate probabilities for every terminal state. High Pearson correlation indicates potential lineage drivers. Also computes p-values and confidence intervals.
- Return type
Optional[pandas.DataFrame]
- Returns
Dataframe of shape (n_genes, n_lineages * 5) containing the following columns, one for each lineage:
{lineage}_corr - correlation between the gene expression and absorption probabilities.
{lineage}_pval - calculated p-values for a two-sided test.
{lineage}_qval - p-values corrected using the Benjamini-Hochberg method at level 0.05.
{lineage}_ci_low - lower bound of the confidence_level correlation confidence interval.
{lineage}_ci_high - upper bound of the confidence_level correlation confidence interval.
- plot_absorption_probabilities(states=None, color=None, discrete=False, mode=PlotMode.EMBEDDING, time_key='latent_time', same_plot=True, title=None, cmap='viridis', **kwargs)
Plot continuous or categorical observations in an embedding or along pseudotime.
- Parameters
color (Optional[str]) – Key in anndata.AnnData.obs.
discrete (bool) – Whether to plot the data as continuous or discrete observations. If the data cannot be plotted as continuous observations, it will be plotted as discrete.
mode (Literal['embedding', 'time']) – Valid options are:
'embedding' - plot the embedding while coloring in continuous or categorical observations.
'time' - plot the pseudotime on the x-axis and the probabilities/memberships on the y-axis.
time_key (str) – Key in anndata.AnnData.obs where pseudotime is stored. Only used when mode = 'time'.
title (Union[str, Sequence[str], None]) – Title of the plot(s).
same_plot (bool) – Whether to plot the data on the same plot or not. Only used when mode = 'embedding'. If True and discrete = False, color is ignored.
cmap (str) – Colormap for continuous data.
kwargs (Any) – Keyword arguments for scvelo.pl.scatter().
- Return type
None
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- plot_lineage_drivers(lineage, n_genes=8, use_raw=False, ascending=False, ncols=None, title_fmt='{gene} qval={qval:.4e}', figsize=None, dpi=None, save=None, **kwargs)
Plot lineage drivers discovered by compute_lineage_drivers().
- Parameters
lineage (str) – Lineage for which to plot the driver genes.
n_genes (int) – Number of top most correlated genes to plot.
use_raw (bool) – Whether to access anndata.AnnData.raw or not.
ascending (bool) – Whether to sort the genes in ascending order.
title_fmt (str) – Title format. Can include {gene}, {pval}, {qval} or {corr}, which will be substituted with the actual values.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
save (Union[str, Path, None]) – Filename where to save the plot.
kwargs (Any) – Keyword arguments for scvelo.pl.scatter().
- Return type
None
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- plot_lineage_drivers_correlation(lineage_x, lineage_y, color=None, gene_sets=None, gene_sets_colors=None, use_raw=False, cmap='RdYlBu_r', fontsize=12, adjust_text=False, legend_loc='best', figsize=(4, 4), dpi=None, save=None, show=True, **kwargs)
Show a scatter plot of gene correlations between two lineages.
Optionally, you can pass a dict of gene names that will be annotated in the plot.
- Parameters
lineage_x (str) – Name of the lineage on the x-axis.
lineage_y (str) – Name of the lineage on the y-axis.
color (Optional[str]) – Key in anndata.AnnData.var or anndata.AnnData.varm, preferring the former.
gene_sets (Optional[Dict[str, Sequence[str]]]) – Gene set annotations of the form {'gene_set_name': ['gene_1', 'gene_2'], ...}.
gene_sets_colors (Optional[Sequence[str]]) – List of colors where each entry corresponds to a gene set from gene_sets. If None and keys in gene_sets correspond to lineage names, use the lineage colors. Otherwise, use default colors.
use_raw (bool) – Whether to access anndata.AnnData.raw or not.
cmap (str) – Colormap to use.
fontsize (int) – Size of the text when plotting gene_sets.
adjust_text (bool) – Whether to automatically adjust text in order to reduce overlap.
legend_loc (Optional[str]) – Position of the legend. If None, don't show the legend. Only used when gene_sets != None.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
save (Union[str, Path, None]) – Filename where to save the plot.
show (bool) – If False, return matplotlib.pyplot.Axes.
kwargs (Any) – Keyword arguments for scanpy.pl.scatter().
- Return type
- Returns
If show = False, the axes object. Otherwise nothing, just plots the figure. Optionally saves it based on save.
Notes
This plot is based on the following notebook by Maren Büttner.
- plot_spectrum(n=None, real_only=False, show_eigengap=True, show_all_xticks=True, legend_loc=None, title=None, marker='.', figsize=(5, 5), dpi=100, save=None, **kwargs)
Plot the top eigenvalues in real or complex plane.
- Parameters
n (Optional[int]) – Number of eigenvalues to show. If None, show all that have been computed.
real_only (bool) – Whether to plot only the real part of the spectrum.
show_eigengap (bool) – When real_only = True, this determines whether to show the inferred eigengap as a dotted line.
show_all_xticks (bool) – When real_only = True, this determines whether to show the indices of all eigenvalues on the x-axis.
legend_loc (Optional[str]) – Location parameter for the legend.
marker (str) – Marker symbol used; valid options can be found in matplotlib.markers.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
dpi (int) – Dots per inch.
save (Union[str, Path, None]) – Filename where to save the plot.
kwargs (Any) – Keyword arguments for matplotlib.pyplot.scatter().
- Return type
None
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- plot_terminal_states(states=None, color=None, discrete=False, mode=PlotMode.EMBEDDING, time_key='latent_time', same_plot=True, title=None, cmap='viridis', **kwargs)
Plot continuous or categorical observations in an embedding or along pseudotime.
- Parameters
color (Optional[str]) – Key in anndata.AnnData.obs.
discrete (bool) – Whether to plot the data as continuous or discrete observations. If the data cannot be plotted as continuous observations, it will be plotted as discrete.
mode (Literal['embedding', 'time']) – Valid options are:
'embedding' - plot the embedding while coloring in continuous or categorical observations.
'time' - plot the pseudotime on the x-axis and the probabilities/memberships on the y-axis.
time_key (str) – Key in anndata.AnnData.obs where pseudotime is stored. Only used when mode = 'time'.
title (Union[str, Sequence[str], None]) – Title of the plot(s).
same_plot (bool) – Whether to plot the data on the same plot or not. Only used when mode = 'embedding'. If True and discrete = False, color is ignored.
cmap (str) – Colormap for continuous data.
kwargs (Any) – Keyword arguments for scvelo.pl.scatter().
- Return type
None
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- property priming_degree: Optional[pandas.core.series.Series]
Priming degree.
Given a cell \(i\) and a set of terminal states, this quantifies how committed vs. naive cell \(i\) is, i.e. its degree of pluripotency. Low values correspond to naive cells (high degree of pluripotency), high values correspond to committed cells (low degree of pluripotency).
- static read(fname, adata=None, copy=False)
Deserialize self from a file.
- Parameters
fname (Union[str, Path]) – Filename from which to read the object.
adata (Optional[AnnData]) – anndata.AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.
copy (bool) – Whether to copy adata before assigning it or not. If adata is a view, it is always copied.
- Return type
IOMixin
- Returns
The deserialized object.
- rename_terminal_states(new_names)
Rename categories in terminal_states.
- Parameters
new_names (Mapping[str, str]) – Mapping where keys correspond to the old names and values to the new names. The new names must be unique.
- Return type
None
- Returns
Nothing, just updates the names of:
terminal_states - Categorical annotation of terminal states.
- set_terminal_states(labels, cluster_key=None, add_to_existing=False, **kwargs)
Manually define terminal states.
- Parameters
labels (Union[Series, Dict[str, Sequence[Any]]]) – Defines the terminal states. Valid options are:
categorical pandas.Series - each category corresponds to a terminal state. NaN entries denote cells that do not belong to any terminal state, i.e. these are either initial or transient cells.
dict - keys are terminal states and values are lists of cell barcodes corresponding to annotations in anndata.AnnData.obs_names. If only 1 key is provided, values should correspond to terminal state clusters if a categorical pandas.Series can be found in anndata.AnnData.obs.
cluster_key (Optional[str]) – Key in anndata.AnnData.obs used to associate names and colors with terminal_states. Each terminal state is given the name and color of the cluster it overlaps with most.
add_to_existing (bool) – Whether the new terminal states should be added to pre-existing ones. Cells already assigned to a terminal state will be re-assigned to the new terminal state if there's a conflict between old and new annotations. This throws an error if no previous annotations corresponding to terminal states have been found.
- Return type
None
- Returns
Nothing, just updates the following fields:
terminal_states - Categorical annotation of terminal states.
terminal_states_probabilities - Aggregated probability of cells to be in terminal states.
- property terminal_states: Optional[pandas.core.series.Series]
Categorical annotation of terminal states.
By default, all transient cells are labelled as NaN.
- property terminal_states_probabilities: Optional[pandas.core.series.Series]
Aggregated probability of cells to be in terminal states.
- to_adata(keep=('X', 'raw'), *, copy=True)
Serialize self to anndata.AnnData.
- Parameters
keep (Union[Literal['all'], Sequence[Literal['X', 'raw', 'layers', 'obs', 'var', 'obsm', 'varm', 'obsp', 'varp', 'uns']]]) – Which attributes to keep from the underlying adata. Valid options are:
'all' - keep all attributes specified in the signature.
typing.Sequence - keep only a subset of these attributes.
dict - the keys correspond to the attribute names and the values to a subset of keys to keep from this attribute. If a value is specified as either True or 'all', everything from this attribute will be kept.
copy (Union[bool, Sequence[Literal['X', 'raw', 'layers', 'obs', 'var', 'obsm', 'varm', 'obsp', 'varp', 'uns']]]) – Whether to copy the data. Can be specified on a per-attribute basis, which is useful for attributes that store arrays. Attributes not specified here will not be copied.
- Return type
anndata.AnnData
- Returns
adata : anndata.AnnData
Annotated data object.
- property transition_matrix: Union[numpy.ndarray, scipy.sparse.base.spmatrix]
Transition matrix of kernel.
- write(fname, write_adata=True, ext='pickle')
Serialize self to a file.
- Parameters
- Return type
None
- Returns
Nothing, just writes itself to a file using pickle.
Kernels
Velocity Kernel
- class cellrank.tl.kernels.VelocityKernel(adata, backward=False, vkey='velocity', xkey='Ms', gene_subset=None, compute_cond_num=False, check_connectivity=False, **kwargs)[source]
Kernel which computes a transition matrix based on RNA velocity.
This borrows ideas from both [La Manno et al., 2018] and [Bergen et al., 2020]. In short, for each cell i, we compute transition probabilities \(p_{i, j}\) to each cell j in the neighborhood of i. The transition probabilities are computed as a multinomial logistic regression where the weights \(w_j\) (for all j) are given by the vector that connects cell i with cell j in gene expression space, and the features \(x_i\) are given by the velocity vector \(v_i\) of cell i.
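A softmax over the neighbors of cell i, with logits given by a similarity between \(v_i\) and each displacement vector, is exactly a multinomial logistic model. The numpy sketch below computes one row of the transition matrix with the (default) correlation scheme. The fixed softmax_scale value is an arbitrary assumption, and the kernel's neighbor graph, vkey/xkey handling and automatic scale estimation are not reproduced. The function name is hypothetical.

```python
import numpy as np


def velocity_transition_row(
    x_i: np.ndarray, v_i: np.ndarray, neighbors_x: np.ndarray, softmax_scale: float = 4.0
) -> np.ndarray:
    """Transition probabilities from cell i to its neighbors (correlation scheme).

    x_i: (n_genes,) expression of cell i; v_i: (n_genes,) velocity of cell i;
    neighbors_x: (n_neighbors, n_genes) expression of i's nearest neighbors.
    """
    D = neighbors_x - x_i                       # displacement vectors delta_{i, j}
    Dc = D - D.mean(axis=1, keepdims=True)      # center each vector across genes
    vc = v_i - v_i.mean()
    corr = (Dc @ vc) / (np.linalg.norm(Dc, axis=1) * np.linalg.norm(vc))
    logits = softmax_scale * corr
    w = np.exp(logits - logits.max())           # numerically stable softmax
    return w / w.sum()
```

Neighbors lying in the direction of the velocity receive most of the probability mass, and neighbors behind the cell receive the least.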
- Parameters
adata (anndata.AnnData) – Annotated data object.
backward (bool) – Direction of the process.
vkey (str) – Key in anndata.AnnData.layers where velocities are stored.
xkey (str) – Key in anndata.AnnData.layers where expected gene expression counts are stored.
gene_subset (Optional[Iterable]) – List of genes to be used to compute transition probabilities. By default, genes from anndata.AnnData.var['velocity_genes'] are used.
compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. Note that this might be costly, since it does not use a sparse implementation.
check_connectivity (bool) – Check whether the underlying KNN graph is connected.
kwargs (Any) – Keyword arguments for cellrank.tl.kernels.Kernel.
- compute_transition_matrix(mode=VelocityMode.DETERMINISTIC, backward_mode=BackwardMode.TRANSPOSE, scheme=Scheme.CORRELATION, softmax_scale=None, n_samples=1000, seed=None, check_irreducibility=False, **kwargs)[source]
Compute transition matrix based on velocity directions on the local manifold.
For each cell, infer transition probabilities based on the cell’s velocity-extrapolated cell state and the cell states of its K nearest neighbors.
- Parameters
mode (Literal['deterministic', 'stochastic', 'sampling', 'monte_carlo']) – How to compute transition probabilities. Valid options are:
'deterministic' - deterministic computation that doesn't propagate uncertainty.
'monte_carlo' - Monte Carlo average of randomly sampled velocity vectors.
'stochastic' - second-order approximation, only available when jax is installed.
'sampling' - sample 1 transition matrix from the velocity distribution.
backward_mode (Literal['transpose', 'negate']) – Only matters if initialized as backward = True. Valid options are:
'transpose' - compute transitions from neighboring cells \(j\) to cell \(i\).
'negate' - negate the velocity vector.
softmax_scale (Optional[float]) – Scaling parameter for the softmax. If None, it will be estimated using 1 / median(correlations). The idea behind this is to scale the softmax to counter everything tending to orthogonality in high dimensions.
scheme (Union[Literal['dot_product', 'cosine', 'correlation'], Callable]) – Similarity scheme between cells as described in [Li et al., 2021]. Can be one of the following:
'dot_product' - cellrank.tl.kernels.DotProductScheme.
'cosine' - cellrank.tl.kernels.CosineScheme.
'correlation' - cellrank.tl.kernels.CorrelationScheme.
Alternatively, any function can be passed as long as it follows the signature of cellrank.tl.kernels.SimilaritySchemeABC.__call__().
n_samples (int) – Number of bootstrap samples when mode = 'monte_carlo'.
seed (Optional[int]) – Seed for the random state when the method requires n_samples.
check_irreducibility (bool) – Optional check for irreducibility of the final transition matrix.
show_progress_bar – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend – Which backend to use for parallelization. See joblib.Parallel for valid options.
- Return type
- Returns
Self and updates the following fields:
- property logits: scipy.sparse.csr.csr_matrix
Array of shape (n_cells, n_cells) containing the logits.
- Return type
scipy.sparse.csr.csr_matrix
- copy()[source]
Return a copy of self.
- Return type
Cosine Similarity Scheme
- class cellrank.tl.kernels.CosineScheme[source]
Cosine similarity scheme as defined in eq. (4.7) [Li et al., 2021].
\(v(s_i, s_j) = g(cos(\delta_{i, j}, v_i))\)
where \(v_i\) is the velocity vector of cell \(i\), \(\delta_{i, j}\) corresponds to the transcriptional displacement between cells \(i\) and \(j\) and \(g\) is a softmax function with some scaling parameter.
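For the forward case (v of shape (n_genes,)), the formula above amounts to a cosine between \(v_i\) and each row of the displacement matrix D, scaled and passed through a softmax over neighbors. The numpy sketch below returns probabilities and logits in the same order as __call__ documents. It is illustrative only; the backward case and the Hessian are omitted.

```python
import numpy as np


def cosine_scheme(v: np.ndarray, D: np.ndarray, softmax_scale: float = 1.0):
    """Return (probabilities, logits) over neighbors for g(cos(delta_ij, v_i)).

    v: (n_genes,) velocity of the current cell; D: (n_neighbors, n_genes) displacements.
    """
    cos = (D @ v) / (np.linalg.norm(D, axis=1) * np.linalg.norm(v))
    logits = softmax_scale * cos
    probs = np.exp(logits - logits.max())  # stable softmax
    return probs / probs.sum(), logits
```

Because of the normalization, the cosine ignores how far away a neighbor is and only rewards alignment with the velocity.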
- __call__(v, D, softmax_scale=1.0)
Compute transition probability of a cell to its nearest neighbors using RNA velocity.
- Parameters
v (ndarray) – Array of shape (n_genes,) or (n_neighbors, n_genes) containing the velocity vector(s). The second case is used for the backward process.
D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.
softmax_scale (float) – Scaling factor for the softmax function.
- Return type
- Returns
The probability and logits arrays of shape (n_neighbors,).
- hessian(v, D, softmax_scale=1.0)
Compute the Hessian.
- Parameters
- Return type
- Returns
The full Hessian of shape (n_neighbors, n_genes, n_genes) or only its diagonal of shape (n_neighbors, n_genes).
Correlation Scheme
- class cellrank.tl.kernels.CorrelationScheme[source]
Pearson correlation scheme as defined in eq. (4.8) [Li et al., 2021].
\(v(s_i, s_j) = g(corr(\delta_{i, j}, v_i))\)
where \(v_i\) is the velocity vector of cell \(i\), \(\delta_{i, j}\) corresponds to the transcriptional displacement between cells \(i\) and \(j\) and \(g\) is a softmax function with some scaling parameter.
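Pearson correlation equals the cosine of mean-centered vectors. The correlation scheme therefore differs from the cosine scheme only in centering \(v_i\) and each \(\delta_{i, j}\) across genes before the cosine is taken. The numpy sketch below computes these pre-softmax similarities, with the softmax applied as in the cosine case. It checks the result against np.corrcoef and shows the invariance to a constant shift of the displacements.

```python
import numpy as np


def correlation_similarities(v: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Pearson correlation between v (n_genes,) and each row of D (n_neighbors, n_genes)."""
    Dc = D - D.mean(axis=1, keepdims=True)  # center each displacement across genes
    vc = v - v.mean()
    return (Dc @ vc) / (np.linalg.norm(Dc, axis=1) * np.linalg.norm(vc))
```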
- __call__(v, D, softmax_scale=1.0)
Compute transition probability of a cell to its nearest neighbors using RNA velocity.
- Parameters
v (ndarray) – Array of shape (n_genes,) or (n_neighbors, n_genes) containing the velocity vector(s). The second case is used for the backward process.
D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.
softmax_scale (float) – Scaling factor for the softmax function.
- Return type
- Returns
The probability and logits arrays of shape (n_neighbors,).
- hessian(v, D, softmax_scale=1.0)
Compute the Hessian.
- Parameters
- Return type
- Returns
The full Hessian of shape (n_neighbors, n_genes, n_genes) or only its diagonal of shape (n_neighbors, n_genes).
Dot Product Scheme
- class cellrank.tl.kernels.DotProductScheme[source]
Dot product scheme as defined in eq. (4.9) [Li et al., 2021].
\(v(s_i, s_j) = g(\delta_{i, j}^T v_i)\)
where \(v_i\) is the velocity vector of cell \(i\), \(\delta_{i, j}\) corresponds to the transcriptional displacement between cells \(i\) and \(j\) and \(g\) is a softmax function with some scaling parameter.
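Unlike the cosine and correlation schemes, the raw dot product is not normalized. Longer displacement vectors produce larger logits, so rescaling D by a factor is equivalent to multiplying softmax_scale by that factor and sharpens the distribution. The numpy sketch below makes this concrete and is illustrative only.

```python
import numpy as np


def dot_product_probs(v: np.ndarray, D: np.ndarray, softmax_scale: float = 1.0) -> np.ndarray:
    """Softmax over neighbors of the raw dot products delta_ij^T v_i."""
    logits = softmax_scale * (D @ v)
    p = np.exp(logits - logits.max())  # stable softmax
    return p / p.sum()
```

With v = [1.0, 0.5] and unit-length displacements, the best-aligned neighbor gets roughly 57% of the mass. Multiplying the displacements by 10 pushes it above 99%, which is why the choice of softmax_scale matters more here than for the normalized schemes.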
- __call__(v, D, softmax_scale=1.0)
Compute transition probability of a cell to its nearest neighbors using RNA velocity.
- Parameters
v (ndarray) – Array of shape (n_genes,) or (n_neighbors, n_genes) containing the velocity vector(s). The second case is used for the backward process.
D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.
softmax_scale (float) – Scaling factor for the softmax function.
- Return type
- Returns
The probability and logits arrays of shape (n_neighbors,).
- hessian(v, D, softmax_scale=1.0)
Compute the Hessian.
- Parameters
- Return type
- Returns
The full Hessian of shape (n_neighbors, n_genes, n_genes) or only its diagonal of shape (n_neighbors, n_genes).
Connectivity Kernel
- class cellrank.tl.kernels.ConnectivityKernel(adata, backward=False, conn_key='connectivities', compute_cond_num=False, check_connectivity=False)[source]
Kernel which computes transition probabilities based on similarities among cells.
As a measure of similarity, we currently support:
transcriptomic similarities, computed using e.g. scanpy.pp.neighbors(), see [Wolf et al., 2018].
spatial similarities, computed using e.g. squidpy.gr.spatial_neighbors(), see [Palla et al., 2021].
The resulting transition matrix is symmetric and thus cannot be used to learn about the direction of the biological process. To include this direction, consider combining it with a velocity-derived transition matrix via cellrank.tl.kernels.VelocityKernel.
Optionally, we apply a density correction as described in [Coifman et al., 2005], where we use the implementation of [Haghverdi et al., 2016].
- Parameters
adata (anndata.AnnData) – Annotated data object.
backward (bool) – Direction of the process.
conn_key (str) – Key in anndata.AnnData.obsp to obtain the connectivity matrix describing cell-cell similarity.
compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. Note that this might be costly, since it does not use a sparse implementation.
check_connectivity (bool) – Check whether the underlying KNN graph is connected.
- compute_transition_matrix(density_normalize=True)[source]
Compute transition matrix based on transcriptomic similarity.
Uses a symmetric, weighted KNN graph to compute a symmetric transition matrix. The connectivities are computed using scanpy.pp.neighbors(). Depending on the parameters used there, they can be UMAP connectivities or Gaussian-kernel-based connectivities with adaptive kernel width.
- Parameters
density_normalize (bool) – Whether or not to use the underlying KNN graph for density normalization.
- Return type
- Returns
Self and updated transition_matrix.
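As a rough illustration of what density normalization does conceptually, the sketch below applies a Coifman-style correction with α = 1 (divide each entry by the degrees of both endpoints) and then row-normalizes. It assumes a dense, symmetric, non-negative connectivity matrix; the function name is hypothetical, and CellRank's own code works on sparse matrices and may differ in detail.

```python
import numpy as np


def connectivities_to_transition_matrix(conn, density_normalize=True):
    """Hypothetical sketch: symmetric connectivities -> row-stochastic transition matrix."""
    K = np.asarray(conn, dtype=float)
    if density_normalize:
        # Degree of every cell in the KNN graph, used as a kernel density estimate.
        q = K.sum(axis=1)
        # K_ij / (q_i * q_j): downweights edges between cells in densely sampled regions.
        K = K / np.outer(q, q)
    # Row-normalize so that each row is a probability distribution over neighbors.
    return K / K.sum(axis=1, keepdims=True)
```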
- copy()[source]
Return a copy of self.
- Return type
Pseudotime Kernel
- class cellrank.tl.kernels.PseudotimeKernel(adata, backward=False, time_key='dpt_pseudotime', compute_cond_num=False, check_connectivity=False, **kwargs)[source]
Kernel which computes directed transition probabilities based on a KNN graph and pseudotime.
The KNN graph contains information about the (undirected) connectivities among cells, reflecting their similarity. Pseudotime can be used to either remove edges that point against the direction of increasing pseudotime [Setty et al., 2019], or to downweight them [Stassen et al., 2021].
- Parameters
adata (anndata.AnnData) – Annotated data object.
backward (bool) – Direction of the process.
time_key (str) – Key in adata.obs where the pseudotime is stored.
compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. Note that this might be costly, since it does not use a sparse implementation.
kwargs (Any) – Keyword arguments for cellrank.tl.kernels.Kernel.
- compute_transition_matrix(threshold_scheme='hard', frac_to_keep=0.3, b=10.0, nu=0.5, check_irreducibility=False, n_jobs=None, backend='loky', show_progress_bar=True, **kwargs)[source]
Compute transition matrix based on KNN graph and pseudotemporal ordering.
Depending on the choice of the threshold_scheme, this is based on ideas by either Palantir [Setty et al., 2019] or VIA [Stassen et al., 2021].
- Parameters
threshold_scheme (Union[Literal[‘soft’, ‘hard’], Callable]) – Which method to use when biasing the graph. Valid options are:
‘hard’ - based on Palantir [Setty et al., 2019], which removes some edges that point against the direction of increasing pseudotime. To avoid disconnecting the graph, it does not remove all edges that point against the direction of increasing pseudotime, but keeps the ones that point to cells inside a close radius. This radius is chosen according to the local cell density.
‘soft’ - based on VIA [Stassen et al., 2021], which downweights edges that point against the direction of increasing pseudotime. Essentially, the further “behind” a query cell is in pseudotime with respect to the current reference cell, the more its graph connectivity is penalized.
callable - any function conforming to the signature of cellrank.tl.kernels.ThresholdSchemeABC.__call__().
frac_to_keep (float) – The frac_to_keep * n_neighbors closest neighbors (according to graph connectivities) are kept, no matter whether they lie in the pseudotemporal past or future. This is done to ensure that the graph remains connected. Only used when threshold_scheme = 'hard'. Needs to fall within the interval [0, 1].
b (float) – The growth rate of the generalized logistic function. Only used when threshold_scheme = 'soft'.
nu (float) – Affects near which asymptote maximum growth occurs. Only used when threshold_scheme = 'soft'.
check_irreducibility (bool) – Optional check for irreducibility of the final transition matrix.
show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend (Literal[‘loky’, ‘multiprocessing’, ‘threading’]) – Which backend to use for parallelization. See joblib.Parallel for valid options.
kwargs (Any) – Keyword arguments for threshold_scheme.
- Return type
- Returns
Self and updated transition_matrix.
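Since threshold_scheme also accepts a callable, a custom scheme only needs to map the current cell's pseudotime, its neighbors' pseudotimes and their connectivities to biased connectivities of shape (n_neighbors,). The sketch below is a hypothetical example (exponential downweighting of past neighbors with a made-up scale parameter). It mirrors the argument order of HardThresholdScheme.__call__() and SoftThresholdScheme.__call__() but is not part of CellRank, and whether a plain function is accepted as-is should be checked against cellrank.tl.kernels.ThresholdSchemeABC.

```python
import numpy as np


def exponential_threshold_scheme(cell_pseudotime, neigh_pseudotime, neigh_conn, scale=5.0):
    """Hypothetical custom scheme: exponentially downweight edges into the pseudotime past."""
    neigh_conn = np.asarray(neigh_conn, dtype=float)
    # How far "behind" each neighbor is; 0 for neighbors at or ahead of the current cell.
    lag = np.clip(cell_pseudotime - np.asarray(neigh_pseudotime, dtype=float), 0.0, None)
    # Future edges keep weight 1; past edges decay with the size of the lag.
    return neigh_conn * np.exp(-scale * lag)
```

Under the stated assumption, it would be passed as threshold_scheme=exponential_threshold_scheme, with scale forwarded through kwargs.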
- property pseudotime: numpy.array
Pseudotemporal ordering of cells.
- Return type
array
- copy()[source]
Return a copy of self.
- Return type
Hard Threshold Scheme
- class cellrank.tl.kernels.HardThresholdScheme[source]
Thresholding scheme inspired by Palantir [Setty et al., 2019].
Note that this won’t exactly reproduce the original Palantir results, for three reasons:
Palantir computes the KNN graph in a scaled space of diffusion components.
Palantir uses its own pseudotime to bias the KNN graph which is not implemented here.
Palantir uses a slightly different mechanism to ensure the graph remains connected when removing edges that point into the “pseudotime past”.
- __call__(cell_pseudotime, neigh_pseudotime, neigh_conn, frac_to_keep=0.3)[source]
Convert the undirected graph of cell-cell similarities into a directed one by removing “past” edges.
This uses a pseudotemporal measure to remove graph-edges that point into the pseudotime-past. For each cell, it keeps the closest neighbors, even if they are in the pseudotime past, to make sure the graph remains connected.
- Parameters
cell_pseudotime (float) – Pseudotime of the current cell.
neigh_pseudotime (ndarray) – Array of shape (n_neighbors,) containing pseudotime of neighbors.
neigh_conn (ndarray) – Array of shape (n_neighbors,) containing connectivities of the current cell and its neighbors.
frac_to_keep (float) – The frac_to_keep * n_neighbors closest neighbors (according to graph connectivities) are kept, no matter whether they lie in the pseudotemporal past or future. frac_to_keep needs to fall within the interval [0, 1].
- Return type
- Returns
Array of shape (n_neighbors,) containing the biased connectivities.
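A simplified NumPy sketch of this rule is shown below. It keeps every neighbor that is not behind the current cell in pseudotime, plus the floor(frac_to_keep * n_neighbors) neighbors with the strongest connectivities. The function name is hypothetical, and CellRank's exact selection of the "close" neighbors may differ.

```python
import numpy as np


def hard_threshold_sketch(cell_pseudotime, neigh_pseudotime, neigh_conn, frac_to_keep=0.3):
    """Hypothetical sketch: remove edges into the pseudotime past, except for the closest neighbors."""
    if not 0.0 <= frac_to_keep <= 1.0:
        raise ValueError("frac_to_keep must lie in [0, 1].")
    neigh_conn = np.asarray(neigh_conn, dtype=float)
    neigh_pseudotime = np.asarray(neigh_pseudotime, dtype=float)
    # Number of strongest connections that are always kept, to avoid disconnecting the graph.
    n_keep = int(np.floor(frac_to_keep * neigh_conn.shape[0]))
    closest = np.argsort(neigh_conn)[::-1][:n_keep]
    # Keep neighbors at or ahead of the current cell in pseudotime ...
    keep = neigh_pseudotime >= cell_pseudotime
    # ... and the closest neighbors regardless of their pseudotime.
    keep[closest] = True
    return np.where(keep, neigh_conn, 0.0)
```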
Soft Threshold Scheme
- class cellrank.tl.kernels.SoftThresholdScheme[source]
Thresholding scheme inspired by [Stassen et al., 2021].
The idea is to downweight edges that point against the direction of increasing pseudotime. Essentially, the further “behind” a query cell is in pseudotime with respect to the current reference cell, the more its graph connectivity is penalized.
- __call__(cell_pseudotime, neigh_pseudotime, neigh_conn, b=10.0, nu=0.5)[source]
Bias the connectivities by downweighting ones to past cells.
This function uses a generalized logistic function to weight the past connectivities.
- Parameters
cell_pseudotime (float) – Pseudotime of the current cell.
neigh_pseudotime (ndarray) – Array of shape (n_neighbors,) containing pseudotime of neighbors.
neigh_conn (ndarray) – Array of shape (n_neighbors,) containing connectivities of the current cell and its neighbors.
b (float) – The growth rate of the generalized logistic function.
nu (float) – Affects near which asymptote maximum growth occurs.
- Return type
- Returns
Array of shape (n_neighbors,) containing the biased connectivities.
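The following NumPy sketch illustrates the idea. The exact functional form is an assumption: it uses a generalized logistic weight 2 / (1 + exp(b·Δt))^(1/nu) for neighbors that lie Δt behind the current cell and leaves future neighbors untouched. CellRank's implementation may use a different parameterization, and the function name is hypothetical.

```python
import numpy as np


def soft_threshold_sketch(cell_pseudotime, neigh_pseudotime, neigh_conn, b=10.0, nu=0.5):
    """Hypothetical sketch: downweight connectivities to neighbors in the pseudotime past."""
    neigh_conn = np.asarray(neigh_conn, dtype=float)
    # Positive lag means the neighbor lies "behind" the current cell in pseudotime.
    lag = cell_pseudotime - np.asarray(neigh_pseudotime, dtype=float)
    past = lag > 0
    weights = np.ones_like(neigh_conn)
    # Assumed generalized logistic form: weights shrink toward 0 as the lag grows;
    # b is the growth rate and nu affects near which asymptote maximum growth occurs.
    weights[past] = 2.0 / (1.0 + np.exp(b * lag[past])) ** (1.0 / nu)
    return neigh_conn * weights
```

Compared with the hard scheme, no edge is removed outright, so the graph cannot become disconnected by the thresholding itself.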
CytoTRACE Kernel
- class cellrank.tl.kernels.CytoTRACEKernel(adata, backward=False, layer='Ms', aggregation=CytoTRACEAggregation.MEAN, use_raw=False, n_top_genes=200, compute_cond_num=False, check_connectivity=False, **kwargs)[source]
Kernel which computes directed transition probabilities based on a KNN graph and the CytoTRACE score [Gulati et al., 2020].
The KNN graph contains information about the (undirected) connectivities among cells, reflecting their similarity. CytoTRACE can be used to estimate cellular plasticity and, in turn, a pseudotemporal ordering of cells from more plastic to less plastic states. It relies on the assumption that differentiated cells express, on average, fewer genes than naive cells. This kernel internally uses the cellrank.tl.kernels.PseudotimeKernel to direct the KNN graph on the basis of the CytoTRACE-derived pseudotime.
Optionally, we apply a density correction as described in [Coifman et al., 2005], where we use the implementation of [Haghverdi et al., 2016].
- Parameters
adata (anndata.AnnData) – Annotated data object.
backward (bool) – Direction of the process.
layer (Optional[str]) – Key in anndata.AnnData.layers or ‘X’ for anndata.AnnData.X from where to get the expression.
aggregation (Literal[‘mean’, ‘median’, ‘hmean’, ‘gmean’]) – How to aggregate expression of the top-correlating genes. Valid options are:
‘mean’ - arithmetic mean.
‘median’ - median.
‘hmean’ - harmonic mean.
‘gmean’ - geometric mean.
use_raw (bool) – Whether to use anndata.AnnData.raw to compute the number of genes expressed per cell (#genes/cell) and the correlation of gene expression across cells with #genes/cell.
n_top_genes (int) – Number of genes used to compute the CytoTRACE score.
compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. Note that this might be costly, since it does not use a sparse implementation.
check_connectivity (bool) – Check whether the underlying KNN graph is connected.
kwargs (Any) – Keyword arguments for cellrank.tl.kernels.PseudotimeKernel.
Example
Workflow:
```python
# import packages and load data
import scanpy as sc
import scvelo as scv
import cellrank as cr

adata = cr.datasets.pancreas()

# standard pre-processing
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)

# CytoTRACE by default uses imputed data - a simple way to compute KNN-imputed data is to use scVelo's moments
# function. However, note that this function expects `spliced` counts because it's designed for RNA velocity,
# so we're using a simple hack here:
if 'spliced' not in adata.layers or 'unspliced' not in adata.layers:
    adata.layers['spliced'] = adata.X
    adata.layers['unspliced'] = adata.X

# compute KNN-imputation using scVelo's moments function
scv.pp.moments(adata)

# import and initialize the CytoTRACE kernel, compute transition matrix - done!
from cellrank.tl.kernels import CytoTRACEKernel

ctk = CytoTRACEKernel(adata).compute_transition_matrix()
```
- compute_cytotrace(layer='Ms', aggregation=CytoTRACEAggregation.MEAN, use_raw=False, n_top_genes=200)[source]
Re-implementation of the CytoTRACE algorithm [Gulati et al., 2020] to estimate cellular plasticity.
Computes the number of genes expressed per cell and ranks genes according to their correlation with this measure. Next, it selects the top-correlating genes and aggregates their (imputed) expression to obtain the CytoTRACE score. A high score stands for high differentiation potential (naive, plastic cells) and a low score stands for low differentiation potential (mature, differentiated cells).
- Parameters
layer (Optional[str]) – Key in anndata.AnnData.layers or ‘X’ for anndata.AnnData.X from where to get the expression.
aggregation (Literal[‘mean’, ‘median’, ‘hmean’, ‘gmean’]) – How to aggregate expression of the top-correlating genes. Valid options are:
‘mean’ - arithmetic mean.
‘median’ - median.
‘hmean’ - harmonic mean.
‘gmean’ - geometric mean.
use_raw (bool) – Whether to use anndata.AnnData.raw to compute the number of genes expressed per cell (#genes/cell) and the correlation of gene expression across cells with #genes/cell.
n_top_genes (int) – Number of genes used to compute the CytoTRACE score.
- Return type
- Returns
Nothing, just modifies anndata.AnnData.obs with the following keys:
‘ct_score’ - the normalized CytoTRACE score.
‘ct_pseudotime’ - associated pseudotime, essentially 1 - CytoTRACE score.
‘ct_num_exp_genes’ - the number of genes expressed per cell, basis of the CytoTRACE score.
It also modifies anndata.AnnData.var with the following keys:
‘ct_gene_corr’ - the correlation as specified above.
‘ct_correlates’ - indication of the genes used to compute the CytoTRACE score, i.e. the ones that correlated best with ‘ct_num_exp_genes’.
Notes
This will not exactly reproduce the results of the original CytoTRACE algorithm [Gulati et al., 2020] because we allow for any normalization and imputation techniques whereas CytoTRACE has built-in specific methods for that.
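The steps described for compute_cytotrace() can be sketched in plain NumPy as below. This is a simplified illustration under stated assumptions (dense input, Pearson correlation, ‘mean’ aggregation, min-max rescaling of the score to [0, 1]); the function name is hypothetical, it is not CellRank's implementation, and it does not write anything into AnnData.

```python
import numpy as np


def cytotrace_sketch(X, n_top_genes=200):
    """Hypothetical, simplified CytoTRACE-like score for a dense (n_cells, n_genes) matrix X."""
    X = np.asarray(X, dtype=float)
    # Number of expressed genes per cell (#genes/cell).
    num_exp_genes = (X > 0).sum(axis=1).astype(float)
    # Pearson correlation of every gene with #genes/cell; constant genes get 0.
    Xc = X - X.mean(axis=0)
    nc = num_exp_genes - num_exp_genes.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(nc)
    gene_corr = np.divide(Xc.T @ nc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    # Aggregate ('mean') the expression of the top-correlating genes.
    top_genes = np.argsort(gene_corr)[::-1][:n_top_genes]
    score = X[:, top_genes].mean(axis=1)
    # Rescale to [0, 1]: 1 = most plastic (naive), 0 = most mature.
    span = score.max() - score.min()
    score = (score - score.min()) / span if span > 0 else np.zeros_like(score)
    # Associated pseudotime, 1 - score, as described for 'ct_pseudotime'.
    pseudotime = 1.0 - score
    return score, pseudotime, num_exp_genes, gene_corr
```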
- compute_transition_matrix(threshold_scheme='hard', frac_to_keep=0.3, b=10.0, nu=0.5, check_irreducibility=False, n_jobs=None, backend='loky', show_progress_bar=True, **kwargs)
Compute transition matrix based on KNN graph and pseudotemporal ordering.
Depending on the choice of the threshold_scheme, this is based on ideas by either Palantir [Setty et al., 2019] or VIA [Stassen et al., 2021].
- Parameters
threshold_scheme (Union[Literal[‘soft’, ‘hard’], Callable]) – Which method to use when biasing the graph. Valid options are:
‘hard’ - based on Palantir [Setty et al., 2019], which removes some edges that point against the direction of increasing pseudotime. To avoid disconnecting the graph, it does not remove all edges that point against the direction of increasing pseudotime, but keeps the ones that point to cells inside a close radius. This radius is chosen according to the local cell density.
‘soft’ - based on VIA [Stassen et al., 2021], which downweights edges that point against the direction of increasing pseudotime. Essentially, the further “behind” a query cell is in pseudotime with respect to the current reference cell, the more its graph connectivity is penalized.
callable - any function conforming to the signature of cellrank.tl.kernels.ThresholdSchemeABC.__call__().
frac_to_keep (float) – The frac_to_keep * n_neighbors closest neighbors (according to graph connectivities) are kept, no matter whether they lie in the pseudotemporal past or future. This is done to ensure that the graph remains connected. Only used when threshold_scheme = 'hard'. Needs to fall within the interval [0, 1].
b (float) – The growth rate of the generalized logistic function. Only used when threshold_scheme = 'soft'.
nu (float) – Affects near which asymptote maximum growth occurs. Only used when threshold_scheme = 'soft'.
check_irreducibility (bool) – Optional check for irreducibility of the final transition matrix.
show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend (Literal[‘loky’, ‘multiprocessing’, ‘threading’]) – Which backend to use for parallelization. See joblib.Parallel for valid options.
kwargs (Any) – Keyword arguments for threshold_scheme.
- Return type
- Returns
Self and updated transition_matrix.
Precomputed Kernel
- class cellrank.tl.kernels.PrecomputedKernel(transition_matrix=None, adata=None, backward=False, compute_cond_num=False, **kwargs)[source]
Kernel which contains a precomputed transition matrix.
- Parameters
transition_matrix (Union[ndarray, spmatrix, KernelExpression, str, None]) – Row-normalized transition matrix, a key in anndata.AnnData.obsp, or a cellrank.tl.kernels.KernelExpression with a precomputed transition matrix. If None, try to determine the key based on backward.
adata (anndata.AnnData) – Annotated data object. If None, a temporary placeholder object is created.
backward (bool) – Direction of the process.
compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. Note that this might be costly, since it does not use a sparse implementation.
kwargs (Any) – Keyword arguments for cellrank.tl.kernels.Kernel.
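PrecomputedKernel expects the transition matrix to be row-normalized already. If yours is not, a dense NumPy sketch like the one below can validate and normalize it first. The helper name is hypothetical and not a CellRank function; sparse input would need the scipy.sparse equivalent.

```python
import numpy as np


def row_normalize(matrix, atol=1e-12):
    """Hypothetical helper: turn a non-negative (n_cells, n_cells) array into a row-stochastic matrix."""
    T = np.asarray(matrix, dtype=float)
    if T.ndim != 2 or T.shape[0] != T.shape[1]:
        raise ValueError("Expected a square matrix.")
    if (T < 0).any():
        raise ValueError("Transition matrix entries must be non-negative.")
    row_sums = T.sum(axis=1, keepdims=True)
    # A row without any positive entry cannot be turned into a probability distribution.
    if np.any(row_sums <= atol):
        raise ValueError("Every row needs at least one positive entry.")
    return T / row_sums
```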
- copy()[source]
Return a copy of self.
- Return type
- compute_transition_matrix(*args, **kwargs)[source]
Return self.
- Return type
Models
GAM
- class cellrank.ul.models.GAM(adata, n_knots=6, spline_order=3, distribution='gamma', link='log', max_iter=2000, expectile=None, grid=None, spline_kwargs=mappingproxy({}), **kwargs)[source]
Fit Generalized Additive Models (GAMs) using pygam.
- Parameters
adata (anndata.AnnData) – Annotated data object.
spline_order (int) – Order of the splines, e.g. 3 for cubic splines.
distribution (Literal[‘normal’, ‘binomial’, ‘poisson’, ‘gamma’, ‘gaussian’, ‘inv_gauss’]) – Name of the distribution. Available distributions are listed in the pygam documentation.
link (Literal[‘identity’, ‘logit’, ‘inverse’, ‘log’, ‘inverse-squared’]) – Name of the link function. Available link functions are listed in the pygam documentation.
max_iter (int) – Maximum number of iterations for optimization.
expectile (Optional[float]) – Expectile for pygam.pygam.ExpectileGAM. This forces the distribution to be ‘normal’ and the link function to be ‘identity’. Must be in the interval (0, 1).
grid (Union[str, Mapping[str, Any], None]) – Whether to perform a grid search. Keys correspond to parameter names and values to the ranges to be searched. If ‘default’, use the default grid. If None, don’t perform a grid search.
spline_kwargs (Mapping[str, Any]) – Keyword arguments for pygam.s.
kwargs (Any) – Keyword arguments for pygam.pygam.GAM.
- fit(x=None, y=None, w=None, **kwargs)[source]
Fit the model.
- Parameters
x (Optional[ndarray]) – Independent variables, array of shape (n_samples, 1). If None, use x.
y (Optional[ndarray]) – Dependent variables, array of shape (n_samples, 1). If None, use y.
w (Optional[ndarray]) – Optional weights of x, array of shape (n_samples,). If None, use w.
kwargs – Keyword arguments for the underlying model’s fitting function.
- Return type
- Returns
Fits the model and returns self.
- predict(x_test=None, key_added='_x_test', **kwargs)[source]
Run the prediction.
- Parameters
- Return type
- Returns
Updates and returns the following field:
- property adata: anndata._core.anndata.AnnData
Annotated data object.
- Return type
- Returns
adata :
anndata.AnnData
Annotated data object.
- property conf_int: numpy.ndarray
Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
- Return type
- confidence_interval(x_test=None, **kwargs)[source]
Calculate the confidence interval.
- Parameters
x_test (Optional[ndarray]) – Array of shape (n_samples,) used for confidence interval calculation. If None, use x_test.
kwargs – Keyword arguments for the underlying model’s confidence method or for default_confidence_interval().
- Return type
- Returns
Updates and returns the following field:
conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
- default_confidence_interval(x_test=None, **kwargs)
Calculate the confidence interval, if the underlying model has no method for it.
This formula is taken from [DeSalvo, 1970], eq. 5.
- Parameters
x_test (Optional[ndarray]) – Array of shape (n_samples,) used for confidence interval calculation. If None, use x_test.
kwargs – Keyword arguments for the underlying model’s confidence method or for default_confidence_interval().
- Return type
- Returns
Updates and returns the following field:
conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
Also update the following fields:
- property model: Any
Underlying model.
- Return type
- plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)
Plot the smoothed gene expression.
- Parameters
same_plot (bool) – Whether to plot all trends in the same plot.
hide_cells (bool) – Whether to hide the cells.
perc (Optional[Tuple[float, float]]) – Percentile by which to clip the absorption probabilities.
abs_prob_cmap (ListedColormap) – Colormap to use when coloring in the absorption probabilities.
cell_color (Optional[str]) – Key in anndata.AnnData.obs or anndata.AnnData.var_names used for coloring the cells.
lineage_color (str) – Color for the lineage.
alpha (float) – Alpha channel for cells.
lineage_alpha (float) – Alpha channel for lineage confidence intervals.
size (int) – Size of the points.
lw (float) – Line width for the smoothed values.
cbar (bool) – Whether to show a colorbar.
margins (float) – Margins around the plot.
xlabel (str) – Label on the x-axis.
ylabel (str) – Label on the y-axis.
conf_int (bool) – Whether to show the confidence interval.
lineage_probability (bool) – Whether to show smoothed lineage probability as a dashed line. Note that this will require 1 additional model fit.
lineage_probability_conf_int (Union[bool, float]) – Whether to compute and show smoothed lineage probability confidence interval. If self is cellrank.ul.models.GAMR, it can also specify the confidence level, the default is 0.95. Only used when lineage_probability=True.
lineage_probability_color (Optional[str]) – Color to use when plotting the smoothed lineage_probability. If None, it’s the same as lineage_color. Only used when lineage_probability=True.
obs_legend_loc (Optional[str]) – Location of the legend when cell_color corresponds to a categorical variable.
fig (Optional[Figure]) – Figure to use, if None, create a new one.
ax (matplotlib.axes.Axes) – Axes to use, if None, create a new one.
return_fig (bool) – If True, return the figure object.
save (Optional[str]) – Filename where to save the plot. If None, just shows the plots.
kwargs – Keyword arguments for matplotlib.axes.Axes.legend(), e.g. to disable the legend, specify loc=None. Only available when lineage_probability=True.
- Return type
- Returns
Nothing, just plots the figure. Optionally saves it based on save.
- prepare(gene, lineage, backward=False, time_range=None, data_key='X', time_key='latent_time', use_raw=False, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200)
Prepare the model to be ready for fitting.
- Parameters
gene (str) – Gene in anndata.AnnData.var_names.
lineage (Optional[str]) – Name of a lineage in anndata.AnnData.obsm['{lineage_key}']. If None, all weights will be set to 1.
backward (bool) – Direction of the process.
time_range (Union[float, Tuple[float, float], None]) – Specify start and end times:
data_key (Optional[str]) – Key in anndata.AnnData.layers or ‘X’ for anndata.AnnData.X. If use_raw = True, it’s always set to ‘X’.
time_key (str) – Key in anndata.AnnData.obs where the pseudotime is stored.
use_raw (bool) – Whether to access anndata.AnnData.raw.
threshold (Optional[float]) – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.
weight_threshold (Union[float, Tuple[float, float]]) – Set all weights below weight_threshold to weight_threshold if a float, or to the second value, if a tuple.
filter_cells (Optional[float]) – Filter out all cells with expression values lower than this threshold.
n_test_points (int) – Number of test points. If None, use the original points based on threshold.
- Return type
- Returns
Nothing, just updates the following fields:
x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.
y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.
w - Filtered weights of shape (n_filtered_cells,) used for fitting.
x_all - Unfiltered independent variables of shape (n_cells, 1).
y_all - Unfiltered dependent variables of shape (n_cells, 1).
w_all - Unfiltered weights of shape (n_cells,).
x_test - Independent variables of shape (n_samples, 1) used for prediction.
prepared - Whether the model is prepared for fitting.
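The weight_threshold rule of prepare() can be illustrated with the NumPy sketch below. It assumes that a tuple means (cutoff, replacement value), which is one reading of "to the second value"; the function name is hypothetical and this is not CellRank's code.

```python
import numpy as np


def apply_weight_threshold(w, weight_threshold=(0.01, 0.01)):
    """Hypothetical sketch of the weight_threshold rule described for prepare()."""
    w = np.array(w, dtype=float)  # copy, so the caller's array is left untouched
    if isinstance(weight_threshold, tuple):
        # Assumed reading: (cutoff, replacement value).
        cutoff, value = weight_threshold
    else:
        cutoff = value = float(weight_threshold)
    # Raise (or replace) every weight below the cutoff.
    w[w < cutoff] = value
    return w
```

Keeping a small positive floor rather than zero means cells with negligible lineage weight still contribute slightly to the fit instead of being dropped entirely.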
- property prepared
Whether the model is prepared for fitting.
- static read(fname, adata=None, copy=False)
Deserialize self from a file.
- Parameters
fname (Union[str, Path]) – Filename from which to read the object.
adata (Optional[AnnData]) – anndata.AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.
copy (bool) – Whether to copy adata before assigning it or not. If adata is a view, it is always copied.
- Return type
IOMixin
- Returns
The deserialized object.
- property w: numpy.ndarray
Filtered weights of shape (n_filtered_cells,) used for fitting.
- Return type
- property w_all: numpy.ndarray
Unfiltered weights of shape (n_cells,).
- Return type
- write(fname, write_adata=True, ext='pickle')
Serialize self to a file.
- Parameters
- Return type
- Returns
Nothing, just writes itself to a file using pickle.
- property x: numpy.ndarray
Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.
- Return type
- property x_all: numpy.ndarray
Unfiltered independent variables of shape (n_cells, 1).
- Return type
- property x_hat: numpy.ndarray
Filtered independent variables used when calculating the default confidence interval, usually the same as x.
- Return type
- property x_test: numpy.ndarray
Independent variables of shape (n_samples, 1) used for prediction.
- Return type
- property y: numpy.ndarray
Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.
- Return type
- property y_all: numpy.ndarray
Unfiltered dependent variables of shape (n_cells, 1).
- Return type
- property y_hat: numpy.ndarray
Filtered dependent variables used when calculating the default confidence interval, usually the same as y.
- Return type
- property y_test: numpy.ndarray
Prediction values of shape (n_samples,) for x_test.
- Return type
SKLearnModel
- class cellrank.ul.models.SKLearnModel(adata, model, weight_name=None, ignore_raise=False)[source]
Wrapper around sklearn.base.BaseEstimator.
- Parameters
adata (anndata.AnnData) – Annotated data object.
model (BaseEstimator) – Instance of the underlying sklearn estimator, such as sklearn.svm.SVR.
weight_name (Optional[str]) – Name of the weight argument for model.fit. If None, determine it automatically. If an empty string, no weights will be used.
ignore_raise (bool) – Do not raise an exception if the weight argument is not found in the fitting function of model. This is useful when the weight is passed in **kwargs and cannot be determined from the signature.
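To illustrate automatic detection of the weight argument (and why ignore_raise exists), here is a sketch based on inspect.signature. The candidate names and the helper name are assumptions; scikit-learn estimators conventionally call this argument sample_weight (e.g. sklearn.svm.SVR.fit), but the wrapper's actual lookup rules may differ.

```python
import inspect


def find_weight_name(fit_fn, candidates=("sample_weight", "sample_weights", "weights", "w")):
    """Hypothetical helper: guess the name of the weight argument of a fit method.

    Returns None when no candidate appears explicitly in the signature, e.g. when
    weights are only accepted through **kwargs.
    """
    params = inspect.signature(fit_fn).parameters
    for name in candidates:
        if name in params:
            return name
    return None
```

A None result is exactly the situation the ignore_raise flag is meant for: the estimator may still accept weights through **kwargs even though they cannot be found in its signature.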
- fit(x=None, y=None, w=None, **kwargs)[source]
Fit the model.
- Parameters
x (Optional[ndarray]) – Independent variables, array of shape (n_samples, 1). If None, use x.
y (Optional[ndarray]) – Dependent variables, array of shape (n_samples, 1). If None, use y.
w (Optional[ndarray]) – Optional weights of x, array of shape (n_samples,). If None, use w.
kwargs – Keyword arguments for the underlying model’s fitting function.
- Return type
- Returns
Fits the model and returns self.
- predict(x_test=None, key_added='_x_test', **kwargs)[source]
Run the prediction.
- Parameters
- Return type
- Returns
Updates and returns the following field:
- confidence_interval(x_test=None, **kwargs)[source]
Calculate the confidence interval.
Use the default_confidence_interval() function if the underlying model has no method for confidence interval calculation.
- Parameters
x_test (Optional[ndarray]) – Array of shape (n_samples,) used for confidence interval calculation. If None, use x_test.
kwargs – Keyword arguments for the underlying model’s confidence method or for default_confidence_interval().
- Return type
- Returns
Updates and returns the following field:
conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
- property model: sklearn.base.BaseEstimator
The underlying sklearn.base.BaseEstimator.
- Return type
- copy()[source]
Return a copy of self.
- Return type
- property adata: anndata._core.anndata.AnnData
Annotated data object.
- Return type
- Returns
adata :
anndata.AnnData
Annotated data object.
- property conf_int: numpy.ndarray
Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
- Return type
- default_confidence_interval(x_test=None, **kwargs)
Calculate the confidence interval, if the underlying model has no method for it.
This formula is taken from [DeSalvo, 1970], eq. 5.
- Parameters
x_test (Optional[ndarray]) – Array of shape (n_samples,) used for confidence interval calculation. If None, use x_test.
kwargs – Keyword arguments for the underlying model’s confidence method or for default_confidence_interval().
- Return type
- Returns
Updates and returns the following field:
conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
Also update the following fields:
- plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)
Plot the smoothed gene expression.
- Parameters
same_plot (bool) – Whether to plot all trends in the same plot.
hide_cells (bool) – Whether to hide the cells.
perc (Optional[Tuple[float, float]]) – Percentile by which to clip the absorption probabilities.
abs_prob_cmap (ListedColormap) – Colormap to use when coloring in the absorption probabilities.
cell_color (Optional[str]) – Key in anndata.AnnData.obs or anndata.AnnData.var_names used for coloring the cells.
lineage_color (str) – Color for the lineage.
alpha (float) – Alpha channel for cells.
lineage_alpha (float) – Alpha channel for lineage confidence intervals.
size (int) – Size of the points.
lw (float) – Line width for the smoothed values.
cbar (bool) – Whether to show a colorbar.
margins (float) – Margins around the plot.
xlabel (str) – Label on the x-axis.
ylabel (str) – Label on the y-axis.
conf_int (bool) – Whether to show the confidence interval.
lineage_probability (bool) – Whether to show smoothed lineage probability as a dashed line. Note that this will require 1 additional model fit.
lineage_probability_conf_int (Union[bool, float]) – Whether to compute and show smoothed lineage probability confidence interval. If self is cellrank.ul.models.GAMR, it can also specify the confidence level, the default is 0.95. Only used when lineage_probability=True.
lineage_probability_color (Optional[str]) – Color to use when plotting the smoothed lineage_probability. If None, it’s the same as lineage_color. Only used when lineage_probability=True.
obs_legend_loc (Optional[str]) – Location of the legend when cell_color corresponds to a categorical variable.
fig (Optional[Figure]) – Figure to use, if None, create a new one.
ax (matplotlib.axes.Axes) – Axes to use, if None, create a new one.
return_fig (bool) – If True, return the figure object.
save (Optional[str]) – Filename where to save the plot. If None, just shows the plots.
kwargs – Keyword arguments for matplotlib.axes.Axes.legend(), e.g. to disable the legend, specify loc=None. Only available when lineage_probability=True.
- Return type
- Returns
Nothing, just plots the figure. Optionally saves it based on
save
.
- prepare(gene, lineage, backward=False, time_range=None, data_key='X', time_key='latent_time', use_raw=False, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200)
Prepare the model to be ready for fitting.
- Parameters
gene (str) – Gene in anndata.AnnData.var_names.
lineage (Optional[str]) – Name of a lineage in anndata.AnnData.obsm['{lineage_key}']. If None, all weights will be set to 1.
backward (bool) – Direction of the process.
time_range (Union[float, Tuple[float, float], None]) – Specify start and end times.
data_key (Optional[str]) – Key in anndata.AnnData.layers or ‘X’ for anndata.AnnData.X. If use_raw=True, it is always set to ‘X’.
time_key (str) – Key in anndata.AnnData.obs where the pseudotime is stored.
use_raw (bool) – Whether to access anndata.AnnData.raw.
threshold (Optional[float]) – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.
weight_threshold (Union[float, Tuple[float, float]]) – If a float, set all weights below weight_threshold to weight_threshold. If a tuple, set all weights below the first value to the second value.
filter_cells (Optional[float]) – Filter out all cells with expression values lower than this threshold.
n_test_points (int) – Number of test points. If None, use the original points based on threshold.
- Return type
None
- Returns
Nothing, just updates the following fields:
x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.
y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.
w - Filtered weights of shape (n_filtered_cells,) used for fitting.
x_all - Unfiltered independent variables of shape (n_cells, 1).
y_all - Unfiltered dependent variables of shape (n_cells, 1).
w_all - Unfiltered weights of shape (n_cells,).
x_test - Independent variables of shape (n_samples, 1) used for prediction.
prepared - Whether the model is prepared for fitting.
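The weight handling described for prepare() can be illustrated with a small NumPy sketch. The helper prepare_weights_sketch is hypothetical and not part of CellRank; it assumes that a tuple (low, fill) raises weights below low to fill, that the test interval starts at the smallest pseudotime value, and that it falls back to all cells when no weight exceeds threshold.

```python
import numpy as np


def prepare_weights_sketch(x, w, weight_threshold=(0.01, 0.01), threshold=None, n_test_points=200):
    """Hypothetical helper mirroring the documented weight handling of prepare().

    Returns the adjusted weights and the evenly spaced test points.
    """
    x = np.asarray(x, dtype=float).ravel()
    w = np.asarray(w, dtype=float).copy()

    # A float raises every weight below it to that value; a tuple (low, fill)
    # sets every weight below `low` to `fill` (assumed interpretation).
    if isinstance(weight_threshold, (int, float)):
        low, fill = weight_threshold, weight_threshold
    else:
        low, fill = weight_threshold
    w[w < low] = fill

    # Only cells with weight > threshold (default: median weight) decide
    # where the test interval ends.
    if threshold is None:
        threshold = np.median(w)
    mask = w > threshold
    if not mask.any():
        # Assumption: fall back to all cells if no weight exceeds the threshold.
        mask = np.ones_like(w, dtype=bool)

    x_test = np.linspace(x.min(), x[mask].max(), n_test_points)
    return w, x_test
```

With weights that peak in the middle of the pseudotime, the test interval stops at the last cell whose weight exceeds the median, so predictions are not extrapolated into regions the lineage barely reaches.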
- property prepared
Whether the model is prepared for fitting.
- static read(fname, adata=None, copy=False)
Deserialize self from a file.
- Parameters
fname (Union[str, Path]) – Filename from which to read the object.
adata (Optional[AnnData]) – anndata.AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.
copy (bool) – Whether to copy adata before assigning it. If adata is a view, it is always copied.
- Return type
IOMixin
- Returns
The deserialized object.
- property w: numpy.ndarray
Filtered weights of shape (n_filtered_cells,) used for fitting.
- Return type
ndarray
- property w_all: numpy.ndarray
Unfiltered weights of shape (n_cells,).
- Return type
ndarray
- write(fname, write_adata=True, ext='pickle')
Serialize self to a file.
- Return type
None
- Returns
Nothing, just writes itself to a file using pickle.
- property x: numpy.ndarray
Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.
- Return type
ndarray
- property x_all: numpy.ndarray
Unfiltered independent variables of shape (n_cells, 1).
- Return type
ndarray
- property x_hat: numpy.ndarray
Filtered independent variables used when calculating the default confidence interval, usually the same as x.
- Return type
ndarray
- property x_test: numpy.ndarray
Independent variables of shape (n_samples, 1) used for prediction.
- Return type
ndarray
- property y: numpy.ndarray
Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.
- Return type
ndarray
- property y_all: numpy.ndarray
Unfiltered dependent variables of shape (n_cells, 1).
- Return type
ndarray
- property y_hat: numpy.ndarray
Filtered dependent variables used when calculating the default confidence interval, usually the same as y.
- Return type
ndarray
- property y_test: numpy.ndarray
Prediction values of shape (n_samples,) for x_test.
- Return type
ndarray
GAMR
- class cellrank.ul.models.GAMR(adata, n_knots=5, distribution='gaussian', basis='cr', knotlocs=KnotLocs.AUTO, offset='default', smoothing_penalty=1.0, **kwargs)[source]
Wrapper around R’s mgcv package for fitting Generalized Additive Models (GAMs).
- Parameters
adata (anndata.AnnData) – Annotated data object.
n_knots (int) – Number of knots.
distribution (str) – Distribution family in rpy2.robjects.r, such as ‘gaussian’ or ‘nb’ for negative binomial. If ‘nb’, raw count data in adata.raw is always used.
basis (str) – Basis for the smoothing term. See the mgcv documentation of smooth.terms for valid options.
knotlocs (Literal[‘auto’, ‘density’]) – Position of the knots. Can be one of the following:
‘auto’ - let mgcv handle the knot positions.
‘density’ - position the knots based on the density of the pseudotime.
offset (Union[ndarray, Literal[‘default’], None]) – Offset term for the GAM. Only available when distribution='nb'. If ‘default’, it is calculated according to [Robinson and Oshlack, 2010]. The values are saved in adata.obs['cellrank_offset']. If None, no offset is used.
smoothing_penalty (float) – Penalty for the smoothing term. The larger the value, the smoother the fitted curve.
kwargs – Keyword arguments for gam.control. See the mgcv documentation of gam.control for reference.
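The ‘density’ option for knotlocs places knots according to how densely cells populate the pseudotime axis. One plausible way to do this is to put knots at evenly spaced quantiles, sketched below; the function density_knots is hypothetical, and CellRank’s actual strategy may differ (for example, it could rely on a kernel density estimate instead of quantiles).

```python
import numpy as np


def density_knots(pseudotime, n_knots=5):
    """Place knots at evenly spaced quantiles of the pseudotime (illustrative)."""
    t = np.asarray(pseudotime, dtype=float).ravel()
    if n_knots < 2:
        raise ValueError("Need at least 2 knots to span the pseudotime range.")
    # Quantiles put more knots where many cells sit along pseudotime.
    return np.quantile(t, np.linspace(0.0, 1.0, n_knots))
```

If most cells lie early in pseudotime, most knots land there too, giving the smoother more flexibility where there is more data to support it.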
- prepare(*args, **kwargs)[source]
Prepare the model to be ready for fitting. This also removes zero and negative weights and prepares the design matrix.
- Parameters
gene – Gene in anndata.AnnData.var_names.
lineage – Name of a lineage in anndata.AnnData.obsm['{lineage_key}']. If None, all weights will be set to 1.
backward – Direction of the process.
time_range – Specify start and end times.
data_key – Key in anndata.AnnData.layers or ‘X’ for anndata.AnnData.X. If use_raw=True, it is always set to ‘X’.
time_key – Key in anndata.AnnData.obs where the pseudotime is stored.
use_raw – Whether to access anndata.AnnData.raw.
threshold – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.
weight_threshold – If a float, set all weights below weight_threshold to weight_threshold. If a tuple, set all weights below the first value to the second value.
filter_cells – Filter out all cells with expression values lower than this threshold.
n_test_points – Number of test points. If None, use the original points based on threshold.
- Return type
None
- Returns
Nothing, just updates the following fields:
x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.
y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.
w - Filtered weights of shape (n_filtered_cells,) used for fitting.
x_all - Unfiltered independent variables of shape (n_cells, 1).
y_all - Unfiltered dependent variables of shape (n_cells, 1).
w_all - Unfiltered weights of shape (n_cells,).
x_test - Independent variables of shape (n_samples, 1) used for prediction.
prepared - Whether the model is prepared for fitting.
- fit(x=None, y=None, w=None, **kwargs)[source]
Fit the model.
- Parameters
x (Optional[ndarray]) – Independent variables, array of shape (n_samples, 1). If None, use x.
y (Optional[ndarray]) – Dependent variables, array of shape (n_samples, 1). If None, use y.
w (Optional[ndarray]) – Optional weights of x, array of shape (n_samples,). If None, use w.
kwargs – Keyword arguments for the underlying model’s fitting function.
- Returns
Fits the model and returns self. Updates the following fields by filtering out zero weights in w:
- predict(x_test=None, key_added='_x_test', level=None, **kwargs)[source]
Run the prediction. This method can also compute the confidence interval.
- Parameters
x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.
key_added (str) – Attribute name where to save the x_test for later use. If None, don’t save it.
level (Optional[float]) – Confidence level for confidence interval calculation. If None, don’t compute the confidence interval. Must be in the interval [0, 1].
kwargs – Keyword arguments for the underlying model’s prediction method.
- Returns
Updates and returns the following field:
y_test - Prediction values of shape (n_samples,) for x_test.
- confidence_interval(x_test=None, level=0.95, **kwargs)[source]
Calculate the confidence interval. Internally, this method calls cellrank.ul.models.GAMR.predict() to extract the confidence interval, if needed.
- Return type
ndarray
- Returns
Updates and returns the following field:
conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
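The conf_int field has shape (n_samples, 2), one [lower, upper] row per test point. As a generic illustration of that layout, the sketch below builds a normal-approximation interval from predictions and their standard errors (mgcv’s predict.gam can return standard errors via se.fit=TRUE). This is an assumption for illustration only: it is neither the exact GAMR computation nor the [DeSalvo, 1970] formula used by default_confidence_interval(), and normal_confidence_interval is a hypothetical name.

```python
from statistics import NormalDist

import numpy as np


def normal_confidence_interval(y_pred, std_err, level=0.95):
    """Return an (n_samples, 2) array of [lower, upper] bounds (normal approximation)."""
    if not 0 <= level < 1:
        raise ValueError("`level` must be in [0, 1).")
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    std_err = np.asarray(std_err, dtype=float).ravel()
    # Two-sided critical value, e.g. about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return np.column_stack([y_pred - z * std_err, y_pred + z * std_err])
```

A level of exactly 1 is rejected here because the corresponding critical value is infinite.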
- property adata: anndata._core.anndata.AnnData
Annotated data object.
- Return type
AnnData
- Returns
adata : anndata.AnnData
Annotated data object.
- property conf_int: numpy.ndarray
Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
- Return type
ndarray
- default_confidence_interval(x_test=None, **kwargs)
Calculate the confidence interval if the underlying model has no method for it. The formula is taken from [DeSalvo, 1970], eq. 5.
- Parameters
x_test (Optional[ndarray]) – Array of shape (n_samples,) used for confidence interval calculation. If None, use x_test.
kwargs – Keyword arguments for the underlying model’s confidence method or for default_confidence_interval().
- Return type
ndarray
- Returns
Updates and returns the following field:
conf_int - Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
Also updates the following fields:
- property model: Any
Underlying model.
- Return type
Any
- plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, abs_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)
Plot the smoothed gene expression.
- Parameters
same_plot (bool) – Whether to plot all trends in the same plot.
hide_cells (bool) – Whether to hide the cells.
perc (Optional[Tuple[float, float]]) – Percentiles by which to clip the absorption probabilities.
abs_prob_cmap (ListedColormap) – Colormap to use when coloring in the absorption probabilities.
cell_color (Optional[str]) – Key in anndata.AnnData.obs or anndata.AnnData.var_names used for coloring the cells.
lineage_color (str) – Color for the lineage.
alpha (float) – Alpha channel for cells.
lineage_alpha (float) – Alpha channel for lineage confidence intervals.
size (int) – Size of the points.
lw (float) – Line width for the smoothed values.
cbar (bool) – Whether to show the colorbar.
margins (float) – Margins around the plot.
xlabel (str) – Label on the x-axis.
ylabel (str) – Label on the y-axis.
conf_int (bool) – Whether to show the confidence interval.
lineage_probability (bool) – Whether to show the smoothed lineage probability as a dashed line. Note that this requires one additional model fit.
lineage_probability_conf_int (Union[bool, float]) – Whether to compute and show the confidence interval of the smoothed lineage probability. If self is cellrank.ul.models.GAMR, it can also specify the confidence level, which defaults to 0.95. Only used when lineage_probability=True.
lineage_probability_color (Optional[str]) – Color to use when plotting the smoothed lineage_probability. If None, it is the same as lineage_color. Only used when lineage_probability=True.
obs_legend_loc (Optional[str]) – Location of the legend when cell_color corresponds to a categorical variable.
fig (Optional[Figure]) – Figure to use. If None, create a new one.
ax (matplotlib.axes.Axes) – Axes to use. If None, create a new one.
return_fig (bool) – If True, return the figure object.
save (Optional[str]) – Filename where to save the plot. If None, just show the plots.
kwargs – Keyword arguments for matplotlib.axes.Axes.legend(), e.g. to disable the legend, specify loc=None. Only available when lineage_probability=True.
- Returns
Nothing, just plots the figure, unless return_fig=True, in which case the figure is returned. Optionally saves it based on save.
- property prepared
Whether the model is prepared for fitting.
- static read(fname, adata=None, copy=False)
Deserialize self from a file.
- Parameters
fname (Union[str, Path]) – Filename from which to read the object.
adata (Optional[AnnData]) – anndata.AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.
copy (bool) – Whether to copy adata before assigning it. If adata is a view, it is always copied.
- Return type
IOMixin
- Returns
The deserialized object.
- property w: numpy.ndarray
Filtered weights of shape (n_filtered_cells,) used for fitting.
- Return type
ndarray
- property w_all: numpy.ndarray
Unfiltered weights of shape (n_cells,).
- Return type
ndarray
- write(fname, write_adata=True, ext='pickle')
Serialize self to a file.
- Return type
None
- Returns
Nothing, just writes itself to a file using pickle.
- property x: numpy.ndarray
Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.
- Return type
ndarray
- property x_all: numpy.ndarray
Unfiltered independent variables of shape (n_cells, 1).
- Return type
ndarray
- property x_hat: numpy.ndarray
Filtered independent variables used when calculating the default confidence interval, usually the same as x.
- Return type
ndarray
- property x_test: numpy.ndarray
Independent variables of shape (n_samples, 1) used for prediction.
- Return type
ndarray
- property y: numpy.ndarray
Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.
- Return type
ndarray
- property y_all: numpy.ndarray
Unfiltered dependent variables of shape (n_cells, 1).
- Return type
ndarray
- property y_hat: numpy.ndarray
Filtered dependent variables used when calculating the default confidence interval, usually the same as y.
- Return type
ndarray
- property y_test: numpy.ndarray
Prediction values of shape (n_samples,) for x_test.
- Return type
ndarray
Base Classes
BaseEstimator
- class cellrank.tl.estimators.BaseEstimator(obj, obsp_key=None)[source]
Base class for all estimators.
- Parameters
obj (Union[AnnData, ndarray, spmatrix, KernelExpression]) – Can be one of the following:
cellrank.tl.kernels.Kernel - kernel object.
anndata.AnnData - annotated data object containing the transition matrix in anndata.AnnData.obsp.
numpy.ndarray - row-normalized dense transition matrix.
scipy.sparse.spmatrix - row-normalized sparse transition matrix.
obsp_key (Optional[str]) – Key in anndata.AnnData.obsp where the transition matrix is stored. Only used when obj is an anndata.AnnData object.
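When obj is a plain numpy.ndarray or scipy.sparse.spmatrix, it must already be row-normalized, i.e. square, non-negative, with every row summing to 1. The hypothetical helper below, which is not part of CellRank, shows one way to check this before constructing an estimator.

```python
import numpy as np
import scipy.sparse as sp


def is_row_stochastic(T, atol=1e-8):
    """Check that T is square, non-negative and that every row sums to 1."""
    if not sp.issparse(T):
        T = np.asarray(T, dtype=float)
    if len(T.shape) != 2 or T.shape[0] != T.shape[1]:
        return False
    if sp.issparse(T):
        T = sp.csr_matrix(T)
        values = T.data  # only the stored (non-zero) entries
        row_sums = np.asarray(T.sum(axis=1)).ravel()
    else:
        values = T.ravel()
        row_sums = T.sum(axis=1)
    non_negative = values.size == 0 or values.min() >= 0
    return bool(non_negative and np.allclose(row_sums, 1.0, atol=atol))
```

Checking non-negativity separately matters: a matrix such as [[1.5, -0.5], [0, 1]] has unit row sums but is not a valid transition matrix.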
- to_adata(keep=('X', 'raw'), *, copy=True)[source]
Serialize self to anndata.AnnData.
- Parameters
keep (Union[Literal[‘all’], Sequence[Literal[‘X’, ‘raw’, ‘layers’, ‘obs’, ‘var’, ‘obsm’, ‘varm’, ‘obsp’, ‘varp’, ‘uns’]]]) – Which attributes to keep from the underlying adata. Valid options are:
‘all’ - keep all attributes specified in the signature.
typing.Sequence - keep only a subset of these attributes.
dict - the keys correspond to the attribute names and the values to a subset of keys to keep from this attribute. If the values are specified either as True or ‘all’, everything from this attribute will be kept.
copy (Union[bool, Sequence[Literal[‘X’, ‘raw’, ‘layers’, ‘obs’, ‘var’, ‘obsm’, ‘varm’, ‘obsp’, ‘varp’, ‘uns’]]]) – Whether to copy the data. Can be specified on a per-attribute basis. Useful for attributes that store arrays. Attributes not specified here will not be copied.
- Return type
AnnData
- Returns
adata : anndata.AnnData
Annotated data object.
- classmethod from_adata(adata, obsp_key)[source]
Deserialize self from anndata.AnnData.
- Parameters
adata (anndata.AnnData) – Annotated data object.
obsp_key (str) – Key in anndata.AnnData.obsp where the transition matrix is stored.
- Returns
The deserialized object.
- copy(*, deep=False)[source]
Return a copy of self.
- Parameters
deep (bool) – Whether to return a deep copy. If True, this also copies the adata.
- Returns
A copy of self.
Kernel
- class cellrank.tl.kernels.Kernel(adata, backward=False, compute_cond_num=False, check_connectivity=False, **kwargs)[source]
A base class from which all kernels are derived.
These kernels read from a given AnnData object, usually the KNN graph and additional variables, to compute a weighted, directed graph. Every kernel object has a direction. The kernels defined in the derived classes are not strictly kernels in the mathematical sense because they often take only one input argument; however, they build on other functions which have computed a similarity based on two input arguments. The role of the kernels defined here is to add directionality to these symmetric similarity relations or to transform them.
- Parameters
adata (anndata.AnnData) – Annotated data object.
backward (bool) – Direction of the process.
compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. Note that this might be costly, since it does not use a sparse implementation.
check_connectivity (bool) – Whether to check that the underlying KNN graph is connected.
kwargs (Any) – Keyword arguments which can specify keys to be read from the adata object.
- property adata: anndata._core.anndata.AnnData
Annotated data object.
- Return type
AnnData
- Returns
anndata.AnnData
Annotated data object.
- compute_projection(basis='umap', key_added=None, copy=False)
Compute a projection of the transition matrix in the embedding.
Projections can only be calculated for kNN-based kernels. The projected matrix can then be visualized as:
scvelo.pl.velocity_embedding(adata, vkey='T_fwd', basis='umap')
- Parameters
basis (str) – Basis in anndata.AnnData.obsm for which to compute the projection.
key_added (Optional[str]) – If not None and copy=False, save the result to anndata.AnnData.obsm['{key_added}']. Otherwise, save the result to ‘T_fwd_{basis}’ or ‘T_bwd_{basis}’, depending on the direction.
copy (bool) – Whether to return the projection or to modify adata in place.
- Returns
If copy=True, the projection array of shape (n_cells, n_components). Otherwise, it modifies anndata.AnnData.obsm with a key based on key_added.
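The idea behind projecting a transition matrix onto an embedding can be sketched as follows: for each cell, average the unit vectors pointing towards its neighbours, weighted by how much each transition probability deviates from the mean. This is modeled on the velocity-embedding projection used by scVelo; whether compute_projection uses exactly this centring is an assumption, and project_transitions is a hypothetical name working on dense arrays only.

```python
import numpy as np


def project_transitions(T, emb):
    """Project a dense transition matrix onto an embedding (scVelo-style sketch)."""
    T = np.asarray(T, dtype=float)
    emb = np.asarray(emb, dtype=float)
    projection = np.zeros_like(emb)
    for i in range(T.shape[0]):
        # Neighbours are the cells reachable in one step, excluding self-loops.
        nbrs = np.flatnonzero(T[i])
        nbrs = nbrs[nbrs != i]
        if nbrs.size == 0:
            continue
        # Unit vectors pointing from cell i towards each neighbour.
        diff = emb[nbrs] - emb[i]
        norms = np.linalg.norm(diff, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        diff = diff / norms
        probs = T[i, nbrs]
        # Weight each direction by how much its probability deviates from the
        # mean, so uniform transitions project to a zero vector.
        projection[i] = (probs - probs.mean()) @ diff
    return projection
```

The centring step is what makes the arrows informative: a cell that is equally likely to move to all of its neighbours gets no arrow, rather than one pointing towards wherever its neighbours happen to cluster.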
- abstract compute_transition_matrix(*args, **kwargs)
Compute a transition matrix.
- abstract copy()
Return a copy of itself. Note that the underlying adata object is not copied.
- Return type
KernelExpression
- property kernels: List[cellrank.tl.kernels._base_kernel.Kernel]
Get the kernels of the kernel expression, except for constants.
- plot_random_walks(n_sims, max_iter=0.25, seed=None, successive_hits=0, start_ixs=None, stop_ixs=None, basis='umap', cmap='gnuplot', linewidth=1.0, linealpha=0.3, ixs_legend_loc=None, n_jobs=None, backend='loky', show_progress_bar=True, figsize=None, dpi=None, save=None, **kwargs)
Plot random walks in an embedding.
This method simulates random walks on the Markov chain defined through the corresponding transition matrix. The method is intended to give qualitative rather than quantitative insights into the transition matrix. Random walks are simulated by iteratively choosing the next cell based on the current cell’s transition probabilities.
- Parameters
n_sims (int) – Number of random walks to simulate.
max_iter (Union[int, float]) – Maximum number of steps of a random walk. If a float, it can be specified as a fraction of the number of cells.
successive_hits (int) – Number of successive hits in the stop_ixs required to stop prematurely.
start_ixs (Union[Sequence[str], Dict[str, Union[str, Sequence[str], Tuple[float, float]]], None]) – Cells from which to sample the starting points. If None, use all cells. Can be specified as:
dict - dictionary with 1 key in anndata.AnnData.obs with values corresponding to either 1 or more clusters (if the column is categorical) or a tuple specifying the [min, max] interval from which to select the indices.
typing.Sequence - sequence of cell ids in anndata.AnnData.obs_names.
For example, {'dpt_pseudotime': [0, 0.1]} means that starting points for random walks will be sampled uniformly from cells whose pseudotime is in [0, 0.1].
stop_ixs (Union[Sequence[str], Dict[str, Union[str, Sequence[str], Tuple[float, float]]], None]) – Cells which, when hit, terminate the random walk. If None, terminate after max_iter steps. Can be specified as:
dict - dictionary with 1 key in anndata.AnnData.obs with values corresponding to either 1 or more clusters (if the column is categorical) or a tuple specifying the [min, max] interval from which to select the indices.
typing.Sequence - sequence of cell ids in anndata.AnnData.obs_names.
For example, {'clusters': ['Alpha', 'Beta']} and successive_hits=3 means that the random walk will stop prematurely after cells in the above specified clusters have been visited successively 3 times in a row.
basis (str) – Basis in anndata.AnnData.obsm to use as an embedding.
cmap (Union[str, LinearSegmentedColormap]) – Colormap for the random walk lines.
linewidth (float) – Width of the random walk lines.
linealpha (float) – Alpha value of the random walk lines.
ixs_legend_loc (Optional[str]) – Legend location for the start/stop indices.
show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend (str) – Which backend to use for parallelization. See joblib.Parallel for valid options.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
save (Union[str, Path, None]) – Filename where to save the plot.
kwargs (Any) – Keyword arguments for scvelo.pl.scatter().
- Return type
None
- Returns
Nothing, just plots the figure. Optionally saves it based on save. For each random walk, the first/last cell is marked by the start/end colors of cmap.
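The simulation step described for plot_random_walks() can be sketched on a small dense transition matrix. simulate_random_walk is a hypothetical helper that works on integer cell indices rather than obs_names; it assumes that successive_hits of 0 or 1 both mean "stop at the first hit", and its default max_iter is an arbitrary integer rather than CellRank’s fraction-of-cells default.

```python
import numpy as np


def simulate_random_walk(T, start_ixs, stop_ixs=(), max_iter=100, successive_hits=0, seed=None):
    """Simulate one random walk on a dense row-stochastic matrix (illustrative)."""
    rng = np.random.default_rng(seed)
    T = np.asarray(T, dtype=float)
    n_cells = T.shape[0]
    if isinstance(max_iter, float):
        # A float is interpreted as a fraction of the number of cells.
        max_iter = int(max_iter * n_cells)
    stop = set(int(i) for i in stop_ixs)
    # Assumption: 0 and 1 both mean "stop at the first hit".
    required_hits = max(successive_hits, 1)

    cell = int(rng.choice(start_ixs))
    path, hits = [cell], 0
    for _ in range(max_iter):
        # Sample the next cell from the current cell's transition probabilities.
        cell = int(rng.choice(n_cells, p=T[cell]))
        path.append(cell)
        # Count consecutive visits to stop cells; any other cell resets the count.
        hits = hits + 1 if cell in stop else 0
        if hits >= required_hits:
            break
    return path
```

Running this n_sims times and drawing each path in the embedding gives the qualitative picture the method is meant to provide.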
- plot_single_flow(cluster, cluster_key, time_key, clusters=None, time_points=None, min_flow=0, remove_empty_clusters=True, ascending=False, legend_loc='upper right out', alpha=0.8, xticks_step_size=1, figsize=None, dpi=None, save=None, show=True)
Visualize outgoing flow from a cluster of cells [Mittnenzweig et al., 2021].
- Parameters
cluster (str) – Cluster for which to visualize outgoing flow.
cluster_key (str) – Key in anndata.AnnData.obs where clustering is stored.
time_key (str) – Key in anndata.AnnData.obs where experimental time is stored.
clusters (Optional[Sequence[Any]]) – Visualize flow only for these clusters. If None, use all clusters.
time_points (Optional[Sequence[Union[float, int]]]) – Visualize flow only for these time points. If None, use all time points.
min_flow (float) – Only show flow edges with flow greater than this value. Flow values are always in [0, 1].
remove_empty_clusters (bool) – Whether to remove clusters with no incoming flow edges.
ascending (Optional[bool]) – Whether to sort the clusters by ascending or descending incoming flow. If None, use the order defined by clusters.
xticks_step_size (Optional[int]) – Show only every n-th tick on the x-axis. If None, don’t show any ticks.
legend_loc (Optional[str]) – Position of the legend. If None, do not show the legend.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
save (Union[str, Path, None]) – Filename where to save the plot.
show (bool) – If False, return matplotlib.pyplot.Axes.
- Returns
The axes object, if show=False. Otherwise nothing; just plots the figure. Optionally saves it based on save.
Notes
This function is a Python reimplementation of the original R function, with some minor stylistic differences. It will not recreate the results from [Mittnenzweig et al., 2021], because there the Metacell model [Baran et al., 2019] was used to compute the flow, whereas here the transition matrix is used.
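Since the flow here is derived from the transition matrix, a rough sketch of a single flow step is to sum the transition probability mass from the source cluster at one time point into each cluster at the next, then normalize so the values lie in [0, 1]. The aggregation below (summing over cell pairs and normalizing over target clusters) is an assumption for illustration, not CellRank’s exact computation, and outgoing_flow is a hypothetical name.

```python
import numpy as np


def outgoing_flow(T, clusters, times, source, t0, t1):
    """Fraction of transition mass from `source` cells at t0 into each cluster at t1."""
    T = np.asarray(T, dtype=float)
    clusters = np.asarray(clusters)
    times = np.asarray(times)
    src = np.flatnonzero((clusters == source) & (times == t0))
    if src.size == 0:
        raise ValueError("No cells of the source cluster at time t0.")
    flow = {}
    for target in np.unique(clusters[times == t1]):
        dst = np.flatnonzero((clusters == target) & (times == t1))
        # Total probability mass moving from the source cells into this cluster.
        flow[target] = T[np.ix_(src, dst)].sum()
    total = sum(flow.values())
    # Normalise so the flows sum to 1, which keeps each value in [0, 1].
    return {k: v / total for k, v in flow.items()} if total > 0 else flow
```

A min_flow cutoff would then simply drop the entries of this dictionary that fall at or below the threshold before drawing the edges.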
- static read(fname, adata=None, copy=False)
Deserialize self from a file.
- Parameters
fname (Union[str, Path]) – Filename from which to read the object.
adata (Optional[AnnData]) – anndata.AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.
copy (bool) – Whether to copy adata before assigning it. If adata is a view, it is always copied.
- Return type
IOMixin
- Returns
The deserialized object.
- property transition_matrix: Union[numpy.ndarray, scipy.sparse.base.spmatrix]
Return row-normalized transition matrix.
If not present, it is computed iff all underlying kernels have been initialized.
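A transition matrix built from several kernels stays row-normalized when it is a convex combination of row-normalized matrices. CellRank lets kernels be combined in expressions such as 0.8 * vk + 0.2 * ck; the sketch below only illustrates why such a weighted sum remains a valid transition matrix. The helper combine_kernels is hypothetical, and whether CellRank rescales the weights exactly this way is an assumption.

```python
import numpy as np


def combine_kernels(matrices, weights):
    """Convex combination of row-stochastic matrices, re-normalised row-wise."""
    weights = np.asarray(weights, dtype=float)
    if weights.ndim != 1 or len(weights) != len(matrices) or np.any(weights < 0):
        raise ValueError("Need one non-negative weight per matrix.")
    # Rescale the weights so they sum to 1.
    weights = weights / weights.sum()
    combined = sum(w * np.asarray(m, dtype=float) for w, m in zip(weights, matrices))
    # No-op (up to rounding) if the inputs are already row-stochastic.
    return combined / combined.sum(axis=1, keepdims=True)
```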
- write(fname, write_adata=True, ext='pickle')
Serialize self to a file.
- Return type
None
- Returns
Nothing, just writes itself to a file using pickle.
ExperimentalTime Kernel
- class cellrank.tl.kernels.ExperimentalTimeKernel(adata, backward=False, time_key='exp_time', compute_cond_num=False, **kwargs)[source]
Kernel base class which computes directed transition probabilities based on experimental time.
Optionally, we apply a density correction as described in [Coifman et al., 2005], where we use the implementation of [Haghverdi et al., 2016].
- Parameters
adata (anndata.AnnData) – Annotated data object.
backward (bool) – Direction of the process.
time_key (str) – Key in anndata.AnnData.obs where experimental time is stored. The experimental time can be either numeric or an ordered categorical.
compute_cond_num (bool) – Whether to compute the condition number of the transition matrix. Note that this might be costly, since it does not use a sparse implementation.
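The optional density correction follows the normalization of [Coifman et al., 2005] as implemented for diffusion pseudotime [Haghverdi et al., 2016]: divide each affinity by the estimated densities of both cells, then row-normalize. The sketch below applies this to a small dense symmetric affinity matrix; density_normalize is a hypothetical name, and which matrix the ExperimentalTimeKernel applies the correction to, as well as the final plain row normalization, are assumptions.

```python
import numpy as np


def density_normalize(K):
    """Density-corrected, row-stochastic version of a symmetric affinity matrix K."""
    K = np.asarray(K, dtype=float)
    # Row sums act as a kernel density estimate for each cell.
    q = K.sum(axis=1)
    # Divide out the density at both endpoints (alpha = 1 normalisation).
    K_tilde = K / np.outer(q, q)
    # Row-normalise to obtain transition probabilities.
    return K_tilde / K_tilde.sum(axis=1, keepdims=True)
```

For the affinities [[1, 1, 0], [1, 1, 1], [0, 1, 1]], plain row normalization gives the first cell the transitions [0.5, 0.5, 0], whereas the corrected version gives [0.6, 0.4, 0], shifting probability away from the densely connected middle cell.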
- plot_single_flow(cluster, cluster_key, time_key=None, *args, **kwargs)[source]
Visualize outgoing flow from a cluster of cells [Mittnenzweig et al., 2021].
- Parameters
cluster (str) – Cluster for which to visualize outgoing flow.
cluster_key (str) – Key in anndata.AnnData.obs where clustering is stored.
time_key (Optional[str]) – Key in anndata.AnnData.obs where experimental time is stored.
clusters – Visualize flow only for these clusters. If None, use all clusters.
time_points – Visualize flow only for these time points. If None, use all time points.
min_flow – Only show flow edges with flow greater than this value. Flow values are always in [0, 1].
remove_empty_clusters – Whether to remove clusters with no incoming flow edges.
ascending – Whether to sort the clusters by ascending or descending incoming flow. If None, use the order defined by clusters.
alpha – Alpha value for cell proportions.
xticks_step_size – Show only every n-th tick on the x-axis. If None, don’t show any ticks.
legend_loc – Position of the legend. If None, do not show the legend.
figsize – Size of the figure.
dpi – Dots per inch.
save – Filename where to save the plot.
show – If False, return matplotlib.pyplot.Axes.
- Returns
The axes object, if show=False. Otherwise nothing; just plots the figure. Optionally saves it based on save.
- property experimental_time: pandas.core.series.Series
Experimental time.
- Return type
Series
- property adata: anndata._core.anndata.AnnData
Annotated data object.
- Return type
AnnData
- Returns
anndata.AnnData
Annotated data object.
- compute_projection(basis='umap', key_added=None, copy=False)
Compute a projection of the transition matrix in the embedding.
Projections can only be calculated for kNN-based kernels. The projected matrix can then be visualized as:
scvelo.pl.velocity_embedding(adata, vkey='T_fwd', basis='umap')
- Parameters
basis (str) – Basis in anndata.AnnData.obsm for which to compute the projection.
key_added (Optional[str]) – If not None and copy=False, save the result to anndata.AnnData.obsm['{key_added}']. Otherwise, save the result to ‘T_fwd_{basis}’ or ‘T_bwd_{basis}’, depending on the direction.
copy (bool) – Whether to return the projection or to modify adata in place.
- Returns
If copy=True, the projection array of shape (n_cells, n_components). Otherwise, it modifies anndata.AnnData.obsm with a key based on key_added.
- abstract compute_transition_matrix(*args, **kwargs)
Compute a transition matrix.
- property kernels: List[cellrank.tl.kernels._base_kernel.Kernel]
Get the kernels of the kernel expression, except for constants.
- plot_random_walks(n_sims, max_iter=0.25, seed=None, successive_hits=0, start_ixs=None, stop_ixs=None, basis='umap', cmap='gnuplot', linewidth=1.0, linealpha=0.3, ixs_legend_loc=None, n_jobs=None, backend='loky', show_progress_bar=True, figsize=None, dpi=None, save=None, **kwargs)
Plot random walks in an embedding.
This method simulates random walks on the Markov chain defined though the corresponding transition matrix. The method is intended to give qualitative rather than quantitative insights into the transition matrix. Random walks are simulated by iteratively choosing the next cell based on the current cell’s transition probabilities.
- Parameters
n_sims (
int
) – Number of random walks to simulate.max_iter (
Union
[int
,float
]) – Maximum number of steps of a random walk. If afloat
, it can be specified as a fraction of the number of cells.successive_hits (
int
) – Number of successive hits in thestop_ixs
required to stop prematurely.
start_ixs (Union[Sequence[str], Dict[str, Union[str, Sequence[str], Tuple[float, float]]], None]) – Cells from which to sample the starting points. If None, use all cells. Can be specified as:
  - dict - dictionary with 1 key in anndata.AnnData.obs with values corresponding to either 1 or more clusters (if the column is categorical) or a tuple specifying a [min, max] interval from which to select the indices.
  - typing.Sequence - sequence of cell ids in anndata.AnnData.obs_names.
  For example, {'dpt_pseudotime': [0, 0.1]} means that starting points for random walks will be sampled uniformly from cells whose pseudotime is in [0, 0.1].
stop_ixs (Union[Sequence[str], Dict[str, Union[str, Sequence[str], Tuple[float, float]]], None]) – Cells which, when hit, terminate the random walk. If None, terminate after max_iter steps. Can be specified as:
  - dict - dictionary with 1 key in anndata.AnnData.obs with values corresponding to either 1 or more clusters (if the column is categorical) or a tuple specifying a [min, max] interval from which to select the indices.
  - typing.Sequence - sequence of cell ids in anndata.AnnData.obs_names.
  For example, {'clusters': ['Alpha', 'Beta']} and successive_hits = 3 means that the random walk will stop prematurely after cells in the specified clusters have been visited 3 times in a row.
basis (str) – Basis in anndata.AnnData.obsm to use as an embedding.
cmap (Union[str, LinearSegmentedColormap]) – Colormap for the random walk lines.
linewidth (float) – Width of the random walk lines.
linealpha (float) – Alpha value of the random walk lines.
ixs_legend_loc (Optional[str]) – Legend location for the start/stop indices.
show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend (str) – Which backend to use for parallelization. See joblib.Parallel for valid options.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
save (Union[str, Path, None]) – Filename where to save the plot.
kwargs (Any) – Keyword arguments for scvelo.pl.scatter().
- Return type
- Returns
Nothing, just plots the figure. Optionally saves it based on save. For each random walk, the first/last cell is marked by the start/end colors of cmap.
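The dictionary and sequence forms of start_ixs and stop_ixs can be illustrated with a small, library-free sketch. It assumes plain Python lists stand in for anndata.AnnData.obs_names and the columns of anndata.AnnData.obs, treats a column as categorical if it contains strings, and assumes a closed [min, max] interval. select_cells is a hypothetical helper, not part of CellRank.

```python
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical alias mirroring the documented start_ixs/stop_ixs annotation.
CellSpec = Union[Sequence[str], Dict[str, Union[str, Sequence[str], Tuple[float, float]]], None]


def select_cells(obs_names: List[str], obs: Dict[str, list], spec: CellSpec) -> List[str]:
    """Resolve a start_ixs/stop_ixs-style specification to a list of cell ids."""
    if spec is None:
        return list(obs_names)  # None: use all cells
    if isinstance(spec, dict):
        if len(spec) != 1:
            raise ValueError("Expected a dictionary with exactly 1 key.")
        key, value = next(iter(spec.items()))
        column = obs[key]
        if any(isinstance(v, str) for v in column):
            # Categorical column: value is 1 cluster name or a sequence of them.
            wanted = {value} if isinstance(value, str) else set(value)
            return [c for c, label in zip(obs_names, column) if label in wanted]
        # Numeric column: value is a [min, max] interval (assumed closed).
        lo, hi = value
        return [c for c, t in zip(obs_names, column) if lo <= t <= hi]
    # Otherwise: a sequence of cell ids, which must exist in obs_names.
    missing = set(spec) - set(obs_names)
    if missing:
        raise KeyError(f"Unknown cell ids: {sorted(missing)}")
    return list(spec)
```

With obs = {'dpt_pseudotime': [0.0, 0.05, 0.5, 0.1]}, the specification {'dpt_pseudotime': [0, 0.1]} selects the first, second and fourth cell, matching the example in the parameter description.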
- static read(fname, adata=None, copy=False)
Deserialize self from a file.
- Parameters
fname (Union[str, Path]) – Filename from which to read the object.
adata (Optional[AnnData]) – anndata.AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.
copy (bool) – Whether to copy adata before assigning it or not. If adata is a view, it is always copied.
- Return type
IOMixin
- Returns
The deserialized object.
- property transition_matrix: Union[numpy.ndarray, scipy.sparse.base.spmatrix]
Return row-normalized transition matrix.
If not present, it is computed iff all underlying kernels have been initialized.
- write(fname, write_adata=True, ext='pickle')
Serialize self to a file.
- Parameters
- Return type
- Returns
Nothing, just writes itself to a file using pickle.
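The write()/read() pair can be sketched with the standard pickle module. PickleIOSketch below is a hypothetical class, not CellRank's IOMixin: it pickles a plain dictionary of its state, optionally without adata (mirroring write_adata=False), and on reading attaches a supplied adata (deep-copied if copy=True) only when none was saved. The ext parameter and the rule that views are always copied are not modeled.

```python
import copy as copy_module
import os
import pickle
import tempfile
from typing import Any, Optional


class PickleIOSketch:
    """Hypothetical object mimicking the documented write()/read() contract."""

    def __init__(self, adata: Optional[Any] = None, payload: Optional[dict] = None) -> None:
        self.adata = adata
        self.payload = dict(payload or {})

    def write(self, fname: str, write_adata: bool = True) -> None:
        # Pickle a plain dict of the state; optionally leave out the data object.
        state = {"adata": self.adata if write_adata else None, "payload": self.payload}
        with open(fname, "wb") as fout:
            pickle.dump(state, fout)

    @staticmethod
    def read(fname: str, adata: Optional[Any] = None, copy: bool = False) -> "PickleIOSketch":
        with open(fname, "rb") as fin:
            state = pickle.load(fin)
        obj = PickleIOSketch(adata=state["adata"], payload=state["payload"])
        # A supplied adata is only used when the object was saved without one.
        if obj.adata is None and adata is not None:
            obj.adata = copy_module.deepcopy(adata) if copy else adata
        return obj


# Usage: save without adata, then re-attach a copy of it when reading.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "estimator.pickle")
    original = {"X": [[0.0, 1.0]]}
    PickleIOSketch(adata=original, payload={"n_states": 3}).write(path, write_adata=False)
    restored = PickleIOSketch.read(path, adata=original, copy=True)
```

Pickling a dictionary rather than the instance itself keeps the file independent of where the class is defined, at the cost of reconstructing the object manually in read().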
TransportMap Kernel
- class cellrank.tl.kernels.TransportMapKernel(*args, **kwargs)[source]
Kernel base class which computes a transition matrix based on transport maps for consecutive time pairs.
- compute_transition_matrix(threshold='auto', last_time_point=LastTimePoint.DIAGONAL, conn_kwargs=mappingproxy({}), **kwargs)[source]
Compute transition matrix using transport maps.
- Parameters
threshold (Union[float, Literal['auto'], None]) – How to remove small non-zero values from the transition matrix. Valid options are:
  - 'auto' - find the maximum threshold value which will not remove every non-zero value from any row.
  - float - value in [0, 100] corresponding to a percentage of non-zeros to remove. Rows where all values are removed will have a uniform distribution.
  - None - do not threshold.
last_time_point (LastTimePoint) – How to define transitions within the last time point. Valid options are:
  - LastTimePoint.UNIFORM - row-normalized matrix of 1s for transitions within the last time point.
  - LastTimePoint.DIAGONAL - diagonal matrix with 1s on the diagonal.
  - LastTimePoint.CONNECTIVITIES - use transitions from cellrank.tl.kernels.ConnectivityKernel derived from the last time point subset of adata.
conn_kwargs (Mapping[str, Any]) – Keyword arguments for scanpy.pp.neighbors() when using last_time_point = LastTimePoint.CONNECTIVITIES. Can also contain 'density_normalize' for cellrank.tl.kernels.ConnectivityKernel.compute_transition_matrix().
- Return type
KernelExpression
- Returns
Self and updated transition_matrix.
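The three threshold modes of compute_transition_matrix() can be sketched on a small dense matrix. This is an illustration under assumptions, not CellRank's implementation: the matrix is a list of rows, values strictly below the cutoff are removed, rows are re-normalized afterwards, and the float mode is approximated by ranking the sorted non-zero values. threshold_rows is a hypothetical name.

```python
from typing import List, Union

Matrix = List[List[float]]


def threshold_rows(T: Matrix, threshold: Union[float, str, None] = "auto") -> Matrix:
    """Remove small non-zero transition probabilities and re-normalize every row."""
    if threshold is None:  # None: do not threshold
        return [row[:] for row in T]
    if threshold == "auto":
        # Largest cutoff that keeps at least 1 non-zero in every row:
        # the smallest of the per-row maxima.
        cutoff = min(max(row) for row in T)
    else:
        # Float in [0, 100]: remove (roughly) this percentage of the smallest non-zeros.
        nonzeros = sorted(v for row in T for v in row if v > 0)
        n_remove = int(len(nonzeros) * threshold / 100)
        cutoff = nonzeros[n_remove] if n_remove < len(nonzeros) else float("inf")
    result = []
    for row in T:
        kept = [v if v >= cutoff else 0.0 for v in row]
        total = sum(kept)
        if total == 0:
            # Every value was removed: fall back to a uniform distribution.
            result.append([1.0 / len(row)] * len(row))
        else:
            result.append([v / total for v in kept])
    return result
```

For T = [[0.7, 0.2, 0.1], [0.5, 0.5, 0.0], [0.05, 0.05, 0.9]], the 'auto' cutoff is 0.5 (the second row's maximum), so that row keeps both of its entries while the other rows collapse onto their largest transition.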
- property transport_maps: Optional[Dict[Tuple[Any, Any], anndata._core.anndata.AnnData]]
Transport maps for consecutive time pairs.
- property adata: anndata._core.anndata.AnnData
Annotated data object.
- Return type
- Returns
anndata.AnnData
Annotated data object.
- compute_projection(basis='umap', key_added=None, copy=False)
Compute a projection of the transition matrix in the embedding.
Projections can only be calculated for kNN-based kernels. The projected matrix can then be visualized as:
scvelo.pl.velocity_embedding(adata, vkey='T_fwd', basis='umap')
- Parameters
basis (str) – Basis in anndata.AnnData.obsm for which to compute the projection.
key_added (Optional[str]) – If not None and copy = False, save the result to anndata.AnnData.obsm['{key_added}']. Otherwise, save the result to 'T_fwd_{basis}' or 'T_bwd_{basis}', depending on the direction.
copy (bool) – Whether to return the projection or modify adata in place.
- Return type
- Returns
If copy = True, the projection array of shape (n_cells, n_components). Otherwise, it modifies anndata.AnnData.obsm with a key based on key_added.
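One way to picture the projection is as the expected displacement of each cell in the embedding: the arrow of cell i is the sum over j of T[i][j] * (emb[j] - emb[i]). In a kNN-based kernel only neighbors carry non-zero probabilities, so this sum stays local. The sketch below is a simplified assumption (the actual implementation may normalize displacements or subtract a baseline, which this does not do), and project_transitions is a hypothetical name.

```python
from typing import Dict, List, Sequence


def project_transitions(
    T: Sequence[Dict[int, float]],
    emb: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return sum_j T[i][j] * (emb[j] - emb[i]) for every cell i (simplified)."""
    n_dims = len(emb[0])
    projection = []
    for i, row in enumerate(T):
        arrow = [0.0] * n_dims
        # Only the sparse row (the kNN neighbors of cell i) contributes.
        for j, p in row.items():
            for d in range(n_dims):
                arrow[d] += p * (emb[j][d] - emb[i][d])
        projection.append(arrow)
    return projection
```

A cell whose probability mass stays on itself gets a zero arrow, while a cell split evenly between two neighbors points toward their average position.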
- copy()
Return a copy of self.
- Return type
- property experimental_time: pandas.core.series.Series
Experimental time.
- Return type
- property kernels: List[cellrank.tl.kernels._base_kernel.Kernel]
Get the kernels of the kernel expression, except for constants.
- plot_random_walks(n_sims, max_iter=0.25, seed=None, successive_hits=0, start_ixs=None, stop_ixs=None, basis='umap', cmap='gnuplot', linewidth=1.0, linealpha=0.3, ixs_legend_loc=None, n_jobs=None, backend='loky', show_progress_bar=True, figsize=None, dpi=None, save=None, **kwargs)
Plot random walks in an embedding.
This method simulates random walks on the Markov chain defined through the corresponding transition matrix. The method is intended to give qualitative rather than quantitative insights into the transition matrix. Random walks are simulated by iteratively choosing the next cell based on the current cell’s transition probabilities.
- Parameters
n_sims (int) – Number of random walks to simulate.
max_iter (Union[int, float]) – Maximum number of steps of a random walk. If a float, it is interpreted as a fraction of the number of cells.
successive_hits (int) – Number of successive hits in the stop_ixs required to stop prematurely.
start_ixs (Union[Sequence[str], Dict[str, Union[str, Sequence[str], Tuple[float, float]]], None]) – Cells from which to sample the starting points. If None, use all cells. Can be specified as:
  - dict - dictionary with 1 key in anndata.AnnData.obs with values corresponding to either 1 or more clusters (if the column is categorical) or a tuple specifying a [min, max] interval from which to select the indices.
  - typing.Sequence - sequence of cell ids in anndata.AnnData.obs_names.
  For example, {'dpt_pseudotime': [0, 0.1]} means that starting points for random walks will be sampled uniformly from cells whose pseudotime is in [0, 0.1].
stop_ixs (Union[Sequence[str], Dict[str, Union[str, Sequence[str], Tuple[float, float]]], None]) – Cells which, when hit, terminate the random walk. If None, terminate after max_iter steps. Can be specified as:
  - dict - dictionary with 1 key in anndata.AnnData.obs with values corresponding to either 1 or more clusters (if the column is categorical) or a tuple specifying a [min, max] interval from which to select the indices.
  - typing.Sequence - sequence of cell ids in anndata.AnnData.obs_names.
  For example, {'clusters': ['Alpha', 'Beta']} and successive_hits = 3 means that the random walk will stop prematurely after cells in the specified clusters have been visited 3 times in a row.
basis (str) – Basis in anndata.AnnData.obsm to use as an embedding.
cmap (Union[str, LinearSegmentedColormap]) – Colormap for the random walk lines.
linewidth (float) – Width of the random walk lines.
linealpha (float) – Alpha value of the random walk lines.
ixs_legend_loc (Optional[str]) – Legend location for the start/stop indices.
show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend (str) – Which backend to use for parallelization. See joblib.Parallel for valid options.
figsize (Optional[Tuple[float, float]]) – Size of the figure.
save (Union[str, Path, None]) – Filename where to save the plot.
kwargs (Any) – Keyword arguments for scvelo.pl.scatter().
- Return type
- Returns
Nothing, just plots the figure. Optionally saves it based on save. For each random walk, the first/last cell is marked by the start/end colors of cmap.
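The simulation loop described above can be sketched without any dependencies. The transition matrix is assumed to be stored as a list of {neighbor: probability} rows, random_walk is a hypothetical helper, and the counting rule (consecutive visits to stop_ixs, with successive_hits = 0 treated as stopping at the first hit) is an assumption consistent with the documented example rather than CellRank's code.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Union


def random_walk(
    T: Sequence[Dict[int, float]],
    start: int,
    max_iter: Union[int, float] = 0.25,
    stop_ixs: Optional[Set[int]] = None,
    successive_hits: int = 0,
    seed: Optional[int] = None,
) -> List[int]:
    """Simulate one random walk on a row-stochastic matrix stored as sparse rows."""
    if isinstance(max_iter, float):
        # A float is interpreted as a fraction of the number of cells.
        max_iter = int(max_iter * len(T))
    rng = random.Random(seed)
    path, current, hits = [start], start, 0
    for _ in range(max_iter):
        # Choose the next cell according to the current cell's transition probabilities.
        neighbors, probs = zip(*T[current].items())
        current = rng.choices(neighbors, weights=probs, k=1)[0]
        path.append(current)
        if stop_ixs is None:
            continue
        # Count consecutive visits to stop_ixs; any other cell resets the counter.
        hits = hits + 1 if current in stop_ixs else 0
        if hits >= max(1, successive_hits):
            break
    return path
```

On a chain 0 → 1 → 2 → 3 where cell 3 is absorbing, stop_ixs={3} and successive_hits=3 yield the path [0, 1, 2, 3, 3, 3]. Running n_sims such walks independently is what makes the n_jobs/backend parallelization straightforward.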
- plot_single_flow(cluster, cluster_key, time_key=None, *args, **kwargs)
Visualize outgoing flow from a cluster of cells [Mittnenzweig et al., 2021].
- Parameters
cluster (str) – Cluster for which to visualize outgoing flow.
cluster_key (str) – Key in anndata.AnnData.obs where clustering is stored.
time_key (Optional[str]) – Key in anndata.AnnData.obs where experimental time is stored.
clusters – Visualize flow only for these clusters. If None, use all clusters.
time_points – Visualize flow only for these time points. If None, use all time points.
min_flow – Only show flow edges with flow greater than this value. Flow values are always in [0, 1].
remove_empty_clusters – Whether to remove clusters with no incoming flow edges.
ascending – Whether to sort the clusters by ascending or descending incoming flow. If None, use the order defined by clusters.
alpha – Alpha value for cell proportions.
xticks_step_size – Show only every n-th tick on the x-axis. If None, don’t show any ticks.
legend_loc – Position of the legend. If None, do not show the legend.
figsize – Size of the figure.
dpi – Dots per inch.
save – Filename where to save the plot.
show – If False, return matplotlib.pyplot.Axes.
- Return type
- Returns
If show = False, the axes object. Otherwise, nothing; just plots the figure. Optionally saves it based on save.
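A simplified notion of outgoing flow is the average transition probability mass that cells of one cluster send to each cluster. The sketch below ignores experimental time entirely, whereas the documented function also takes time (time_key, time_points) into account, so treat it as intuition only. outgoing_flow is a hypothetical helper, and the transition matrix is assumed to be stored as {neighbor: probability} rows.

```python
from collections import defaultdict
from typing import Dict, Sequence


def outgoing_flow(
    T: Sequence[Dict[int, float]],
    labels: Sequence[str],
    cluster: str,
    min_flow: float = 0.0,
) -> Dict[str, float]:
    """Average transition mass leaving `cluster` towards every cluster (time ignored)."""
    sources = [i for i, lab in enumerate(labels) if lab == cluster]
    if not sources:
        raise ValueError(f"No cells in cluster {cluster!r}.")
    flow: Dict[str, float] = defaultdict(float)
    for i in sources:
        for j, p in T[i].items():
            flow[labels[j]] += p / len(sources)
    # Rows sum to 1, so each flow value is in [0, 1]; keep only edges above min_flow.
    return {lab: f for lab, f in flow.items() if f > min_flow}
```

Because each row sums to 1, the flow values of one source cluster also sum to 1 before filtering, which is why min_flow is naturally expressed in [0, 1].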
- static read(fname, adata=None, copy=False)
Deserialize self from a file.
- Parameters
fname (Union[str, Path]) – Filename from which to read the object.
adata (Optional[AnnData]) – anndata.AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.
copy (bool) – Whether to copy adata before assigning it or not. If adata is a view, it is always copied.
- Return type
IOMixin
- Returns
The deserialized object.
- property transition_matrix: Union[numpy.ndarray, scipy.sparse.base.spmatrix]
Return row-normalized transition matrix.
If not present, it is computed iff all underlying kernels have been initialized.
- write(fname, write_adata=True, ext='pickle')
Serialize self to a file.
- Parameters
- Return type
- Returns
Nothing, just writes itself to a file using pickle.
Similarity Scheme
- class cellrank.tl.kernels.SimilaritySchemeABC[source]
Base class for all similarity schemes.
- abstract __call__(v, D, softmax_scale=1.0)[source]
Compute transition probability of a cell to its nearest neighbors using RNA velocity.
- Parameters
v (ndarray) – Array of shape (n_genes,) or (n_neighbors, n_genes) containing the velocity vector(s). The second case is used for the backward process.
D (ndarray) – Array of shape (n_neighbors, n_genes) corresponding to the transcriptomic displacement of the current cell with respect to its nearest neighbors.
softmax_scale (float) – Scaling factor for the softmax function.
- Return type
- Returns
The probability and logits arrays of shape (n_neighbors,).
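A concrete scheme can be sketched by assuming cosine similarity between the velocity vector and each displacement row, scaled by softmax_scale and passed through a softmax. Only the forward case (v of shape (n_genes,)) is covered. cosine_softmax is a hypothetical function, and the cosine choice is an assumption about what a subclass might implement, not something the abstract base class prescribes.

```python
import math
from typing import List, Sequence, Tuple


def cosine_softmax(
    v: Sequence[float],
    D: Sequence[Sequence[float]],
    softmax_scale: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Forward case: v has shape (n_genes,), D has shape (n_neighbors, n_genes)."""
    v_norm = math.sqrt(sum(x * x for x in v))
    logits = []
    for d in D:
        d_norm = math.sqrt(sum(x * x for x in d))
        denom = v_norm * d_norm
        # Cosine similarity between velocity and displacement (0 for zero vectors).
        cos = sum(a * b for a, b in zip(v, d)) / denom if denom > 0 else 0.0
        logits.append(softmax_scale * cos)
    # Numerically stable softmax: subtract the maximum logit before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs, logits
```

A larger softmax_scale sharpens the distribution toward the neighbor best aligned with the velocity; softmax_scale = 0 gives a uniform distribution over neighbors.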
Threshold Scheme
- class cellrank.tl.kernels.ThresholdSchemeABC[source]
Base class for all connectivity biasing schemes.
- abstract __call__(cell_pseudotime, neigh_pseudotime, neigh_conn, **kwargs)[source]
Calculate biased connections for a given cell.
- Parameters
- Return type
- Returns
Array of shape (n_neighbors,) containing the biased connectivities.
- bias_knn(conn, pseudotime, n_jobs=None, backend='loky', show_progress_bar=True, **kwargs)[source]
Bias cell-cell connectivities of a KNN graph.
- Parameters
conn (csr_matrix) – Sparse matrix of shape (n_cells, n_cells) containing the nearest neighbor connectivities.
pseudotime (ndarray) – Pseudotemporal ordering of cells.
show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend (str) – Which backend to use for parallelization. See joblib.Parallel for valid options.
- Return type
- Returns
The biased connectivities.
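The split between a per-cell __call__ and bias_knn() can be sketched with the simplest possible scheme: a hard cut that removes connections to neighbors with smaller pseudotime. Both functions are hypothetical illustrations rather than CellRank's schemes. The graph is stored as {neighbor: connectivity} rows instead of a csr_matrix, and parallelization is omitted.

```python
from typing import Callable, Dict, List, Sequence


def hard_threshold(
    cell_pseudotime: float,
    neigh_pseudotime: Sequence[float],
    neigh_conn: Sequence[float],
) -> List[float]:
    """Per-cell scheme: drop connections to neighbors earlier in pseudotime."""
    return [c if t >= cell_pseudotime else 0.0 for t, c in zip(neigh_pseudotime, neigh_conn)]


def bias_knn_sketch(
    conn: Sequence[Dict[int, float]],
    pseudotime: Sequence[float],
    scheme: Callable[[float, Sequence[float], Sequence[float]], List[float]] = hard_threshold,
) -> List[Dict[int, float]]:
    """Apply a per-cell scheme to every row of a kNN graph stored as sparse rows."""
    biased = []
    for i, row in enumerate(conn):
        neighbors = list(row)
        values = scheme(pseudotime[i], [pseudotime[j] for j in neighbors], [row[j] for j in neighbors])
        # Keep only non-zero entries; a cell without later neighbors ends up with an empty row.
        biased.append({j: v for j, v in zip(neighbors, values) if v > 0})
    return biased
```

The empty row produced for the latest cell shows why a real scheme needs a softer rule or a fallback before the biased graph is row-normalized into a transition matrix.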
BaseModel
- class cellrank.ul.models.BaseModel(adata, model)[source]
Base class for all model classes.
- Parameters
adata (anndata.AnnData) – Annotated data object.
model (Any) – The underlying model that is used for fitting and prediction.
- property prepared
Whether the model is prepared for fitting.
- property adata: anndata._core.anndata.AnnData
Annotated data object.
- Return type
- Returns
adata : anndata.AnnData
Annotated data object.
- property x_all: numpy.ndarray
Unfiltered independent variables of shape (n_cells, 1).
- Return type
- property y_all: numpy.ndarray
Unfiltered dependent variables of shape (n_cells, 1).
- Return type
- property w_all: numpy.ndarray
Unfiltered weights of shape (n_cells,).
- Return type
- property x: numpy.ndarray
Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.
- Return type
- property y: numpy.ndarray
Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.
- Return type
- property w: numpy.ndarray
Filtered weights of shape (n_filtered_cells,) used for fitting.
- Return type
- property x_test: numpy.ndarray
Independent variables of shape (n_samples, 1) used for prediction.
- Return type
- property y_test: numpy.ndarray
Prediction values of shape (n_samples,) for x_test.
- Return type
- property x_hat: numpy.ndarray
Filtered independent variables used when calculating the default confidence interval, usually the same as x.
- Return type
- property y_hat: numpy.ndarray
Filtered dependent variables used when calculating the default confidence interval, usually the same as y.
- Return type
- property conf_int: numpy.ndarray
Array of shape (n_samples, 2) containing the lower and upper bounds of the confidence interval.
- Return type
- prepare(gene, lineage, backward=False, time_range=None, data_key='X', time_key='latent_time', use_raw=False, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200)[source]
Prepare the model to be ready for fitting.
- Parameters
gene (str) – Gene in anndata.AnnData.var_names.
lineage (Optional[str]) – Name of a lineage in anndata.AnnData.obsm['{lineage_key}']. If None, all weights will be set to 1.
backward (bool) – Direction of the process.
time_range (Union[float, Tuple[float, float], None]) – Specify start and end times:
data_key (Optional[str]) – Key in anndata.AnnData.layers or 'X' for anndata.AnnData.X. If use_raw = True, it’s always set to 'X'.
time_key (str) – Key in anndata.AnnData.obs where the pseudotime is stored.
use_raw (bool) – Whether to access anndata.AnnData.raw.
threshold (Optional[float]) – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.
weight_threshold (Union[float, Tuple[float, float]]) – If a float, set all weights below weight_threshold to weight_threshold. If a tuple, set all weights below the first value to the second value.
filter_cells (Optional[float]) – Filter out all cells with expression values lower than this threshold.
n_test_points (int) – Number of test points. If None, use the original points based on threshold.
- Return type
- Returns
Nothing, just updates the following fields:
  - x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.
  - y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.
  - w - Filtered weights of shape (n_filtered_cells,) used for fitting.
  - x_all - Unfiltered independent variables of shape (n_cells, 1).
  - y_all - Unfiltered dependent variables of shape (n_cells, 1).
  - w_all - Unfiltered weights of shape (n_cells,).
  - x_test - Independent variables of shape (n_samples, 1) used for prediction.
  - prepared - Whether the model is prepared for fitting.
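Two of the prepare() parameters, weight_threshold and threshold, can be illustrated in isolation. The sketch assumes that, for a tuple, the first value is the cutoff and the second the value assigned, and that the test endpoint is the latest time among cells whose weight exceeds the cutoff (falling back to the latest time overall). Both helpers are hypothetical and not part of CellRank.

```python
import statistics
from typing import List, Optional, Sequence, Tuple, Union


def apply_weight_threshold(
    weights: Sequence[float],
    weight_threshold: Union[float, Tuple[float, float]] = (0.01, 0.01),
) -> List[float]:
    """Replace weights below the cutoff by a floor value."""
    if isinstance(weight_threshold, tuple):
        cutoff, floor = weight_threshold  # tuple: (cutoff, value to set)
    else:
        cutoff = floor = weight_threshold  # float: both roles at once
    return [floor if w < cutoff else w for w in weights]


def estimate_test_endpoint(
    times: Sequence[float],
    weights: Sequence[float],
    threshold: Optional[float] = None,
) -> float:
    """Latest time among cells whose weight exceeds `threshold` (median if None)."""
    cutoff = statistics.median(weights) if threshold is None else threshold
    candidates = [t for t, w in zip(times, weights) if w > cutoff]
    # Assumed fallback when no weight exceeds the cutoff: use the latest time overall.
    return max(candidates) if candidates else max(times)
```

Raising tiny weights to a small floor, rather than zeroing them, keeps every cell in a weighted fit while still letting lineage-relevant cells dominate.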
- abstract fit(x=None, y=None, w=None, **kwargs)[source]
Fit the model.
- Parameters
x (Optional[ndarray]) – Independent variables, array of shape (n_samples, 1). If None, use the x property.
y (Optional[ndarray]) – Dependent variables, array of shape (n_samples, 1). If None, use the y property.
w (Optional[ndarray]) – Optional weights of x, array of shape (n_samples,). If None, use the w property.
kwargs – Keyword arguments for the underlying model’s fitting function.
- Return type
- Returns
Fits the model and returns self.
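Subclasses implement fit() with the contract described above: arguments left as None fall back to the prepared x, y and w, and the method returns self. ToyLinearModel below is a hypothetical, dependency-free illustration using weighted least squares on flat lists (instead of (n_samples, 1) arrays). It does not subclass BaseModel.

```python
from typing import Sequence


class ToyLinearModel:
    """Hypothetical model following the documented fit() contract."""

    def __init__(self) -> None:
        # Stand-ins for the fields that prepare() would populate.
        self.x: Sequence[float] = []
        self.y: Sequence[float] = []
        self.w: Sequence[float] = []
        self.slope = 0.0
        self.intercept = 0.0

    def fit(self, x=None, y=None, w=None, **kwargs) -> "ToyLinearModel":
        # kwargs is accepted for signature compatibility but unused here.
        # Fall back to the prepared (filtered) data when an argument is None.
        x = self.x if x is None else x
        y = self.y if y is None else y
        w = self.w if w is None else w
        # Weighted least squares for y = slope * x + intercept.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        return self  # returning self allows chaining calls on the fitted model
```

A point with weight 0 is ignored entirely, which is how lineage weights below weight_threshold would mostly vanish from the fit if they were not floored first.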