cellrank.estimators.GPCCA¶
- class cellrank.estimators.GPCCA(object, **kwargs)[source]¶
Generalized Perron Cluster Cluster Analysis (GPCCA) [Reuter et al., 2019, Reuter et al., 2018].
See also
See Computing Initial and Terminal States on how to compute the initial and terminal states.
See Estimating Fate Probabilities and Driver Genes on how to compute the fate_probabilities and lineage_drivers.
This is our main and recommended estimator implemented in pyGPCCA. Use it to compute macrostates, automatically and semi-automatically classify these as initial, intermediate and terminal states, compute fate probabilities towards macrostates, uncover driver genes, and much more. To compute and classify macrostates, we run the GPCCA algorithm under the hood, which returns a soft assignment of cells to macrostates, as well as a coarse-grained transition matrix among the set of macrostates [Reuter et al., 2019, Reuter et al., 2018]. This estimator allows you to inject prior knowledge where available to guide the identification of initial, intermediate and terminal states.
- Parameters:
object (str | bool | ndarray | spmatrix | AnnData | KernelExpression) – Can be one of the following types:
  AnnData - annotated data object.
  KernelExpression - kernel expression.
  str - key in obsp where the transition matrix is stored; adata must be provided in this case.
  bool - directionality of the transition matrix that will be used to infer its storage location. If None, the directionality will be determined automatically; adata must be provided in this case.
kwargs (Any) – Keyword arguments for the PrecomputedKernel.
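The workflow described above can be sketched as follows. This is a minimal sketch, not a prescribed pipeline: the helper name `run_gpcca`, the choice of `ConnectivityKernel` and the default `n_states=3` are assumptions made for illustration, and `adata` is assumed to already contain a neighbor graph. The estimator methods called are the ones documented on this page.

```python
def run_gpcca(adata, n_states=3):
    """Hypothetical helper: run a basic GPCCA analysis on `adata`."""
    # Imported lazily so that defining the helper does not require CellRank.
    import cellrank as cr

    # Assumption: a connectivity-based kernel; any kernel expression could be used.
    kernel = cr.kernels.ConnectivityKernel(adata).compute_transition_matrix()

    g = cr.estimators.GPCCA(kernel)
    g.compute_schur(n_components=20)          # real Schur decomposition
    g.compute_macrostates(n_states=n_states)  # soft assignment and coarse_T
    g.predict_terminal_states(method="stability")
    g.compute_fate_probabilities()            # requires terminal states
    g.compute_lineage_drivers()               # requires fate probabilities
    return g
```

The order of the calls matters: macrostates need the Schur decomposition, fate probabilities need terminal states, and lineage drivers need fate probabilities.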
Attributes table¶
absorption_times | Mean and variance of the time until absorption.
adata | Annotated data object.
backward | Direction of the
coarse_T | Coarse-grained transition matrix.
coarse_initial_distribution | Coarse-grained initial distribution.
coarse_stationary_distribution | Coarse-grained stationary distribution.
eigendecomposition | Eigendecomposition of the transition_matrix.
fate_probabilities | Fate probabilities.
initial_states | Categorical annotation of initial states.
initial_states_memberships | Initial states memberships.
initial_states_probabilities | Probability to be an initial state.
kernel | Underlying kernel expression.
lineage_drivers | Potential lineage drivers.
macrostates | Macrostates of the transition matrix.
macrostates_memberships | Macrostate memberships.
params | Estimator parameters.
priming_degree | Priming degree.
schur_matrix | Schur matrix.
schur_vectors | Real Schur vectors of the transition matrix.
shape | Shape of the kernel.
terminal_states | Categorical annotation of terminal states.
terminal_states_memberships | Terminal states memberships.
terminal_states_probabilities | Probability to be a terminal state.
transition_matrix | Transition matrix of the
Methods table¶
|
Compute the mean time to absorption and optionally its variance. |
|
Compute eigendecomposition of the transition_matrix. |
|
Compute fate probabilities. |
|
Compute driver genes per lineage. |
|
Compute the degree of lineage priming. |
|
Compute the macrostates. |
|
Compute the Schur decomposition. |
|
Return a copy of self. |
|
Prepare self for terminal states prediction. |
|
De-serialize self from |
|
Plot the coarse-grained transition matrix. |
|
Plot fate probabilities. |
|
Plot lineage drivers. |
|
Show scatter plot of gene-correlations between two lineages. |
|
Plot histogram of macrostates over categorical annotations. |
|
Plot macrostates on an embedding or along pseudotime. |
|
Plot the Schur matrix. |
|
Plot the top eigenvalues in a real or a complex plane. |
|
Plot terminal state identification (TSI). |
|
Alias for predict_terminal_states(). |
|
Compute initial states from macrostates using coarse_stationary_distribution. |
|
Automatically select terminal states from macrostates. |
|
De-serialize self from a file. |
|
Rename the |
|
Rename the |
|
Set the |
|
Set the |
|
Serialize self to |
|
Compute terminal state identification (TSI) score. |
|
Serialize self to a file using |
Attributes¶
absorption_times¶
- GPCCA.absorption_times¶
Mean and variance of the time until absorption.
Related to conditional mean first passage times. Corresponds to the expectation of the time until absorption, depending on initialization, and the variance.
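To make the expectation concrete, here is a minimal sketch on a toy chain. It assumes the standard identity that the mean time to absorption \(t\) satisfies \(t = \mathbf{1} + Q t\), where \(Q\) holds the transition probabilities among transient states. The function name `mean_absorption_times` and the toy matrix are illustrative assumptions, and the plain fixed-point iteration stands in for the solvers documented below; the variance is not covered.

```python
def mean_absorption_times(Q, tol=1e-12, max_iter=100_000):
    """Solve t = 1 + Q t by fixed-point iteration.

    Q[i][j] is the one-step probability of moving from transient
    state i to transient state j.
    """
    n = len(Q)
    t = [0.0] * n
    for _ in range(max_iter):
        new_t = [1.0 + sum(Q[i][j] * t[j] for j in range(n)) for i in range(n)]
        change = max(abs(a - b) for a, b in zip(new_t, t))
        t = new_t
        if change < tol:
            return t
    raise RuntimeError("fixed-point iteration did not converge")


# Toy chain: symmetric random walk on 0..4, absorbing at 0 and 4.
# The transient states are 1, 2 and 3.
Q = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
times = mean_absorption_times(Q)
print([round(t, 6) for t in times])
```

The result depends on the starting state, which is what "depending on initialization" refers to: the middle state takes longer to be absorbed than the two states next to an absorbing end.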
adata¶
- GPCCA.adata¶
Annotated data object.
backward¶
coarse_T¶
- GPCCA.coarse_T¶
Coarse-grained transition matrix.
coarse_initial_distribution¶
- GPCCA.coarse_initial_distribution¶
Coarse-grained initial distribution.
coarse_stationary_distribution¶
- GPCCA.coarse_stationary_distribution¶
Coarse-grained stationary distribution.
eigendecomposition¶
- GPCCA.eigendecomposition¶
Eigendecomposition of the transition_matrix.
For non-symmetric real matrices, left and right eigenvectors will in general be different and complex. We compute both left and right eigenvectors.
- Returns:
A dictionary with the following keys:
  'D' - the eigenvalues.
  'eigengap' - the eigengap.
  'params' - parameters used for the computation.
  'V_l' - left eigenvectors (optional).
  'V_r' - right eigenvectors (optional).
  'stationary_dist' - stationary distribution of the transition_matrix, if present.
fate_probabilities¶
- GPCCA.fate_probabilities¶
Fate probabilities.
Informally, given a (finite, discrete) Markov chain with a set of transient states \(T\) and a set of absorbing states \(A\), the absorption probability for cell \(i\) from \(T\) to reach cell \(j\) from \(A\) is the probability that a random walk initialized in \(i\) will reach absorbing state \(j\).
In our context, states correspond to cells, in particular, absorbing states correspond to cells in terminal_states.
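The definition above can be illustrated on a toy chain. This is a minimal sketch under stated assumptions: it uses the standard identity \(X = R + Q X\), where \(Q\) holds transitions among transient states and \(R\) holds transitions from transient to absorbing states. The function name `absorption_probabilities` and the toy matrices are hypothetical, and the plain fixed-point iteration is not one of the library's solvers.

```python
def absorption_probabilities(Q, R, tol=1e-12, max_iter=100_000):
    """Solve X = R + Q X by fixed-point iteration.

    Q[i][j]: one-step probability between transient states i and j.
    R[i][a]: one-step probability from transient state i to absorbing state a.
    Returns X with X[i][a], the probability that a walk started in i
    is absorbed in a.
    """
    n, m = len(R), len(R[0])
    X = [[0.0] * m for _ in range(n)]
    for _ in range(max_iter):
        new_X = [
            [R[i][a] + sum(Q[i][j] * X[j][a] for j in range(n)) for a in range(m)]
            for i in range(n)
        ]
        change = max(abs(new_X[i][a] - X[i][a]) for i in range(n) for a in range(m))
        X = new_X
        if change < tol:
            return X
    raise RuntimeError("fixed-point iteration did not converge")


# Toy chain: symmetric random walk on 0..4, absorbing at 0 and 4.
# Transient states are 1, 2, 3; absorbing states are ordered (0, 4).
Q = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
R = [[0.5, 0.0],
     [0.0, 0.0],
     [0.0, 0.5]]
fate = absorption_probabilities(Q, R)
for state, row in zip((1, 2, 3), fate):
    print(state, [round(p, 4) for p in row])
```

Each row sums to one because every walk in this chain is eventually absorbed.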
initial_states¶
- GPCCA.initial_states¶
Categorical annotation of initial states.
By default, all transient cells will be labeled as NaN.
initial_states_memberships¶
- GPCCA.initial_states_memberships¶
Initial states memberships.
Soft assignment of cells to initial states.
initial_states_probabilities¶
- GPCCA.initial_states_probabilities¶
Probability to be an initial state.
kernel¶
- GPCCA.kernel¶
Underlying kernel expression.
lineage_drivers¶
- GPCCA.lineage_drivers¶
Potential lineage drivers.
Computes Pearson correlation of each gene with fate probabilities for every terminal state. High Pearson correlation indicates potential lineage drivers. Also computes p-values and confidence intervals.
- Returns:
Dataframe of shape (n_genes, n_lineages * 5) containing the following columns, one for each lineage:
macrostates¶
- GPCCA.macrostates¶
Macrostates of the transition matrix.
macrostates_memberships¶
- GPCCA.macrostates_memberships¶
Macrostate memberships.
Soft assignment of microstates (cells) to macrostates.
params¶
- GPCCA.params¶
Estimator parameters.
priming_degree¶
- GPCCA.priming_degree¶
Priming degree.
Given a cell \(i\) and a set of terminal states, this quantifies how committed vs. naive cell \(i\) is, i.e. its degree of pluripotency. Low values correspond to naive cells (high degree of pluripotency), high values correspond to committed cells (low degree of pluripotency).
schur_matrix¶
- GPCCA.schur_matrix¶
Schur matrix.
The real Schur decomposition is a generalization of the Eigendecomposition and can be computed for any real-valued, square matrix \(A\). It is given by \(A = Q R Q^T\), where \(Q\) contains the real Schur vectors and \(R\) is the Schur matrix. \(Q\) is orthogonal and \(R\) is quasi-upper triangular with 1x1 and 2x2 blocks on the diagonal.
If PETSc and SLEPc are installed, only the leading Schur vectors are computed.
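As a worked instance of \(A = Q R Q^T\), the sketch below checks the decomposition by hand for a 2x2 row-stochastic matrix. The matrix and the Schur vectors are chosen for illustration and are assumptions, not library output: \(q_1 = (1, 1)/\sqrt{2}\) is the eigenvector for eigenvalue 1, and \(q_2 = (1, -1)/\sqrt{2}\) completes the orthonormal basis. No Krylov or Brandts routine is involved.

```python
import math


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def transpose(X):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


A = [[0.9, 0.1],
     [0.2, 0.8]]

s = 1.0 / math.sqrt(2.0)
# The columns of Q are the Schur vectors q1 and q2.
Q = [[s, s],
     [s, -s]]

# R = Q^T A Q is upper triangular with the eigenvalues on its diagonal.
R = matmul(matmul(transpose(Q), A), Q)
# Multiplying back recovers A.
A_back = matmul(matmul(Q, R), transpose(Q))

print([[round(v, 6) for v in row] for row in R])
```

Here \(R\) has only 1x1 diagonal blocks because both eigenvalues are real; 2x2 blocks appear for complex-conjugate eigenvalue pairs.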
schur_vectors¶
- GPCCA.schur_vectors¶
Real Schur vectors of the transition matrix.
The real Schur decomposition is a generalization of the Eigendecomposition and can be computed for any real-valued, square matrix \(A\). It is given by \(A = Q R Q^T\), where \(Q\) contains the real Schur vectors and \(R\) is the Schur matrix. \(Q\) is orthogonal and \(R\) is quasi-upper triangular with 1x1 and 2x2 blocks on the diagonal.
If PETSc and SLEPc are installed, only the leading Schur vectors are computed.
shape¶
- GPCCA.shape¶
Shape of the kernel.
terminal_states¶
- GPCCA.terminal_states¶
Categorical annotation of terminal states.
By default, all transient cells will be labeled as NaN.
terminal_states_memberships¶
- GPCCA.terminal_states_memberships¶
Terminal states memberships.
Soft assignment of cells to terminal states.
terminal_states_probabilities¶
- GPCCA.terminal_states_probabilities¶
Probability to be a terminal state.
transition_matrix¶
Methods¶
compute_absorption_times¶
- GPCCA.compute_absorption_times(keys=None, calculate_variance=False, solver='gmres', use_petsc=True, n_jobs=None, backend='loky', show_progress_bar=None, tol=1e-06, preconditioner=None)¶
Compute the mean time to absorption and optionally its variance.
- Parameters:
keys (Sequence[str] | None) – Terminal states for which to compute the fate probabilities. If None, use all states defined in terminal_states.
calculate_variance (bool) – Whether to calculate the variance.
solver (Union[str, Literal['direct', 'gmres', 'lgmres', 'bicgstab', 'gcrotmk']]) – Solver to use for the linear problem. Options are 'direct', 'gmres', 'lgmres', 'bicgstab' or 'gcrotmk' when use_petsc = False. Information on the scipy iterative solvers can be found in scipy.sparse.linalg or for the petsc solvers here.
use_petsc (bool) – Whether to use solvers from petsc4py or scipy. Recommended for large problems. If no installation is found, defaults to gmres().
n_jobs (int | None) – Number of parallel jobs to use when using an iterative solver.
backend (Literal['loky', 'multiprocessing', 'threading']) – Which backend to use for multiprocessing. See Parallel for valid options.
show_progress_bar (bool | None) – Whether to show progress bar. Only used when solver != 'direct'.
tol (float) – Convergence tolerance for the iterative solver. The default is fine for most cases; consider decreasing it only for severely ill-conditioned matrices.
preconditioner (str | None) – Preconditioner to use, only available when use_petsc = True. For valid options, see here. We recommend the 'ilu' preconditioner for badly conditioned problems.
check_sum_tol – Tolerance for checking whether fate probabilities sum to 1. Fate probabilities are computed by solving a linear system; this tolerance is used to verify the solution is valid. Increase the argument value if the solver converges but the check fails due to numerical precision.
self (FateProbsProtocol)
- Return type:
- Returns:
Nothing, just updates the following fields:
  absorption_times - Mean and variance of the time until absorption.
compute_eigendecomposition¶
- GPCCA.compute_eigendecomposition(k=20, which='LR', alpha=1.0, only_evals=False, ncv=None)¶
Compute eigendecomposition of the transition_matrix.
Uses a sparse implementation, if possible, and only computes the top \(k\) eigenvectors to speed up the computation. Computes both left and right eigenvectors.
- Parameters:
k (int) – Number of eigenvectors or eigenvalues to compute.
which (Literal['LR', 'LM']) – How to sort the eigenvalues. Valid options are:
  'LR' - the largest real part.
  'LM' - the largest magnitude.
alpha (float) – Used to compute the eigengap. alpha is the weight given to the deviation of an eigenvalue from one.
only_evals (bool) – Whether to compute only eigenvalues.
self (EigenProtocol)
- Return type:
EigenMixin
- Returns:
Self and updates the following fields:
  eigendecomposition - Eigendecomposition of the transition_matrix.
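The role of alpha can be sketched with a small scoring function. The text above says only that alpha weights the deviation of an eigenvalue from one; the exact score used below, gap minus alpha times deviation, is an assumption made for illustration, as is the function name `eigengap_index`. Eigenvalues are assumed to be real and sorted in descending order.

```python
def eigengap_index(evals, alpha=1.0):
    """Return the index i that maximizes
    (evals[i] - evals[i + 1]) - alpha * (1 - evals[i]).

    A large gap after eigenvalue i is rewarded, while an eigenvalue
    far from one is penalized in proportion to alpha.
    """
    scores = [
        (evals[i] - evals[i + 1]) - alpha * (1.0 - evals[i])
        for i in range(len(evals) - 1)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


evals = [1.0, 0.9, 0.5, 0.05]
gap_only = eigengap_index(evals, alpha=0.0)   # largest raw gap wins
penalized = eigengap_index(evals, alpha=1.0)  # prefers gaps close to one
print(gap_only, penalized)
```

With alpha = 0 the largest raw gap (between 0.5 and 0.05) is chosen; with alpha = 1 the choice moves to the gap after 0.9, because 0.5 lies far from one.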
compute_fate_probabilities¶
- GPCCA.compute_fate_probabilities(keys=None, solver='gmres', use_petsc=True, n_jobs=None, backend='loky', show_progress_bar=True, tol=1e-06, preconditioner=None, check_sum_tol=0.001)¶
Compute fate probabilities.
For each cell, this computes the probability of being absorbed in any of the terminal_states. In particular, this corresponds to the probability that a random walk initialized in transient cell \(i\) will reach any cell from a fixed terminal state before reaching a cell from any other terminal state.
- Parameters:
keys (Sequence[str] | None) – Terminal states for which to compute the fate probabilities. If None, use all states defined in terminal_states.
solver (Union[str, Literal['direct', 'gmres', 'lgmres', 'bicgstab', 'gcrotmk']]) – Solver to use for the linear problem. Options are 'direct', 'gmres', 'lgmres', 'bicgstab' or 'gcrotmk' when use_petsc = False. Information on the scipy iterative solvers can be found in scipy.sparse.linalg or for the petsc solvers here.
use_petsc (bool) – Whether to use solvers from petsc4py or scipy. Recommended for large problems. If no installation is found, defaults to gmres().
n_jobs (int | None) – Number of parallel jobs to use when using an iterative solver.
backend (Literal['loky', 'multiprocessing', 'threading']) – Which backend to use for multiprocessing. See Parallel for valid options.
show_progress_bar (bool) – Whether to show progress bar. Only used when solver != 'direct'.
tol (float) – Convergence tolerance for the iterative solver. The default is fine for most cases; consider decreasing it only for severely ill-conditioned matrices.
preconditioner (str | None) – Preconditioner to use, only available when use_petsc = True. For valid options, see here. We recommend the 'ilu' preconditioner for badly conditioned problems.
check_sum_tol (float) – Tolerance for checking whether fate probabilities sum to 1. Fate probabilities are computed by solving a linear system; this tolerance is used to verify the solution is valid. Increase the argument value if the solver converges but the check fails due to numerical precision.
self (FateProbsProtocol)
- Return type:
- Returns:
Nothing, just updates the following fields:
  fate_probabilities - Fate probabilities.
compute_lineage_drivers¶
- GPCCA.compute_lineage_drivers(lineages=None, method='fisher', cluster_key=None, clusters=None, layer=None, use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, nan_policy='propagate', **kwargs)¶
Compute driver genes per lineage.
Correlates gene expression with lineage probabilities, for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters which are relevant for the specified lineages.
- Parameters:
lineages (str | Sequence | None) – Lineage names from fate_probabilities. If None, use all lineages.
method (Literal['fisher', 'perm_test']) – Mode to use when calculating p-values and confidence intervals. Valid options are:
  'fisher' - Fisher transformation [Fisher, 1921].
  'perm_test' - permutation test.
cluster_key (str | None) – Key in obs to obtain cluster annotations. These are considered for clusters.
clusters (str | Sequence | None) – Restrict the correlations to these clusters.
layer (str | None) – Key from layers from which to get the expression. If None or ‘X’, use X.
use_raw (bool) – Whether to use raw to correlate gene expression.
confidence_level (float) – Confidence level for the confidence interval calculation. Must be in interval \([0, 1]\).
n_perms (int) – Number of permutations to use when method = 'perm_test'.
nan_policy (Literal['propagate', 'omit']) – How to handle missing values (nan) in the expression data. Valid options are:
  'propagate' - missing values propagate to the result.
  'omit' - correlate each gene and lineage only over the cells where both are non-missing, akin to scipy.stats.pearsonr(). Only supported for dense expression data and method = 'fisher'.
show_progress_bar – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend – Which backend to use for parallelization. See Parallel for valid options.
kwargs (Any) – Keyword arguments for the correlation test.
self (LinDriversProtocol)
- Return type:
- Returns:
Dataframe of shape (n_genes, n_lineages * 5) containing the following columns, one for each lineage:
Also updates the following field:
  lineage_drivers - the same pandas.DataFrame as described above.
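The 'fisher' option can be illustrated for a single gene and a single lineage. The sketch below is a textbook version of the Fisher transformation, not the library's implementation: the function name `pearson_with_fisher_ci`, the toy vectors and the clipping constant are assumptions, and the confidence level is restricted to the open interval (0, 1).

```python
import math
from statistics import NormalDist


def pearson_with_fisher_ci(x, y, confidence_level=0.95):
    """Pearson correlation with a p-value and a confidence interval
    obtained from the Fisher z-transformation."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two equally long vectors with at least 4 entries")
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must lie strictly between 0 and 1")

    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)

    # Clip so that atanh stays finite for perfectly correlated inputs.
    z = math.atanh(max(min(r, 1.0 - 1e-12), -1.0 + 1e-12))
    se = 1.0 / math.sqrt(n - 3)  # standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    ci_low = math.tanh(z - z_crit * se)
    ci_high = math.tanh(z + z_crit * se)

    # Two-sided p-value for the null hypothesis of zero correlation.
    pval = 2.0 * (1.0 - NormalDist().cdf(abs(z) / se))
    return r, pval, ci_low, ci_high


expression = [0.1, 0.4, 0.35, 0.8, 0.9, 1.2]     # one gene across six cells
fate_prob = [0.05, 0.2, 0.3, 0.6, 0.7, 0.95]     # one lineage across the same cells
r, pval, ci_low, ci_high = pearson_with_fisher_ci(expression, fate_prob)
print(round(r, 3), pval < 0.05, ci_low < r < ci_high)
```

A gene whose expression rises together with the fate probability of a lineage gets a high correlation and is a candidate driver for that lineage.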
compute_lineage_priming¶
- GPCCA.compute_lineage_priming(method='kl_divergence', early_cells=None)¶
Compute the degree of lineage priming.
It returns a score in \([0, 1]\) where \(0\) stands for naive and \(1\) stands for committed.
- Parameters:
method (Literal['kl_divergence', 'entropy']) – The method used to compute the degree of lineage priming. Valid options are:
  'kl_divergence' - as in [Velten et al., 2017], computes KL-divergence between the fate probabilities of a cell and the average fate probabilities. Computation of average fate probabilities can be restricted to a set of user-defined early_cells.
  'entropy' - as in [Setty et al., 2019], computes entropy over a cell’s fate probabilities.
early_cells (Mapping[str, Sequence[str]] | Sequence[str] | None) – Cell IDs or a mask marking early cells. If None, use all cells. Only used when method = 'kl_divergence'. If a dict, the key specifies a cluster key in obs and the values specify cluster labels containing early cells.
self (FateProbsProtocol)
- Return type:
- Returns:
Returns the priming degree and updates the following fields:
  priming_degree - Priming degree.
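The two options can be sketched on a few hand-written fate probability vectors. This is an illustration under stated assumptions: the function names and the toy data are hypothetical, the entropy variant is mapped to a commitment score as \(1 - H(p) / \log n\), and how the library rescales either quantity to \([0, 1]\) is not shown here.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def entropy_commitment(p):
    """Assumed mapping: 1 - H(p) / log(n), so uniform gives 0 and one-hot gives 1."""
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return 1.0 - h / math.log(len(p))


# Rows are cells, columns are fate probabilities towards three terminal states.
fate = [
    [0.34, 0.33, 0.33],  # naive cell
    [0.90, 0.05, 0.05],  # cell committed to the first lineage
    [0.10, 0.80, 0.10],  # cell committed to the second lineage
]

# Average fate probabilities over all cells (early_cells=None in the text above).
n_cells = len(fate)
average = [sum(row[k] for row in fate) / n_cells for k in range(len(fate[0]))]

kl_scores = [kl_divergence(row, average) for row in fate]
entropy_scores = [entropy_commitment(row) for row in fate]
print([round(v, 3) for v in kl_scores])
print([round(v, 3) for v in entropy_scores])
```

In both variants the naive cell scores lowest, in line with the convention that low values mark naive cells and high values mark committed cells.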
compute_macrostates¶
- GPCCA.compute_macrostates(n_states=None, n_cells=30, cluster_key=None, weight_key=None, **kwargs)[source]¶
Compute the macrostates.
- Parameters:
n_states (int | Sequence[int] | None) – Number of macrostates to compute. If a Sequence, use the minChi criterion [Reuter et al., 2018]. If None, use the eigengap heuristic.
n_cells (int | None) – Number of most likely cells from each macrostate to select.
cluster_key (str | Mapping[str, str] | None) – If given, names and colors of the states will be associated with these reference annotations. Either:
  a str (or {"obs": <column>}), a categorical column in obs. Each macrostate is named after the dominant category among its most-likely observations (unchanged behavior).
  {"obsm": <key>}, where adata.obsm[key] is a DataFrame whose columns are the categories and whose rows are per-observation proportions summing to \(1\) (e.g. cell-type or condition fractions of aggregated samples). For each macrostate, the proportion rows of its most-likely observations are summed (see weight_key for the weighting) and the macrostate is named after the category with the largest resulting total.
weight_key – Only used when cluster_key points to obsm. Controls how each observation’s proportion row is weighted when summing proportions to pick a macrostate’s dominant category:
  None (default) - every observation contributes equally, with weight \(1\), irrespective of any per-observation quantity.
  a key in obs - each observation’s proportion row is scaled by adata.obs[weight_key] before summing, so observations with larger weights count proportionally more. The weights can be any per-observation quantity; a common choice is the number of cells per aggregated sample, which makes naming reflect cell-level rather than sample-level dominance.
kwargs (Any) – Keyword arguments for compute_schur().
- Return type:
- Returns:
Returns self and updates the following fields:
  macrostates - Macrostates of the transition matrix.
  macrostates_memberships - Macrostate memberships.
  coarse_T - Coarse-grained transition matrix.
  coarse_initial_distribution - Coarse-grained initial distribution.
  coarse_stationary_distribution - Coarse-grained stationary distribution.
  schur_vectors - Real Schur vectors of the transition matrix.
  schur_matrix - Schur matrix.
  eigendecomposition - Eigendecomposition of the transition_matrix.
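The effect of the weighting described for proportion-based naming can be sketched in a few lines. The function name `dominant_category`, the category names and the numbers are hypothetical; the sketch only mirrors the summation described above and does not touch AnnData.

```python
def dominant_category(proportions, categories, weights=None):
    """Sum per-observation proportion rows, optionally weighted,
    and return the category with the largest total."""
    if weights is None:
        weights = [1.0] * len(proportions)  # every observation counts equally
    totals = [0.0] * len(categories)
    for row, weight in zip(proportions, weights):
        for k, value in enumerate(row):
            totals[k] += weight * value
    best = max(range(len(categories)), key=totals.__getitem__)
    return categories[best], totals


categories = ["alpha", "beta"]
# Proportion rows of the most-likely observations of one macrostate.
proportions = [[0.9, 0.1],
               [0.8, 0.2],
               [0.1, 0.9]]
n_cells_per_sample = [10, 10, 500]  # assumed per-observation weights

unweighted_name, _ = dominant_category(proportions, categories)
weighted_name, _ = dominant_category(proportions, categories, n_cells_per_sample)
print(unweighted_name, weighted_name)
```

Two of three samples are mostly "alpha", so the unweighted name is "alpha"; the third sample holds far more cells, so weighting by cell count changes the name to "beta".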
compute_schur¶
- GPCCA.compute_schur(n_components=20, initial_distribution=None, method='krylov', which='LR', alpha=1.0, verbose=None)¶
Compute the Schur decomposition.
- Parameters:
n_components (int) – Number of Schur vectors to compute.
initial_distribution (ndarray | None) – Input distribution over all cells. If None, uniform distribution is used.
method (Literal['krylov', 'brandts']) – Method for calculating the Schur vectors. Valid options are:
  'krylov' - an iterative procedure that computes a partial, sorted Schur decomposition for large, sparse matrices.
  'brandts' - full sorted Schur decomposition of a dense matrix.
  For benefits of each method, see GPCCA.
which (Literal['LR', 'LM']) – How to sort the eigenvalues. Valid options are:
  'LR' - the largest real part.
  'LM' - the largest magnitude.
alpha (float) – Used to compute the eigengap. alpha is the weight given to the deviation of an eigenvalue from one.
verbose (bool | None) – Whether to print extra information when computing the Schur decomposition. If None, it’s disabled when method = 'krylov'.
self (SchurProtocol)
- Return type:
SchurMixin
- Returns:
Self and updates the following fields:
  schur_vectors - Real Schur vectors of the transition matrix.
  schur_matrix - Schur matrix.
  eigendecomposition - Eigendecomposition of the transition_matrix.
copy¶
fit¶
- GPCCA.fit(n_states=None, n_cells=30, cluster_key=None, weight_key=None, **kwargs)[source]¶
Prepare self for terminal states prediction.
Deprecated since version 2.1: Will be removed in CellRank 3.0. Use compute_schur() and compute_macrostates() directly.
- Parameters:
n_states (int | Sequence[int] | None) – Number of macrostates to compute. If a Sequence, use the minChi criterion [Reuter et al., 2018]. If None, use the eigengap heuristic.
n_cells (int | None) – Number of most likely cells from each macrostate to select.
cluster_key (str | Mapping[str, str] | None) – If given, names and colors of the states will be associated with these reference annotations. Either:
  a str (or {"obs": <column>}), a categorical column in obs. Each macrostate is named after the dominant category among its most-likely observations (unchanged behavior).
  {"obsm": <key>}, where adata.obsm[key] is a DataFrame whose columns are the categories and whose rows are per-observation proportions summing to \(1\) (e.g. cell-type or condition fractions of aggregated samples). For each macrostate, the proportion rows of its most-likely observations are summed (see weight_key for the weighting) and the macrostate is named after the category with the largest resulting total.
weight_key – Only used when cluster_key points to obsm. Controls how each observation’s proportion row is weighted when summing proportions to pick a macrostate’s dominant category:
  None (default) - every observation contributes equally, with weight \(1\), irrespective of any per-observation quantity.
  a key in obs - each observation’s proportion row is scaled by adata.obs[weight_key] before summing, so observations with larger weights count proportionally more. The weights can be any per-observation quantity; a common choice is the number of cells per aggregated sample, which makes naming reflect cell-level rather than sample-level dominance.
kwargs (Any) – Keyword arguments for compute_schur().
- Return type:
- Returns:
Returns self and updates the following fields:
from_adata¶
plot_coarse_T¶
- GPCCA.plot_coarse_T(show_stationary_dist=True, show_initial_dist=False, order='stability', cmap='viridis', xtick_rotation=45, annotate=True, show_cbar=True, title=None, figsize=(8, 8), dpi=80, save=None, text_kwargs=mappingproxy({}), **kwargs)[source]¶
Plot the coarse-grained transition matrix.
- Parameters:
show_stationary_dist (bool) – Whether to show the coarse_stationary_distribution, if present.
show_initial_dist (bool) – Whether to show the coarse_initial_distribution.
order (Optional[Literal['stability', 'incoming', 'outgoing', 'stat_dist']]) – How to order the coarse-grained transition matrix. Valid options are:
  'stability' - order by the values on the diagonal.
  'incoming' - order by the incoming mass, excluding the diagonal.
  'outgoing' - order by the outgoing mass, excluding the diagonal.
  'stat_dist' - order by coarse stationary distribution. If not present, use 'stability'.
cmap (str | ListedColormap) – Colormap to use.
xtick_rotation (float) – Rotation of ticks on the x-axis.
annotate (bool) – Whether to display the text on each cell.
show_cbar (bool) – Whether to show the colorbar.
dpi (int) – Dots per inch.
text_kwargs (Mapping[str, Any]) – Keyword arguments for text().
- Return type:
- Returns:
Nothing, just plots the figure. Optionally saves it based on save.
plot_fate_probabilities¶
- GPCCA.plot_fate_probabilities(states=None, color=None, mode='embedding', time_key=None, basis='umap', same_plot=True, title=None, cmap='viridis', **kwargs)¶
Plot fate probabilities.
- Parameters:
states (str | Sequence[str] | None) – Subset of the macrostates to show. If None, plot all macrostates.
color (str | None) – Key in obs or anndata.AnnData.var used to color the observations.
mode (Literal['embedding', 'time']) – Whether to plot the probabilities in an embedding or along the pseudotime.
time_key (str | None) – Key in obs where pseudotime is stored. Only used when mode = 'time'.
basis (str) – Key in obsm for the embedding to use, e.g. 'umap' or 'tsne'.
same_plot (bool) – Whether to plot the data on the same plot or not. Only used when mode = 'embedding'. If True and discrete = False, color is ignored.
cmap (str) – Colormap for continuous annotations.
kwargs (Any) – Keyword arguments for embedding().
self (FateProbsProtocol)
- Return type:
- Returns:
Nothing, just plots the figure. Optionally saves it based on save.
plot_lineage_drivers¶
- GPCCA.plot_lineage_drivers(lineage, n_genes=8, use_raw=False, ascending=False, ncols=None, title_fmt='{gene} qval={qval:.4e}', figsize=None, dpi=None, save=None, **kwargs)¶
Plot lineage drivers.
- Parameters:
lineage (str) – Lineage for which to plot the driver genes.
n_genes (int) – Top most correlated genes to plot.
ascending (bool) – Whether to sort the genes in ascending order.
title_fmt (str) – Title format. Can include {gene}, {pval}, {qval} or {corr}, which will be substituted with the actual values.
kwargs (Any) – Keyword arguments for embedding().
self (LinDriversProtocol)
- Return type:
- Returns:
Nothing, just plots the figure. Optionally saves it based on save.
plot_lineage_drivers_correlation¶
- GPCCA.plot_lineage_drivers_correlation(lineage_x, lineage_y, color=None, gene_sets=None, gene_sets_colors=None, use_raw=False, cmap='RdYlBu_r', fontsize=12, adjust_text=False, legend_loc='best', figsize=(4, 4), dpi=None, save=None, show=True, **kwargs)¶
Show scatter plot of gene-correlations between two lineages.
Optionally, a dict of gene names can be passed to highlight in the plot.
- Parameters:
lineage_x (str) – Name of the lineage on the x-axis.
lineage_y (str) – Name of the lineage on the y-axis.
color (str | None) – Key in var or varm, preferring the former.
gene_sets (dict[str, Sequence[str]] | None) – Gene sets annotations of the form {'gene_set_name': ['gene_1', 'gene_2'], ...}.
gene_sets_colors (Sequence[str] | None) – List of colors where each entry corresponds to a gene set from gene_sets. If None and keys in gene_sets correspond to lineage names, use the lineage colors. Otherwise, use default colors.
cmap (str) – Colormap to use.
fontsize (int) – Size of the text when plotting gene_sets.
adjust_text (bool) – Whether to automatically adjust text in order to reduce overlap.
legend_loc (str | None) – Position of the legend. If None, don’t show the legend. Only used when gene_sets != None.
self (LinDriversProtocol)
- Return type:
- Returns:
If show = True, nothing, just plots, otherwise returns the axes object. Optionally saves it based on save.
Notes
This plot is based on the following notebook by Maren Büttner.
plot_macrostate_composition¶
- GPCCA.plot_macrostate_composition(key, weight_key=None, width=0.8, title=None, labelrot=45, legend_loc='upper right out', figsize=None, dpi=None, save=None, show=True)[source]¶
Plot histogram of macrostates over categorical annotations.
The annotation can either be a hard categorical assignment, with one label per observation, or a soft assignment, with a distribution over categories per observation (e.g. cell-type fractions of aggregated samples).
- Parameters:
adata – Annotated data object.
key (str | Mapping[str, str]) – Source of the categorical annotation. Either:
  a str, interpreted as a categorical column in obs. Each macrostate’s bar stacks the counts of the categories among its most-likely observations.
  a dict of the form {"obs": <column>} (equivalent to passing a str) or {"obsm": <key>}. In the latter case, adata.obsm[key] must be a DataFrame whose columns are the categories and whose rows are per-observation proportions summing to \(1\) (e.g. cell-type fractions of aggregated samples). The proportion rows are summed per macrostate, so the bar semantics are identical to the categorical case – each observation contributes \(1\), split across categories – only the observations are now samples rather than cells.
weight_key (str | None) – Only used when key points to obsm. Key from obs with per-observation weights, e.g. the number of cells per aggregated sample, so the bars reflect cell-level rather than sample-level frequencies. If None, each observation contributes equally.
width (float) – Bar width in \([0, 1]\).
title (str | None) – Title of the figure. If None, create one automatically.
labelrot (float) – Rotation of labels on x-axis.
legend_loc (str | None) – Position of the legend. If None, don’t show the legend.
- Return type:
- Returns:
If show = True, nothing, just plots, otherwise returns the axes object. Optionally saves it based on save.
plot_macrostates¶
- GPCCA.plot_macrostates(which, states=None, color=None, discrete=True, mode='embedding', time_key='latent_time', basis='umap', same_plot=True, title=None, cmap='viridis', **kwargs)¶
Plot macrostates on an embedding or along pseudotime.
- Parameters:
which (Literal['all', 'initial', 'terminal', 'initial_and_terminal']) – Which macrostates to plot. Valid options are:
  'all' - plot all macrostates.
  'initial' - plot macrostates marked as initial_states.
  'terminal' - plot macrostates marked as terminal_states.
  'initial_and_terminal' - plot both initial_states and terminal_states in one plot. States are renamed to 'initial: <name>' and 'terminal: <name>' to tell them apart.
states (str | Sequence[str] | None) – Subset of the macrostates to show. If None, plot all macrostates.
color (str | None) – Key in obs or var used to color the observations.
discrete (bool) – Whether to plot the data as continuous or discrete observations. If the data cannot be plotted as continuous observations, it will be plotted as discrete.
mode (Literal['embedding', 'time']) – Whether to plot the probabilities in an embedding or along the pseudotime.
time_key (str) – Key in obs where pseudotime is stored. Only used when mode = 'time'.
basis (str) – Key in obsm for the embedding to use, e.g. 'umap' or 'tsne'.
same_plot (bool) – Whether to plot the data on the same plot or not. Only used when mode = 'embedding'. If True and discrete = False, color is ignored.
cmap (str) – Colormap for continuous annotations.
kwargs (Any) – Keyword arguments for embedding().
- Return type:
- Returns:
Nothing, just plots the figure. Optionally saves it based on save.
plot_schur_matrix¶
- GPCCA.plot_schur_matrix(title='schur matrix', cmap='viridis', figsize=None, dpi=80, save=None, **kwargs)¶
Plot the Schur matrix.
- Parameters:
- Return type:
- Returns:
Nothing, just plots the figure. Optionally saves it based on save.
plot_spectrum¶
- GPCCA.plot_spectrum(n=None, real_only=None, show_eigengap=True, show_all_xticks=True, legend_loc=None, title=None, marker='.', figsize=(5, 5), dpi=100, save=None, **kwargs)¶
Plot the top eigenvalues in a real or a complex plane.
- Parameters:
n (int | None) – Number of eigenvalues to show. If None, show all that have been computed.
real_only (bool | None) – Whether to plot only the real part of the spectrum. If None, plot real spectrum if no complex eigenvalues are present.
show_eigengap (bool) – When real_only = True, this determines whether to show the inferred eigengap as a dotted line.
show_all_xticks (bool) – When real_only = True, this determines whether to show the indices of all eigenvalues on the x-axis.
legend_loc (str | None) – Location parameter for the legend.
marker (str) – Marker symbol used, valid options can be found in markers.
dpi (int) – Dots per inch.
- Return type:
- Returns:
Nothing, just plots the figure. Optionally saves it based on save.
plot_tsi¶
- GPCCA.plot_tsi(n_macrostates=None, x_offset=(0.2, 0.2), y_offset=(0.1, 0.1), figsize=(6, 4), dpi=None, save=None, **kwargs)[source]¶
Plot terminal state identification (TSI).
Requires computing TSI with tsi() first.
- Parameters:
n_macrostates (int | None) – Maximum number of macrostates to consider. Defaults to using all.
kwargs (Any) – Keyword arguments for lineplot().
- Return type:
- Returns:
Plot TSI of the kernel and an optimal identification strategy.
predict¶
- GPCCA.predict(*args, **kwargs)[source]¶
Alias for predict_terminal_states().
Deprecated since version 2.1: Will be removed in CellRank 3.0. Use predict_terminal_states() directly.
- Parameters:
- Return type:
- Returns:
Same as predict_terminal_states().
predict_initial_states¶
- GPCCA.predict_initial_states(n_states=1, n_cells=30, allow_overlap=False)[source]¶
Compute initial states from macrostates using coarse_stationary_distribution.
- Parameters:
- Return type:
- Returns:
Returns self and updates the following fields:
  - initial_states - Categorical annotation of initial states.
  - initial_states_probabilities - Probability to be an initial state.
  - initial_states_memberships - Initial states memberships.
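The text above does not spell out how the coarse-grained stationary distribution is used. A minimal sketch, assuming the rule is to pick the macrostates with the smallest stationary mass (states the process leaves and rarely returns to), could look as follows. The function `pick_initial_states` and the example values are hypothetical and not taken from CellRank.

```python
def pick_initial_states(coarse_stationary, n_states=1):
    """Return the n_states macrostates with the smallest stationary mass.

    Assumption: low stationary mass is used as the criterion for an
    initial state; this is an illustrative rule only.
    """
    if not 1 <= n_states <= len(coarse_stationary):
        raise ValueError("n_states must be between 1 and the number of macrostates")
    ranked = sorted(coarse_stationary, key=coarse_stationary.get)
    return ranked[:n_states]


# Made-up coarse-grained stationary distribution over three macrostates.
stationary = {"Progenitor": 0.02, "Alpha": 0.38, "Beta": 0.60}
initial = pick_initial_states(stationary)
```

In this toy example `initial` contains only `"Progenitor"`, the macrostate with the least stationary mass.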
predict_terminal_states¶
- GPCCA.predict_terminal_states(method='stability', n_cells=30, alpha=1, stability_threshold=0.96, n_states=None, allow_overlap=False)[source]¶
Automatically select terminal states from macrostates.
- Parameters:
method (Literal['stability','top_n','eigengap','eigengap_coarse']) – How to select the terminal states. Valid options are:
  - 'eigengap' - select the number of states based on the eigengap of transition_matrix.
  - 'eigengap_coarse' - select the number of states based on the eigengap of the diagonal of coarse_T.
  - 'top_n' - select the top n_states based on the probability of the diagonal of coarse_T.
  - 'stability' - select states which have a stability >= stability_threshold. The stability is given by the diagonal elements of coarse_T.
n_cells (int) – Number of most likely cells from each macrostate to select.
alpha (float|None) – Weight given to the deviation of an eigenvalue from one. Only used when method = 'eigengap' or method = 'eigengap_coarse'.
stability_threshold (float) – Threshold used when method = 'stability'.
n_states (int|None) – Number of states used when method = 'top_n'.
allow_overlap (bool) – Whether to allow overlapping names between initial and terminal states.
- Return type:
- Returns:
Returns self and updates the following fields:
  - terminal_states - Categorical annotation of terminal states.
  - terminal_states_probabilities - Probability to be a terminal state.
  - terminal_states_memberships - Terminal states memberships.
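The 'stability' and 'top_n' rules both read the diagonal of the coarse-grained transition matrix. The sketch below illustrates these two selection rules on a plain nested list. The function `select_terminal_states`, the matrix, and the macrostate names are made up for illustration, and this is not CellRank's implementation.

```python
def select_terminal_states(coarse_T, names, method="stability",
                           stability_threshold=0.96, n_states=None):
    """Pick macrostates from the diagonal of a coarse-grained transition matrix."""
    # Stability of a macrostate is its self-transition probability.
    stability = {name: coarse_T[i][i] for i, name in enumerate(names)}
    if method == "stability":
        return [name for name in names if stability[name] >= stability_threshold]
    if method == "top_n":
        if n_states is None:
            raise ValueError("n_states is required when method='top_n'")
        ranked = sorted(names, key=lambda name: stability[name], reverse=True)
        return ranked[:n_states]
    raise ValueError(f"unsupported method: {method!r}")


# Made-up 3 x 3 coarse-grained transition matrix (rows sum to one).
coarse_T = [
    [0.98, 0.01, 0.01],
    [0.10, 0.80, 0.10],
    [0.02, 0.01, 0.97],
]
names = ["Alpha", "Ductal", "Beta"]

stable = select_terminal_states(coarse_T, names)
top_one = select_terminal_states(coarse_T, names, method="top_n", n_states=1)
```

With the default threshold of 0.96, `stable` keeps "Alpha" and "Beta" and drops "Ductal", whose self-transition probability is only 0.80.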
read¶
- static GPCCA.read(fname, adata=None, copy=False)¶
De-serialize self from a file.
- Parameters:
- Return type:
IOMixin
- Returns:
The de-serialized object.
rename_initial_states¶
- GPCCA.rename_initial_states(old_new)¶
Rename the initial_states.
- Parameters:
old_new (dict[str,str]) – Dictionary that maps old names to unique new names.
- Return type:
- Returns:
Returns self and updates the following fields:
  - initial_states - Categorical annotation of initial states.
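The requirement that new names be unique can be pictured with a small sketch operating on a plain list of category names. The helper `rename_states` is hypothetical and only mirrors the contract described above (old names must exist, resulting names must not collide); it does not reproduce CellRank's internals.

```python
def rename_states(categories, old_new):
    """Return `categories` with names replaced according to `old_new`."""
    unknown = [old for old in old_new if old not in categories]
    if unknown:
        raise KeyError(f"unknown state names: {unknown}")
    renamed = [old_new.get(name, name) for name in categories]
    # Reject mappings that would merge two states under one name.
    if len(set(renamed)) != len(renamed):
        raise ValueError("new names must be unique")
    return renamed


states = ["Alpha", "Beta", "Epsilon"]
renamed = rename_states(states, {"Alpha": "Alpha cells"})
```

A mapping such as `{"Alpha": "Beta"}` would raise `ValueError` here, because two states would end up sharing a name.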
rename_terminal_states¶
- GPCCA.rename_terminal_states(old_new)¶
Rename the terminal_states.
- Parameters:
old_new (dict[str,str]) – Dictionary that maps old names to unique new names.
- Return type:
- Returns:
Returns self and updates the following fields:
  - terminal_states - Categorical annotation of terminal states.
set_initial_states¶
- GPCCA.set_initial_states(states=None, n_cells=30, allow_overlap=False, cluster_key=None, weight_key=None, agg='top_n', **kwargs)[source]¶
Set the initial_states.
- Parameters:
states (str|Sequence[str]|dict[str,Sequence[str]]|Series|None) – Which states to select. Valid options are:
  - str, Sequence - subset of macrostates. Multiple states can be combined using ',', such as ['Alpha, Beta', 'Epsilon'].
  - dict - keys correspond to initial states and values to cell IDs in obs_names.
  - Series - categorical series where each category corresponds to a macrostate. NaN values mark cells that should not be marked as initial_states.
  - None - select all macrostates.
n_cells (int) – Number of most likely cells from each macrostate to select.
allow_overlap (bool) – Whether to allow overlapping names between initial and terminal states.
cluster_key (str|Mapping[str,str]|None) – Reference annotations to associate names and colors with initial_states; see compute_macrostates(). Only used when states is a dict or Series.
weight_key (str|None) – Per-observation weights, only used when cluster_key points to obsm; see compute_macrostates().
agg (Literal['top_n','union']) – How to select the cells representing each state when names are combined, e.g., ['Alpha, Beta', 'Epsilon']. Only relevant when aggregating multiple macrostates into one state. Valid options are:
  - 'top_n' - select the n_cells most confident cells of the combined membership. A dominant macrostate can crowd out the others, so the cells may not be representative of the combined state.
  - 'union' - select the n_cells most confident cells of each macrostate and take their union, so every constituent macrostate is represented.
kwargs (Any) – Additional keyword arguments.
- Return type:
- Returns:
Returns self and updates the following fields:
  - initial_states - Categorical annotation of initial states.
  - initial_states_probabilities - Probability to be an initial state.
  - initial_states_memberships - Initial states memberships.
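The difference between the two `agg` options is easiest to see on a toy example. The sketch below assumes that the "combined membership" is the per-cell sum of the constituent macrostates' memberships. That assumption, the function `select_cells`, and the membership values are illustrative and not taken from CellRank.

```python
def top_cells(scores, n_cells):
    """Return the IDs of the n_cells highest-scoring cells."""
    return sorted(scores, key=scores.get, reverse=True)[:n_cells]


def select_cells(memberships, n_cells, agg="top_n"):
    """Select representative cells for a state built from several macrostates.

    memberships maps macrostate name -> {cell ID: membership}.
    """
    if agg == "top_n":
        # Assumption: combined membership is the per-cell sum.
        combined = {}
        for per_cell in memberships.values():
            for cell, value in per_cell.items():
                combined[cell] = combined.get(cell, 0.0) + value
        return set(top_cells(combined, n_cells))
    if agg == "union":
        selected = set()
        for per_cell in memberships.values():
            selected |= set(top_cells(per_cell, n_cells))
        return selected
    raise ValueError(f"unsupported agg: {agg!r}")


# "Alpha" has much more confident cells than "Beta".
memberships = {
    "Alpha": {"c1": 0.9, "c2": 0.8, "c3": 0.1, "c4": 0.05},
    "Beta": {"c1": 0.0, "c2": 0.1, "c3": 0.4, "c4": 0.3},
}
by_top_n = select_cells(memberships, n_cells=2, agg="top_n")
by_union = select_cells(memberships, n_cells=2, agg="union")
```

Here `by_top_n` contains only the two "Alpha"-dominated cells, while `by_union` also includes the two most confident "Beta" cells, which is the crowding-out effect described for 'top_n'.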
set_terminal_states¶
- GPCCA.set_terminal_states(states=None, n_cells=30, allow_overlap=False, cluster_key=None, weight_key=None, agg='top_n', **kwargs)[source]¶
Set the terminal_states.
- Parameters:
states (str|Sequence[str]|dict[str,Sequence[str]]|Series|None) – Which states to select. Valid options are:
  - str, Sequence - subset of macrostates. Multiple states can be combined using ',', such as ['Alpha, Beta', 'Epsilon'].
  - dict - keys correspond to terminal states and values to cell IDs in obs_names.
  - Series - categorical series where each category corresponds to a macrostate. NaN values mark cells that should not be marked as terminal_states.
  - None - select all macrostates.
n_cells (int) – Number of most likely cells from each macrostate to select.
allow_overlap (bool) – Whether to allow overlapping names between initial and terminal states.
cluster_key (str|Mapping[str,str]|None) – Reference annotations to associate names and colors with terminal_states; see compute_macrostates(). Only used when states is a dict or Series.
weight_key (str|None) – Per-observation weights, only used when cluster_key points to obsm; see compute_macrostates().
agg (Literal['top_n','union']) – How to select the cells representing each state when names are combined, e.g., ['Alpha, Beta', 'Epsilon']. Only relevant when aggregating multiple macrostates into one state. Valid options are:
  - 'top_n' - select the n_cells most confident cells of the combined membership. A dominant macrostate can crowd out the others, so the cells may not be representative of the combined state.
  - 'union' - select the n_cells most confident cells of each macrostate and take their union, so every constituent macrostate is represented.
kwargs (Any) – Additional keyword arguments.
- Return type:
- Returns:
Returns self and updates the following fields:
  - terminal_states - Categorical annotation of terminal states.
  - terminal_states_probabilities - Probability to be a terminal state.
  - terminal_states_memberships - Terminal states memberships.
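The `str` and `Sequence` forms of `states` allow several macrostates to be combined into one state with a comma. As a rough illustration of how such a specification could be split into groups of macrostate names, consider the sketch below. The helper `parse_states` is hypothetical and is not CellRank's parser.

```python
def parse_states(states):
    """Split a state specification into groups of macrostate names.

    Each entry yields one group; names inside an entry are separated by ','.
    """
    if isinstance(states, str):
        states = [states]
    return [[name.strip() for name in entry.split(",")] for entry in states]


groups = parse_states(["Alpha, Beta", "Epsilon"])
single = parse_states("Delta")
```

The first call yields two groups, one combining "Alpha" and "Beta" and one containing only "Epsilon"; a bare string is treated as a single entry.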
to_adata¶
- GPCCA.to_adata(keep=('X', 'raw'), *, copy=True)¶
Serialize self to AnnData.
- Parameters:
keep (Union[Literal['all'],Sequence[Literal['X','raw','layers','obs','var','obsm','varm','obsp','varp','uns']]]) – Which attributes to keep from the underlying adata. Valid options are:
  - 'all' - keep all attributes specified in the signature.
  - Sequence - keep only a subset of these attributes.
  - dict - the keys correspond to the attribute names and the values to a subset of keys to keep from this attribute. If the values are specified either as True or 'all', everything from this attribute will be kept.
copy (bool|Sequence[Literal['X','raw','layers','obs','var','obsm','varm','obsp','varp','uns']]) – Whether to copy the data. Can be specified on a per-attribute basis. Useful for attributes that are array-like.
- Return type:
- Returns:
Annotated data object.
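The three accepted forms of `keep` can be thought of as shorthand for one mapping from attribute name to the keys retained. The sketch below normalizes each form into such a mapping. The function `normalize_keep` and the key names in the example are hypothetical, and the real method may resolve `keep` differently.

```python
ATTRS = ("X", "raw", "layers", "obs", "var", "obsm", "varm", "obsp", "varp", "uns")


def normalize_keep(keep):
    """Map a `keep` specification to {attribute: 'all' or list of keys}."""
    if keep == "all":
        return {attr: "all" for attr in ATTRS}
    if isinstance(keep, dict):
        items = keep.items()
    else:
        # A plain sequence keeps every listed attribute in full.
        items = ((attr, "all") for attr in keep)
    resolved = {}
    for attr, keys in items:
        if attr not in ATTRS:
            raise ValueError(f"unknown attribute: {attr!r}")
        resolved[attr] = "all" if keys is True or keys == "all" else list(keys)
    return resolved


default = normalize_keep(("X", "raw"))
partial = normalize_keep({"obs": ["clusters"], "obsm": True})
```

The default tuple keeps `X` and `raw` in full, while the dictionary form keeps only the (made-up) `clusters` column of `obs` and everything in `obsm`.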