# cellrank.estimators.GPCCA

class cellrank.estimators.GPCCA(object, **kwargs)

Generalized Perron Cluster Cluster Analysis as implemented in pyGPCCA.

Coarse-grains a discrete Markov chain into a set of macrostates and computes coarse-grained transition probabilities among the macrostates. Each macrostate corresponds to an area of the state space, i.e. to a subset of cells. The assignment is soft, i.e. each cell is assigned to every macrostate with a certain weight, where weights sum to one per cell. Macrostates are computed by maximizing the ‘crispness’, which can be thought of as a measure for minimal overlap between macrostates in a certain inner-product sense. Once the macrostates have been computed, we project the large transition matrix onto a coarse-grained transition matrix among the macrostates via a Galerkin projection. This projection is based on invariant subspaces of the original transition matrix, which are obtained using the real Schur decomposition.
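As a minimal sketch of the Galerkin projection, assuming the formulation used in pyGPCCA: with a soft membership matrix $$\chi$$ (rows summing to one), an initial distribution $$\eta$$ and $$D = \mathrm{diag}(\eta)$$, the coarse-grained transition matrix is $$(\chi^T D \chi)^{-1} \chi^T D T \chi$$ and the coarse-grained initial distribution is $$\chi^T \eta$$. The helper `coarse_grain` and the toy matrices are hypothetical and not part of cellrank's API; in GPCCA the memberships come from the Schur vectors rather than being given by hand.

```python
import numpy as np


def coarse_grain(T: np.ndarray, chi: np.ndarray, eta: np.ndarray) -> np.ndarray:
    """Galerkin projection of a row-stochastic T onto the span of memberships chi."""
    D = np.diag(eta)                 # weight cells by the initial distribution
    W = chi.T @ D @ chi              # (m, m) overlap matrix of the memberships
    A = chi.T @ D @ T @ chi          # (m, m) projected transition operator
    return np.linalg.solve(W, A)     # W^{-1} A without forming the inverse


# Hypothetical toy chain: two metastable blocks of two cells each.
T = np.array([
    [0.80, 0.15, 0.05, 0.00],
    [0.10, 0.85, 0.00, 0.05],
    [0.05, 0.00, 0.80, 0.15],
    [0.00, 0.05, 0.10, 0.85],
])
# Hypothetical soft memberships: every row sums to one.
chi = np.array([
    [0.95, 0.05],
    [0.90, 0.10],
    [0.10, 0.90],
    [0.05, 0.95],
])
eta = np.full(4, 0.25)               # uniform initial distribution

coarse_T = coarse_grain(T, chi, eta)
coarse_init = chi.T @ eta            # coarse-grained initial distribution
print(coarse_T.round(3))
print(coarse_init)
```

Because the rows of $$\chi$$ and $$T$$ sum to one, the rows of the coarse-grained matrix sum to one as well, so it can again be read as a transition matrix between macrostates.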

Parameters:

## Attributes table

| Attribute | Description |
|---|---|
| absorption_probabilities | Absorption probabilities. |
| absorption_times | Mean and variance of the time until absorption. |
| adata | Annotated data object. |
| backward | Direction of kernel. |
| coarse_T | Coarse-grained transition matrix. |
| coarse_initial_distribution | Coarse-grained initial distribution. |
| coarse_stationary_distribution | Coarse-grained stationary distribution. |
| eigendecomposition | Eigendecomposition of transition_matrix. |
| kernel | Underlying kernel expression. |
| lineage_drivers | Potential lineage drivers. |
| macrostates | Macrostates of the transition matrix. |
| macrostates_memberships | Macrostate membership matrix. |
| params | Estimator parameters. |
| priming_degree | Priming degree. |
| schur_matrix | Schur matrix. |
| schur_vectors | Real Schur vectors of the transition matrix. |
| shape | Shape of the kernel. |
| terminal_states | Categorical annotation of terminal states. |
| terminal_states_memberships | Terminal state membership matrix. |
| terminal_states_probabilities | Aggregated probability of cells to be in terminal states. |
| transition_matrix | Transition matrix of kernel. |

## Methods table

| Method | Description |
|---|---|
| compute_absorption_probabilities([keys, ...]) | Compute absorption probabilities. |
| compute_eigendecomposition([k, which, ...]) | Compute eigendecomposition of transition_matrix. |
| compute_lineage_drivers([lineages, method, ...]) | Compute driver genes per lineage. |
| compute_lineage_priming([method, early_cells]) | Compute the degree of lineage priming. |
| compute_macrostates([n_states, n_cells, ...]) | Compute the macrostates. |
| compute_schur([n_components, ...]) | Compute Schur decomposition. |
| compute_terminal_states(*args, **kwargs) | Compute terminal states of the process. |
| copy(*[, deep]) | Return a copy of self. |
| fit([n_states, n_cells, cluster_key]) | Prepare self for terminal states prediction. |
| from_adata(adata, obsp_key) | De-serialize self from anndata.AnnData. |
| plot_absorption_probabilities([states, ...]) | Plot continuous or categorical observations in an embedding or along pseudotime. |
| plot_coarse_T([show_stationary_dist, ...]) | Plot the coarse-grained transition matrix between macrostates. |
| plot_lineage_drivers(lineage[, n_genes, ...]) | Plot lineage drivers discovered by compute_lineage_drivers(). |
| plot_lineage_drivers_correlation(lineage_x, ...) | Show a scatter plot of gene correlations between two lineages. |
| plot_macrostate_composition(key[, width, ...]) | Plot stacked histogram of macrostates over categorical annotations. |
| plot_macrostates([states, color, discrete, ...]) | Plot continuous or categorical observations in an embedding or along pseudotime. |
| plot_schur_matrix([title, cmap, figsize, ...]) | Plot the Schur matrix. |
| plot_spectrum([n, real_only, show_eigengap, ...]) | Plot the top eigenvalues in the real or complex plane. |
| plot_terminal_states([states, color, ...]) | Plot continuous or categorical observations in an embedding or along pseudotime. |
| predict([method, n_cells, alpha, ...]) | Automatically select terminal states from macrostates. |
| read(fname[, adata, copy]) | De-serialize self from a file. |
| rename_terminal_states(new_names) | Rename categories in terminal_states. |
| set_terminal_states(labels[, cluster_key, ...]) | Manually define terminal states. |
| set_terminal_states_from_macrostates([names, n_cells]) | Manually select terminal states from macrostates. |
| to_adata([keep, copy]) | Serialize self to anndata.AnnData. |
| write(fname[, write_adata, ext]) | Serialize self to a file. |

## Attributes

### absorption_probabilities

GPCCA.absorption_probabilities

Absorption probabilities.

Informally, given a (finite, discrete) Markov chain with a set of transient states $$T$$ and a set of absorbing states $$A$$, the absorption probability for cell $$i$$ from $$T$$ to reach cell $$j$$ from $$A$$ is the probability that a random walk initialized in $$i$$ will reach absorbing state $$j$$.

In our context, states correspond to cells; in particular, absorbing states correspond to cells in terminal states.
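To make the definition concrete, here is a small NumPy sketch (not cellrank's implementation) that orders the states as transient first and absorbing last and solves $$(I - Q) B = R$$, where $$Q$$ holds transitions among transient states and $$R$$ holds transitions from transient to absorbing states. Row $$i$$ of $$B$$ gives the absorption probabilities of transient state $$i$$; the chain itself is hypothetical.

```python
import numpy as np

# Hypothetical 5-state chain: states 0-2 are transient, 3 and 4 are absorbing.
P = np.array([
    [0.50, 0.30, 0.10, 0.10, 0.00],
    [0.20, 0.50, 0.20, 0.10, 0.00],
    [0.00, 0.20, 0.50, 0.00, 0.30],
    [0.00, 0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 1.00],
])
transient, absorbing = [0, 1, 2], [3, 4]

Q = P[np.ix_(transient, transient)]   # transient -> transient block
R = P[np.ix_(transient, absorbing)]   # transient -> absorbing block

# Absorption probabilities B = (I - Q)^{-1} R, computed via a linear solve.
B = np.linalg.solve(np.eye(len(transient)) - Q, R)
print(B)  # one row per transient state, one column per absorbing state
```

Each row of `B` sums to one, since a walk started in a transient state is eventually absorbed somewhere.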

Return type:

Optional[Lineage]

### absorption_times

GPCCA.absorption_times

Mean and variance of the time until absorption.

Related to conditional mean first passage times. Corresponds to the expectation of the time until absorption, depending on initialization, and the variance.
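Both moments can be sketched with the fundamental matrix $$N = (I - Q)^{-1}$$ of an absorbing chain: the expected number of steps until absorption is $$t = N\mathbf{1}$$ and its variance is $$(2N - I)t - t \circ t$$. This is the textbook formula, shown for illustration only; cellrank's own implementation may differ, and the helper `absorption_time_moments` is hypothetical.

```python
from typing import Tuple

import numpy as np


def absorption_time_moments(Q: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Mean and variance of the number of steps until absorption.

    Q is the transient-to-transient block of a row-stochastic matrix.
    """
    n = Q.shape[0]
    N = np.linalg.inv(np.eye(n) - Q)       # fundamental matrix
    t = N @ np.ones(n)                     # expected steps from each transient state
    var = (2 * N - np.eye(n)) @ t - t * t  # variance of the step count
    return t, var


# One transient state that stays put with probability 0.5: the step count is
# geometric with success probability 0.5, so its mean and variance are both 2.
mean, var = absorption_time_moments(np.array([[0.5]]))
print(mean, var)
```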

Return type:

### adata

GPCCA.adata

Annotated data object.

Return type:

AnnData

### backward

GPCCA.backward

Direction of kernel.

Return type:

bool

### coarse_T

GPCCA.coarse_T

Coarse-grained transition matrix.

Return type:

### coarse_initial_distribution

GPCCA.coarse_initial_distribution

Coarse-grained initial distribution.

Return type:

### coarse_stationary_distribution

GPCCA.coarse_stationary_distribution

Coarse-grained stationary distribution.

Return type:

### eigendecomposition

GPCCA.eigendecomposition

Eigendecomposition of transition_matrix.

For non-symmetric real matrices, left and right eigenvectors will in general be different and complex. We compute both left and right eigenvectors.

Return type:
Returns:

A dictionary with the following keys:

• 'D' - the eigenvalues.

• 'eigengap' - the eigengap.

• 'params' - parameters used for the computation.

• 'V_l' - left eigenvectors (optional).

• 'V_r' - right eigenvectors (optional).

• 'stationary_dist' - stationary distribution of transition_matrix, if present.
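For a small dense matrix, left and right eigenvectors and the stationary distribution can be sketched with NumPy alone. The stationary distribution is the left eigenvector for eigenvalue 1, normalized to sum to one. The toy matrix is hypothetical, and this dense route is only an illustration of the quantities stored under the keys above.

```python
import numpy as np

# Hypothetical 3-state birth-death chain (rows sum to one).
T = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])

# Right eigenvectors solve T v = lambda v; left eigenvectors solve
# u^T T = lambda u^T, i.e. they are right eigenvectors of T transposed.
evals_r, V_r = np.linalg.eig(T)
evals_l, V_l = np.linalg.eig(T.T)

# Sort both sets by decreasing real part ('LR' ordering).
order_r = np.argsort(-evals_r.real)
order_l = np.argsort(-evals_l.real)
D, V_r = evals_r[order_r], V_r[:, order_r]
V_l = V_l[:, order_l]

# The left eigenvector for eigenvalue 1, normalized to sum to one,
# is the stationary distribution.
u = np.real(V_l[:, 0])
stationary_dist = u / u.sum()
print(D.real, stationary_dist)
```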

### kernel

GPCCA.kernel

Underlying kernel expression.

Return type:

TypeVar(KernelExpression, bound=KernelMixin)

### lineage_drivers

GPCCA.lineage_drivers

Potential lineage drivers.

Computes Pearson correlation of each gene with fate probabilities for every terminal state. High Pearson correlation indicates potential lineage drivers. Also computes p-values and confidence intervals.

Return type:
Returns:

DataFrame of shape (n_genes, n_lineages * 5) containing the following five columns for each lineage:

• {lineage}_corr - correlation between the gene expression and absorption probabilities.

• {lineage}_pval - calculated p-values for a two-sided test.

• {lineage}_qval - p-values corrected using the Benjamini-Hochberg method at level 0.05.

• {lineage}_ci_low - lower bound of the confidence_level correlation confidence interval.

• {lineage}_ci_high - upper bound of the confidence_level correlation confidence interval.
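A minimal sketch of how these five columns could be computed for a single lineage, assuming Fisher's z-transformation for the p-values and confidence intervals and a hand-written Benjamini-Hochberg correction. The function `lineage_driver_stats` and the simulated data are hypothetical; cellrank's exact test statistics may differ. Requires NumPy and SciPy.

```python
import numpy as np
from scipy.stats import norm


def lineage_driver_stats(X, fate, confidence_level=0.95):
    """Correlate each gene (column of X) with one lineage's fate probabilities."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    fc = fate - fate.mean()
    corr = (Xc.T @ fc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(fc))

    # Fisher z-transformation: arctanh(r) is approximately normal with sd 1/sqrt(n - 3).
    z = np.arctanh(np.clip(corr, -1 + 1e-12, 1 - 1e-12))
    se = 1.0 / np.sqrt(n - 3)
    pval = 2 * norm.sf(np.abs(z) / se)  # two-sided test

    # Confidence interval, transformed back to the correlation scale.
    z_crit = norm.ppf(0.5 + confidence_level / 2)
    ci_low, ci_high = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

    # Benjamini-Hochberg adjusted p-values (q-values).
    m = len(pval)
    order = np.argsort(pval)
    ranked = pval[order] * m / np.arange(1, m + 1)
    qval = np.empty(m)
    qval[order] = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
    return corr, pval, qval, ci_low, ci_high


rng = np.random.default_rng(0)
fate = rng.uniform(size=200)                          # hypothetical fate probabilities
X = np.column_stack([fate + rng.normal(0, 0.1, 200),  # a simulated "driver" gene
                     rng.normal(size=200)])           # an unrelated gene
corr, pval, qval, ci_low, ci_high = lineage_driver_stats(X, fate)
print(corr.round(2), qval)
```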

### macrostates

GPCCA.macrostates

Macrostates of the transition matrix.

Return type:

### macrostates_memberships

GPCCA.macrostates_memberships

Macrostate membership matrix.

Soft assignment of microstates (cells) to macrostates.

Return type:

Optional[Lineage]

### params

GPCCA.params

Estimator parameters.

Return type:

### priming_degree

GPCCA.priming_degree

Priming degree.

Given a cell $$i$$ and a set of terminal states, this quantifies how committed vs. naive cell $$i$$ is, i.e. its degree of pluripotency. Low values correspond to naive cells (high degree of pluripotency), high values correspond to committed cells (low degree of pluripotency).

Return type:

### schur_matrix

GPCCA.schur_matrix

Schur matrix.

The real Schur decomposition is a generalization of the eigendecomposition and can be computed for any real-valued, square matrix $$A$$. It is given by $$A = Q R Q^T$$, where $$Q$$ contains the real Schur vectors and $$R$$ is the Schur matrix. $$Q$$ is orthogonal and $$R$$ is quasi-upper triangular with 1×1 and 2×2 blocks on the diagonal. If PETSc and SLEPc are installed, only the leading Schur vectors are computed.
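The decomposition can be reproduced on a small dense matrix with SciPy's `scipy.linalg.schur` using `output='real'`. This only checks the identity $$A = Q R Q^T$$ and the block structure of $$R$$; it is not the sorted, partial Schur decomposition that pyGPCCA computes for large matrices, and the matrix is hypothetical.

```python
import numpy as np
from scipy.linalg import schur

# Hypothetical row-stochastic matrix with a complex-conjugate eigenvalue pair
# (a cycle 0 -> 1 -> 2 -> 0 mixed with self-loops).
A = np.array([
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.7, 0.1, 0.2],
])

R, Q = schur(A, output="real")  # A = Q R Q^T

# Q is orthogonal and reproduces A.
assert np.allclose(Q @ Q.T, np.eye(3))
assert np.allclose(Q @ R @ Q.T, A)

# R is quasi-upper triangular: nothing below the first subdiagonal, and a
# nonzero subdiagonal entry marks a 2x2 block for a complex eigenvalue pair.
print(np.round(R, 3))
```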

Return type:

### schur_vectors

GPCCA.schur_vectors

Real Schur vectors of the transition matrix.

The real Schur decomposition is a generalization of the eigendecomposition and can be computed for any real-valued, square matrix $$A$$. It is given by $$A = Q R Q^T$$, where $$Q$$ contains the real Schur vectors and $$R$$ is the Schur matrix. $$Q$$ is orthogonal and $$R$$ is quasi-upper triangular with 1×1 and 2×2 blocks on the diagonal. If PETSc and SLEPc are installed, only the leading Schur vectors are computed.

Return type:

### shape

GPCCA.shape

Shape of the kernel.

Return type:

### terminal_states

GPCCA.terminal_states

Categorical annotation of terminal states.

By default, all transient cells are labeled as NaN.

Return type:

### terminal_states_memberships

GPCCA.terminal_states_memberships

Terminal state membership matrix.

Soft assignment of cells to terminal states.

Return type:

Optional[Lineage]

### terminal_states_probabilities

GPCCA.terminal_states_probabilities

Aggregated probability of cells to be in terminal states.

Return type:

### transition_matrix

GPCCA.transition_matrix

Transition matrix of kernel.

Return type:

## Methods

### compute_absorption_probabilities

GPCCA.compute_absorption_probabilities(keys=None, solver='gmres', use_petsc=True, time_to_absorption=None, n_jobs=None, backend='loky', show_progress_bar=True, tol=1e-06, preconditioner=None)

Compute absorption probabilities.

For each cell, this computes the probability of being absorbed in any of the terminal_states. In particular, this corresponds to the probability that a random walk initialized in transient cell $$i$$ will reach any cell from a fixed terminal state before reaching a cell from any other terminal state.
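Since the default `solver='gmres'` points to an iterative approach, here is a sketch using SciPy's `scipy.sparse.linalg.gmres` on a hypothetical six-cell chain. By linearity, summing the transient-to-absorbing columns of all cells in one terminal state gives a single right-hand side per terminal state. Preconditioning, PETSc and parallelization, which the signature above exposes, are not reproduced here, and the labelling of absorbing cells into states "A" and "B" is an assumption for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import gmres

# Hypothetical chain of 6 cells: cells 4 and 5 are absorbing and belong to
# two different terminal states, "A" and "B".
P = csr_matrix(np.array([
    [0.6, 0.2, 0.1, 0.0, 0.1, 0.0],
    [0.2, 0.5, 0.2, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.5, 0.2, 0.0, 0.1],
    [0.0, 0.0, 0.3, 0.5, 0.0, 0.2],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]))
transient = np.array([0, 1, 2, 3])
terminal_states = {"A": [4], "B": [5]}

Q = P[transient][:, transient]
I_minus_Q = (identity(len(transient), format="csr") - Q).tocsr()

probs = {}
for name, cells in terminal_states.items():
    # Summed transient -> terminal-state transitions form the right-hand side.
    rhs = np.asarray(P[transient][:, cells].sum(axis=1)).ravel()
    x, info = gmres(I_minus_Q, rhs)
    if info != 0:
        raise RuntimeError(f"GMRES did not converge for state {name!r}")
    probs[name] = x

print(probs)
```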

Parameters:
Return type:

None

Returns:

Nothing, just updates the following fields:

### compute_eigendecomposition

GPCCA.compute_eigendecomposition(k=20, which='LR', alpha=1.0, only_evals=False, ncv=None)

Compute eigendecomposition of transition_matrix.

Uses a sparse implementation, if possible, and only computes the top $$k$$ eigenvectors to speed up the computation. Computes both left and right eigenvectors.
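As a sketch of the sparse route, SciPy's `scipy.sparse.linalg.eigs` can return only the top `k` eigenpairs by largest real part (`which='LR'`), and left eigenvectors follow from running it on the transpose. Whether cellrank calls exactly this routine is an assumption here; the random matrix is hypothetical.

```python
import numpy as np
from scipy.sparse import diags, random as sparse_random
from scipy.sparse.linalg import eigs

n, k = 200, 5

# Hypothetical sparse transition matrix: random nonnegative entries plus
# self-loops (so no row is empty), then each row normalized to sum to one.
A = sparse_random(n, n, density=0.05, format="csr", random_state=0) + diags(np.ones(n))
row_sums = np.asarray(A.sum(axis=1)).ravel()
T = (diags(1.0 / row_sums) @ A).tocsr()

# Only the top-k eigenpairs by largest real part ('LR') are computed.
evals_r, V_r = eigs(T, k=k, which="LR")    # right eigenvectors: T v = lambda v
evals_l, V_l = eigs(T.T, k=k, which="LR")  # left eigenvectors: u^T T = lambda u^T

order = np.argsort(-evals_r.real)
D, V_r = evals_r[order], V_r[:, order]
print(D.round(3))
```

Because `T` is row-stochastic, the leading eigenvalue is 1 and all others have real part at most 1.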

Parameters:
Return type:

None

Returns:

Nothing, just updates the following field:

### compute_lineage_drivers

GPCCA.compute_lineage_drivers(lineages=None, method=TestMethod.FISCHER, cluster_key=None, clusters=None, layer=None, use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, **kwargs)

Compute driver genes per lineage.

Correlates gene expression with lineage probabilities for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters that are relevant for the specified lineages.

Parameters:
Return type:

DataFrame

Returns:

DataFrame of shape (n_genes, n_lineages * 5) containing the following five columns for each lineage:

• {lineage}_corr - correlation between the gene expression and absorption probabilities.

• {lineage}_pval - calculated p-values for a two-sided test.

• {lineage}_qval - p-values corrected using the Benjamini-Hochberg method at level 0.05.

• {lineage}_ci_low - lower bound of the confidence_level correlation confidence interval.

• {lineage}_ci_high - upper bound of the confidence_level correlation confidence interval.

### compute_lineage_priming

GPCCA.compute_lineage_priming(method='kl_divergence', early_cells=None)

Compute the degree of lineage priming.

It returns a score in [0, 1] where 0 stands for naive and 1 stands for committed.
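A sketch of the 'kl_divergence' idea under stated assumptions: each cell's fate distribution is compared with a reference distribution (the mean over the given `early_cells`, or over all cells if none are given), and the divergences are rescaled to [0, 1] by dividing by their maximum. The rescaling, the epsilon and the helper `priming_degree_kl` are assumptions for illustration; cellrank's exact normalization may differ.

```python
import numpy as np


def priming_degree_kl(fate_probs, early_cells=None, eps=1e-12):
    """KL divergence of each cell's fate distribution from a reference, scaled to [0, 1]."""
    P = np.clip(fate_probs, eps, None)
    P = P / P.sum(axis=1, keepdims=True)
    ref = P[early_cells] if early_cells is not None else P
    q = ref.mean(axis=0)                     # reference fate distribution
    kl = np.sum(P * np.log(P / q), axis=1)   # KL(p_i || q) for every cell
    return kl / kl.max() if kl.max() > 0 else kl


# Hypothetical fate probabilities over three lineages.
fate_probs = np.array([
    [0.34, 0.33, 0.33],   # naive: close to uniform
    [0.60, 0.20, 0.20],
    [0.98, 0.01, 0.01],   # committed to the first lineage
])
scores = priming_degree_kl(fate_probs, early_cells=[0])
print(scores.round(3))
```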

Parameters:
Return type:

Series

Returns:

The priming degree.

### compute_macrostates

GPCCA.compute_macrostates(n_states=None, n_cells=30, cluster_key=None, **kwargs)

Compute the macrostates.

Parameters:
Return type:

None

Returns:

Nothing, just updates the following fields:

### compute_schur

GPCCA.compute_schur(n_components=20, initial_distribution=None, method='krylov', which='LR', alpha=1.0)

Compute Schur decomposition.

Parameters:
• n_components (int) – Number of Schur vectors to compute.

• initial_distribution (Optional[ndarray]) – Input distribution over all cells. If None, uniform distribution is used.

• method (Literal['krylov', 'brandts']) –

Method for calculating the Schur vectors. Valid options are:

• 'krylov' - an iterative procedure that computes a partial, sorted Schur decomposition for large, sparse matrices.

• 'brandts' - full sorted Schur decomposition of a dense matrix.

For the benefits of each method, see pygpcca.GPCCA.

• which (Literal['LR', 'LM']) –

How to sort the eigenvalues. Valid options are:

• 'LR' - the largest real part.

• 'LM' - the largest magnitude.

• alpha (float) – Used to compute the eigengap. alpha is the weight given to the deviation of an eigenvalue from one.
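One way to read the `which` and `alpha` options, sketched in NumPy under assumptions: eigenvalues are sorted by largest real part ('LR') or largest magnitude ('LM'), and the eigengap is taken at the argmax of $$(\lambda_i - \lambda_{i+1}) - \alpha (1 - \lambda_i)$$, which penalizes gaps that follow eigenvalues far from one. This scoring function is an assumption consistent with the description of `alpha` above, not a copy of cellrank's code; the helpers are hypothetical.

```python
import numpy as np


def sort_eigenvalues(evals, which="LR"):
    """Sort eigenvalues by largest real part ('LR') or largest magnitude ('LM')."""
    key = evals.real if which == "LR" else np.abs(evals)
    return evals[np.argsort(-key)]


def eigengap(evals, alpha=1.0):
    """Number of eigenvalues before the (penalized) largest gap."""
    lam = np.real(evals)
    gap = lam[:-1] - lam[1:]        # gap after each eigenvalue
    deviation = 1.0 - lam[:-1]      # distance of each eigenvalue from one
    return int(np.argmax(gap - alpha * deviation)) + 1


evals = sort_eigenvalues(np.array([0.4, 0.98, 1.0, 0.5, 0.99]))
print(evals, eigengap(evals, alpha=1.0))
```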

Returns:

Nothing, just updates the following fields:

### compute_terminal_states

GPCCA.compute_terminal_states(*args, **kwargs)

Compute terminal states of the process.

This is an alias for predict().

Parameters:
Return type:

None

Returns:

Nothing, just updates the following fields:

### copy

GPCCA.copy(*, deep=False)

Return a copy of self.

Parameters:

deep (bool) – Whether to return a deep copy or not. If True, this also copies the adata.

Return type:

BaseEstimator

Returns:

A copy of self.

### fit

GPCCA.fit(n_states=None, n_cells=30, cluster_key=None, **kwargs)

Prepare self for terminal states prediction.

Parameters:
Return type:

GPCCA

Returns:

Nothing, just updates the following fields:

### from_adata

GPCCA.from_adata(adata, obsp_key)

De-serialize self from anndata.AnnData.

Parameters:
Return type:

BaseEstimator

Returns:

The de-serialized object.

### plot_absorption_probabilities

GPCCA.plot_absorption_probabilities(states=None, color=None, discrete=True, mode=PlotMode.EMBEDDING, time_key='latent_time', same_plot=True, title=None, cmap='viridis', **kwargs)

Plot continuous or categorical observations in an embedding or along pseudotime.

Parameters:
Return type:

None

Returns:

Nothing, just plots the figure. Optionally saves it based on save.

### plot_coarse_T

GPCCA.plot_coarse_T(show_stationary_dist=True, show_initial_dist=False, order='stability', cmap='viridis', xtick_rotation=45, annotate=True, show_cbar=True, title=None, figsize=(8, 8), dpi=80, save=None, text_kwargs=mappingproxy({}), **kwargs)

Plot the coarse-grained transition matrix between macrostates.

Parameters:
Return type:

None

Returns:

Nothing, just plots the figure. Optionally saves it based on save.

### plot_lineage_drivers

GPCCA.plot_lineage_drivers(lineage, n_genes=8, use_raw=False, ascending=False, ncols=None, title_fmt='{gene} qval={qval:.4e}', figsize=None, dpi=None, save=None, **kwargs)

Plot lineage drivers discovered by compute_lineage_drivers().

Parameters:
Return type:

None

Returns:

Nothing, just plots the figure. Optionally saves it based on save.

### plot_lineage_drivers_correlation

GPCCA.plot_lineage_drivers_correlation(lineage_x, lineage_y, color=None, gene_sets=None, gene_sets_colors=None, use_raw=False, cmap='RdYlBu_r', fontsize=12, adjust_text=False, legend_loc='best', figsize=(4, 4), dpi=None, save=None, show=True, **kwargs)

Show scatter plot of gene-correlations between two lineages.

Optionally, you can pass a dict of gene names that will be annotated in the plot.

Parameters:
Return type:
Returns:

If show = False, the axes object; otherwise nothing, just plots the figure. Optionally saves it based on save.

Notes

This plot is based on the following notebook by Maren Büttner.

### plot_macrostate_composition

GPCCA.plot_macrostate_composition(key, width=0.8, title=None, labelrot=45, legend_loc='upper right out', figsize=None, dpi=None, save=None, show=True)

Plot stacked histogram of macrostates over categorical annotations.
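The table behind such a plot can be sketched with `pandas.crosstab`, assuming the plot shows, for each macrostate, the fraction of its assigned cells falling into each category of the annotation `key`. The labels are hypothetical, and cellrank may compute the composition differently (for example, from membership weights).

```python
import numpy as np
import pandas as pd

# Hypothetical per-cell labels: assigned macrostate (NaN = unassigned) and a
# categorical annotation such as cell type.
macrostates = pd.Series(["Beta", "Beta", "Alpha", np.nan, "Alpha", "Beta"])
cell_type = pd.Series(["beta", "delta", "alpha", "beta", "alpha", "beta"])

# Rows: macrostates; columns: annotation categories; each row sums to one.
# Unassigned cells (NaN) are dropped by crosstab.
composition = pd.crosstab(macrostates, cell_type, normalize="index")
print(composition)

# A stacked bar chart of this table is one way to draw it (requires matplotlib):
# composition.plot(kind="bar", stacked=True)
```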

Parameters:
Return type:
Returns:

If show = False, the axes object; otherwise nothing, just plots the figure. Optionally saves it based on save.

### plot_macrostates

GPCCA.plot_macrostates(states=None, color=None, discrete=True, mode=PlotMode.EMBEDDING, time_key='latent_time', same_plot=True, title=None, cmap='viridis', **kwargs)

Plot continuous or categorical observations in an embedding or along pseudotime.

Parameters:
Return type:

None

Returns:

Nothing, just plots the figure. Optionally saves it based on save.

### plot_schur_matrix

GPCCA.plot_schur_matrix(title='schur matrix', cmap='viridis', figsize=None, dpi=80, save=None, **kwargs)

Plot the Schur matrix.

Parameters:
Returns:

Nothing, just plots the figure. Optionally saves it based on save.

### plot_spectrum

GPCCA.plot_spectrum(n=None, real_only=None, show_eigengap=True, show_all_xticks=True, legend_loc=None, title=None, marker='.', figsize=(5, 5), dpi=100, save=None, **kwargs)

Plot the top eigenvalues in real or complex plane.

Parameters:
Return type:

None

Returns:

Nothing, just plots the figure. Optionally saves it based on save.

### plot_terminal_states

GPCCA.plot_terminal_states(states=None, color=None, discrete=True, mode=PlotMode.EMBEDDING, time_key='latent_time', same_plot=True, title=None, cmap='viridis', **kwargs)

Plot continuous or categorical observations in an embedding or along pseudotime.

Parameters:
Return type:

None

Returns:

Nothing, just plots the figure. Optionally saves it based on save.

### predict

GPCCA.predict(method=TermStatesMethod.STABILITY, n_cells=30, alpha=1, stability_threshold=0.96, n_states=None)

Automatically select terminal states from macrostates.

Parameters:
• method (Literal['stability', 'top_n', 'eigengap', 'eigengap_coarse']) –

How to select the terminal states. Valid options are:

• 'eigengap' - select the number of states based on the eigengap of transition_matrix.

• 'eigengap_coarse' - select the number of states based on the eigengap of the diagonal of coarse_T.

• 'top_n' - select the top n_states based on the probability of the diagonal of coarse_T.

• 'stability' - select states which have a stability >= stability_threshold. The stability is given by the diagonal elements of coarse_T.

• n_cells (int) – Number of most likely cells from each macrostate to select.

• alpha (Optional[float]) – Weight given to the deviation of an eigenvalue from one. Only used when method = 'eigengap' or method = 'eigengap_coarse'.

• stability_threshold (float) – Threshold used when method = 'stability'.

• n_states (Optional[int]) – Number of states used when method = 'top_n'.
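The 'stability' and 'top_n' rules can be sketched directly on the diagonal of a coarse-grained transition matrix, followed by keeping the `n_cells` cells with the highest membership in each selected macrostate. The arrays are hypothetical, and tie-breaking details are assumptions.

```python
import numpy as np

# Hypothetical coarse_T over three macrostates and soft memberships for 6 cells.
coarse_T = np.array([
    [0.98, 0.01, 0.01],
    [0.20, 0.50, 0.30],
    [0.01, 0.02, 0.97],
])
memberships = np.array([
    [0.90, 0.05, 0.05],
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
    [0.02, 0.08, 0.90],
    [0.30, 0.40, 0.30],
])
stability = np.diag(coarse_T)

# method='stability': keep macrostates whose self-transition probability is high enough.
stable = np.flatnonzero(stability >= 0.96)

# method='top_n': keep the n_states macrostates with the largest diagonal entries.
n_states = 1
top_n = np.argsort(-stability)[:n_states]

# n_cells: the most likely cells of each selected macrostate.
n_cells = 2
cells = {int(s): np.argsort(-memberships[:, s])[:n_cells] for s in stable}
print(stable, top_n, cells)
```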

Return type:

None

Returns:

Nothing, just updates the following fields:

### read

De-serialize self from a file.

Parameters:
Return type:

IOMixin

Returns:

The de-serialized object.

### rename_terminal_states

GPCCA.rename_terminal_states(new_names)

Rename categories in terminal_states.

Parameters:

new_names (Mapping[str, str]) – Mapping where keys correspond to the old names and values to the new names. The new names must be unique.
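The effect on a categorical annotation can be sketched with `pandas.Series.cat.rename_categories`, including the uniqueness requirement on the new names. The helper `rename_states` is hypothetical, and rejecting unknown old names with a `KeyError` is an assumption; on the estimator you would call `rename_terminal_states` instead.

```python
import numpy as np
import pandas as pd


def rename_states(states: pd.Series, new_names: dict) -> pd.Series:
    """Rename categories of a categorical Series, keeping unmapped names as-is."""
    old = list(states.cat.categories)
    unknown = set(new_names) - set(old)
    if unknown:
        raise KeyError(f"Unknown terminal states: {sorted(unknown)}")
    renamed = [new_names.get(name, name) for name in old]
    if len(set(renamed)) != len(renamed):
        raise ValueError("New names must be unique.")
    return states.cat.rename_categories(renamed)


states = pd.Series(["0", "1", np.nan, "0"], dtype="category")  # NaN = transient cell
renamed = rename_states(states, {"0": "Alpha", "1": "Beta"})
print(renamed)
```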

Return type:

None

Returns:

Nothing, just updates the names of:

### set_terminal_states

Manually define terminal states.

Parameters:
• labels – Defines the terminal states. Valid options are:

• categorical pandas.Series where each category corresponds to a terminal state. NaN entries denote cells that do not belong to any terminal state, i.e. these are either initial or transient cells.

• dict where keys are terminal states and values are lists of cell barcodes corresponding to annotations in anndata.AnnData.obs_names. If only 1 key is provided, values should correspond to terminal state clusters if a categorical pandas.Series can be found in anndata.AnnData.obs.

• cluster_key (Optional[str]) – Key in anndata.AnnData.obs in order to associate names and colors with terminal_states. Each terminal state will be given the name and color corresponding to the cluster it mostly overlaps with.

• add_to_existing (bool) – Whether the new terminal states should be added to the existing ones. Cells already assigned to a terminal state will be re-assigned to the new terminal state if there’s a conflict between old and new annotations. This throws an error if no previous annotations corresponding to terminal states have been found.
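Both accepted forms of `labels` can be prepared with pandas alone. This sketch converts the dict form (terminal state to cell barcodes) into the categorical form, with NaN for every cell outside a terminal state. The barcodes are hypothetical, and cellrank's own handling of the dict form may differ.

```python
import pandas as pd

obs_names = pd.Index([f"cell_{i}" for i in range(6)])  # hypothetical barcodes
labels = {"Alpha": ["cell_0", "cell_1"], "Beta": ["cell_5"]}

# Invert the dict form: barcode -> terminal state name.
state_of = {barcode: state for state, barcodes in labels.items() for barcode in barcodes}

# Categorical form: one category per terminal state, NaN for all other cells.
series = pd.Series(
    pd.Categorical([state_of.get(name) for name in obs_names], categories=list(labels)),
    index=obs_names,
)
print(series)
```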

Return type:

None

Returns:

Nothing, just updates the following fields:

### set_terminal_states_from_macrostates

GPCCA.set_terminal_states_from_macrostates(names=None, n_cells=30, **kwargs)

Manually select terminal states from macrostates.

Parameters:
Return type:

None

Returns:

Nothing, just updates the following fields:

### to_adata

Serialize self to anndata.AnnData.

Parameters:
Return type:

AnnData

Returns:

adata : anndata.AnnData Annotated data object.

### write

Serialize self to a file.

Return type:

None

Returns:

Nothing, just writes itself to a file using pickle.