Note
Click here to download the full example code
Compute terminal states using GPCCA¶
This example shows how to compute and plot the terminal states using the cellrank.tl.estimators.GPCCA
.
This estimator makes use of Generalized Perron Cluster Cluster Analysis [GPCCA18] [Reuter19] as implemented in pyGPCCA.
import cellrank as cr
adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata
Out:
AnnData object with n_obs × n_vars = 2531 × 2000
obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
varm: 'PCs', 'loss'
layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
obsp: 'connectivities', 'distances'
First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA
estimator.
k = cr.tl.transition_matrix(
adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)
Next, we need to compute the Schur vectors and the macrostates. We refer the reader to Compute macrostates where the method is explained more in detail.
g.compute_schur(n_components=4)
g.compute_macrostates(cluster_key="clusters")
For cellrank.tl.estimators.GPCCA
, there are 3 methods for choosing the terminal states:
We will cover each of these methods below. In the last 2 cases, parameter n_cells
controls how many cells to take
from each terminal state we take as a categorical annotation.
Set terminal states¶
cellrank.tl.estimators.GPCCA.set_terminal_states()
simply sets the terminal states manually - this
can be useful when the terminal states are known beforehand. In this case, we don’t need to compute the macrostates.
The states can be specified either as a categorical pandas.Series
where NaN values mark cells
not belonging to any terminal state or a dict
, where keys correspond to the names of the terminal states,
and the values to the sequence of cell names or their indices.
Below we set the terminal state called “Alpha” as all the cells from the “Alpha”
cluster under adata.obs["clusters"]
.
g.set_terminal_states({"Alpha": adata[adata.obs["clusters"] == "Alpha"].obs_names})
Set terminal states from macrostates¶
cellrank.tl.estimators.GPCCA.set_terminal_states_from_macrostates()
sets the terminal states by subsetting
the macrostates. Note that multiple states can also be combined into new, joint states, as shown below,
where we combine “Alpha” and “Beta” states into a new one.
g.set_terminal_states_from_macrostates(["Alpha, Beta", "Epsilon"])
Compute terminal states¶
Lastly, cellrank.tl.estimators.GPCCA.compute_terminal_states()
which also makes use of the coarse-grained
transition matrix cellrank.tl.estimators.GPCCA.coarse_T
of the macrostates or the eigengap
statistic.
In the example below, we use method='eigenap'
which selects the number of states based on the eigengap. The
terminal states are defined as the top most likely states from the diagonal of the coarse-grained transition matrix.
To find out more, see Compute coarse-grained transition matrix.
g.compute_terminal_states(method="eigengap")
Now that the terminal states have been either set or computed, we can visualize them in an embedding.
All of the options seen in Compute macrostates also apply here,
like plotting in the same plot (parameter same_plot
) or plotting the discrete values (parameter
discrete
).
g.plot_terminal_states(same_plot=False)

We note that membership degree of macrostates/terminal states should not be confused with the probability of traveling/developing towards these states. For that, we compute the absorption probabilities, see Compute absorption probabilities. The assignment of cells to macrostates is a soft assignment that specifies the degree of membership of any particular cell to a given state.
Total running time of the script: ( 0 minutes 22.359 seconds)
Estimated memory usage: 711 MB