Compute terminal states using GPCCA

This example shows how to compute and plot the terminal states using the cellrank.tl.estimators.GPCCA.

This estimator makes use of Generalized Perron Cluster Cluster Analysis [GPCCA18] [Reuter19].

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

Next, we need to compute the Schur vectors and the macrostates. We refer the reader to Compute macrostates where the method is explained more in detail.

g.compute_schur(n_components=4)
g.compute_macrostates(cluster_key="clusters")

Out:

INFO: Using pre-computed schur decomposition

For cellrank.tl.estimators.GPCCA, there are 3 methods for choosing the terminal states:

We will cover each of these methods below. In the last 2 cases, parameter n_cells controls how many cells to take from each terminal state we take as a categorical annotation.

Set terminal states

cellrank.tl.estimators.GPCCA.set_terminal_states() simply sets the terminal states manually - this can be useful when the terminal states are known beforehand. In this case, we don’t need to compute the macrostates.

The states can be specified either as a categorical pandas.Series where NaN values mark cells not belonging to any terminal state or a dict, where keys correspond to the names of the terminal states, and the values to the sequence of cell names or their indices.

Below we set the terminal state called “Alpha” as all the cells from the “Alpha” cluster under adata.obs["clusters"].

g.set_terminal_states({"Alpha": adata[adata.obs["clusters"] == "Alpha"].obs_names})

Set terminal states from macrostates

cellrank.tl.estimators.GPCCA.set_terminal_states_from_macrostates() sets the terminal states by subsetting the macrostates. Note that multiple states can also be combined into new, joint states, as shown below, where we combine “Alpha” and “Beta” states into a new one.

g.set_terminal_states_from_macrostates(["Alpha, Beta", "Epsilon"])

Compute terminal states

Lastly, cellrank.tl.estimators.GPCCA.compute_terminal_states() which also makes use of the coarse-grained transition matrix cellrank.tl.estimators.GPCCA.coarse_T of the macrostates or the eigengap statistic.

In the example below, we use method='eigenap' which selects the number of states based on the eigengap. The terminal states are defined as the top most likely states from the diagonal of the coarse-grained transition matrix. To find out more, see Compute coarse-grained transition matrix.

g.compute_terminal_states(method="eigengap")

Now that the terminal states have been either set or computed, we can visualize them in an embedding. All of the options seen in Compute macrostates also apply here, like plotting in the same plot (parameter same_plot) or plotting the discrete values (parameter discrete).

g.plot_terminal_states(same_plot=False)
to Epsilon, to Alpha, to Beta

We note that membership degree of macrostates/terminal states should not be confused with the probability of traveling/developing towards these states. For that, we compute the absorption probabilities, see Compute absorption probabilities. The assignment of cells to macrostates is a soft assignment that specifies the degree of membership of any particular cell to a given state.

Total running time of the script: ( 0 minutes 22.848 seconds)

Estimated memory usage: 596 MB

Gallery generated by Sphinx-Gallery