Compute macrostates

This example shows how to compute and plot the macrostates.

For the computation of macrostates, we adapted the popular Generalized Perron Cluster Cluster Analysis [GPCCA18] [Reuter19] method to the single cell context. We provide a scalable implementation which can decompose datasets of 100k+ cells into their dominant dynamical macrostates in just a few minutes. GPCCA relies on the real Schur decomposition to handle non-symmetric transition matrices as they arise from RNA velocity information, see Compute Schur vectors and Compute Schur matrix.

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

First, we need to compute the Schur vectors. By default, the first 10 vectors are computed.

g.compute_schur(n_components=4)

We can now compute the macrostates of the Markov chain. The first important parameter is the cluster_key, which tries to associate the names of macrostates with the cluster labels. The second import parameter is n_cells, which selects the top cells from each state based on the membership degree. By default, 30 cells are selected.

Lastly, the parameter n_states can also be estimated by using either the eigengap or the minChi criterion from [GPCCA18].

g.compute_macrostates(n_states=3, cluster_key="clusters")

Out:

INFO: Using pre-computed schur decomposition

After computing the macrostates, we can inspect them as follows. Below we show for each cell the membership degree of the macrostates.

g.macrostates_memberships
EpsilonAlphaBeta
0.1336230.1668880.699488
0.2290940.2798360.491069
0.1698190.1801750.650006
0.1480520.7808450.071103
0.0490440.0424630.908492
0.0337220.0559320.910346
0.0000240.0000970.999878
0.0303600.9484500.021190
0.2061010.1940520.599847
0.2670990.2279820.504919
.........
0.2177890.2011180.581093
0.0333010.0442590.922440
0.9327060.0531250.014169
0.0257310.9683620.005907
0.0000000.0000080.999991
0.2150480.1795700.605382
0.0244580.0571850.918357
0.0673580.0848990.847743
0.1187450.1375990.743656
0.0385570.9532140.008229

2531 cells x 3 lineages



To get the categorical observations, top n_cells for each macrostate, we can inspect the attribute below.

g.macrostates

Out:

index
AAACCTGAGAGGGATA-1-3    NaN
AAACCTGAGGCAATTA-1-3    NaN
AAACCTGGTAAGTGGC-1-3    NaN
AAACCTGTCCCTCTTT-1-3    NaN
AAACGGGAGTAGCGGT-1-3    NaN
                       ...
TTTGGTTTCCTTTCGG-1-3    NaN
TTTGTCAAGAATGTGT-1-3    NaN
TTTGTCAAGTGACATA-1-3    NaN
TTTGTCATCGAATGCT-1-3    NaN
TTTGTCATCTGTTTGT-1-3    NaN
Length: 2531, dtype: category
Categories (3, object): ['Epsilon', 'Alpha', 'Beta']

We can now plot the membership degree, as well as the categorical assignment.

g.plot_macrostates()
g.plot_macrostates(discrete=True)
  • macrostates memberships (fwd)
  • macrostates (fwd)

Both of these options are shown in the same plot, which is not always desirable. To change this, simply run the following.

g.plot_macrostates(same_plot=False)
to Epsilon, to Alpha, to Beta

Method cellrank.tl.estimators.GPCCA.compute_macrostates() also computes the coarse-grained transition matrix between the macrostates, see Compute coarse-grained transition matrix.

Total running time of the script: ( 0 minutes 23.468 seconds)

Estimated memory usage: 596 MB

Gallery generated by Sphinx-Gallery