Compute macrostates

This example shows how to compute and plot the macrostates.

For the computation of macrostates, we adapted the popular Generalized Perron Cluster Cluster Analysis [GPCCA18] [Reuter19] method to the single cell context. We provide a scalable implementation which can decompose datasets of 100k+ cells into their dominant dynamical macrostates in just a few minutes. GPCCA relies on the real Schur decomposition to handle non-symmetric transition matrices as they arise from RNA velocity information, see Compute Schur vectors and Compute Schur matrix.

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

First, we need to compute the Schur vectors. By default, the first 10 vectors are computed.

g.compute_schur(n_components=4)

We can now compute the macrostates of the Markov chain. The first important parameter is the cluster_key, which tries to associate the names of macrostates with the cluster labels. The second import parameter is n_cells, which selects the top cells from each state based on the membership degree. By default, 30 cells are selected.

Lastly, the parameter n_states can also be estimated by using either the eigengap or the minChi criterion from [GPCCA18].

g.compute_macrostates(n_states=3, cluster_key="clusters")

After computing the macrostates, we can inspect them as follows. Below we show for each cell the membership degree of the macrostates.

g.macrostates_memberships
EpsilonAlphaBeta
0.1336230.1668880.699488
0.2290940.2798360.491069
0.1698190.1801750.650006
0.1480520.7808450.071103
0.0490440.0424630.908492
0.0337220.0559320.910346
0.0000240.0000970.999878
0.0303600.9484500.021190
0.2061010.1940520.599847
0.2670990.2279820.504919
.........
0.2177890.2011180.581093
0.0333010.0442590.922440
0.9327060.0531250.014169
0.0257310.9683620.005907
0.0000000.0000080.999991
0.2150480.1795700.605382
0.0244580.0571850.918357
0.0673580.0848990.847743
0.1187450.1375990.743656
0.0385570.9532140.008229

2531 cells x 3 lineages



To get the categorical observations, top n_cells for each macrostate, we can inspect the attribute below.

g.macrostates

Out:

index
AAACCTGAGAGGGATA-1-3    NaN
AAACCTGAGGCAATTA-1-3    NaN
AAACCTGGTAAGTGGC-1-3    NaN
AAACCTGTCCCTCTTT-1-3    NaN
AAACGGGAGTAGCGGT-1-3    NaN
                       ...
TTTGGTTTCCTTTCGG-1-3    NaN
TTTGTCAAGAATGTGT-1-3    NaN
TTTGTCAAGTGACATA-1-3    NaN
TTTGTCATCGAATGCT-1-3    NaN
TTTGTCATCTGTTTGT-1-3    NaN
Length: 2531, dtype: category
Categories (3, object): ['Epsilon', 'Alpha', 'Beta']

We can now plot the membership degree, as well as the categorical assignment.

g.plot_macrostates()
g.plot_macrostates(discrete=True)
  • macrostates memberships (fwd)
  • macrostates (fwd)

Out:

/home/docs/checkouts/readthedocs.org/user_builds/cellrank/envs/latest/lib/python3.8/site-packages/scvelo/plotting/utils.py:115: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
  ax = pl.figure(None, figsize, dpi=dpi).gca(projection=projection)
/home/docs/checkouts/readthedocs.org/user_builds/cellrank/envs/latest/lib/python3.8/site-packages/scvelo/plotting/utils.py:115: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
  ax = pl.figure(None, figsize, dpi=dpi).gca(projection=projection)

Both of these options are shown in the same plot, which is not always desirable. To change this, simply run the following.

g.plot_macrostates(same_plot=False)
to Epsilon, to Alpha, to Beta

Method cellrank.tl.estimators.GPCCA.compute_macrostates() also computes the coarse-grained transition matrix between the macrostates, see Compute coarse-grained transition matrix.

Total running time of the script: ( 0 minutes 23.161 seconds)

Estimated memory usage: 711 MB

Gallery generated by Sphinx-Gallery