Compute macrostates

This example shows how to compute and plot the macrostates.

For the computation of macrostates, we adapted the popular Generalized Perron Cluster Cluster Analysis [Reuter et al., 2018] [Reuter et al., 2019] method to the single cell context. We provide a scalable implementation which can decompose datasets of 100k+ cells into their dominant dynamical macrostates in just a few minutes. GPCCA relies on the real Schur decomposition to handle non-symmetric transition matrices as they arise from RNA velocity information.

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

Out:

/home/docs/checkouts/readthedocs.org/user_builds/cellrank/checkouts/latest/examples/estimators/compute_macrostates.py:20: DeprecationWarning: `cellrank.tl.transition_matrix` will be removed in version `2.0`. Please use the `cellrank.kernels` or `cellrank.estimators` interface instead.
  k = cr.tl.transition_matrix(

First, we need to compute the Schur vectors. By default, the first 10 vectors are computed.

g.compute_schur(n_components=4)

Out:

DEBUG:root:The Schur vectors aren't D-orthogonal so they are D-orthogonalized.

We can now compute the macrostates of the Markov chain. The first important parameter is the cluster_key, which tries to associate the names of macrostates with the cluster labels. The second import parameter is n_cells, which selects the top cells from each state based on the membership degree. By default, 30 cells are selected.

Lastly, the parameter n_states can also be estimated by using either the eigengap or the minChi criterion from [Reuter et al., 2018].

g.compute_macrostates(n_states=3, cluster_key="clusters")

Out:

INFO:root:Using pre-computed Schur decomposition

After computing the macrostates, we can inspect them as follows. Below we show for each cell the membership degree of the macrostates.

g.macrostates_memberships
EpsilonAlphaBeta
0.1336230.1668880.699488
0.2290940.2798360.491069
0.1698190.1801750.650006
0.1480520.7808450.071103
0.0490440.0424630.908492
0.0337220.0559320.910346
0.0000240.0000970.999878
0.0303600.9484500.021190
0.2061010.1940520.599847
0.2670990.2279820.504919
.........
0.2177890.2011180.581093
0.0333010.0442590.922440
0.9327060.0531250.014169
0.0257310.9683620.005907
0.0000000.0000080.999991
0.2150480.1795700.605382
0.0244580.0571850.918357
0.0673580.0848990.847743
0.1187450.1375990.743656
0.0385570.9532140.008229

2531 cells x 3 lineages



To get the categorical observations, top n_cells for each macrostate, we can inspect the attribute below.

g.macrostates

Out:

index
AAACCTGAGAGGGATA-1-3    NaN
AAACCTGAGGCAATTA-1-3    NaN
AAACCTGGTAAGTGGC-1-3    NaN
AAACCTGTCCCTCTTT-1-3    NaN
AAACGGGAGTAGCGGT-1-3    NaN
                       ...
TTTGGTTTCCTTTCGG-1-3    NaN
TTTGTCAAGAATGTGT-1-3    NaN
TTTGTCAAGTGACATA-1-3    NaN
TTTGTCATCGAATGCT-1-3    NaN
TTTGTCATCTGTTTGT-1-3    NaN
Length: 2531, dtype: category
Categories (3, object): ['Epsilon', 'Alpha', 'Beta']

We can now plot the membership degree, as well as the categorical assignment.

g.plot_macrostates()
g.plot_macrostates(discrete=True)
  • macrostates memberships
  • macrostates

Out:

DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved

Both of these options are shown in the same plot, which is not always desirable. To change this, simply run the following.

g.plot_macrostates(same_plot=False)
macrostates memberships Epsilon, macrostates memberships Alpha, macrostates memberships Beta

Out:

DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f3b7bfe6100>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f3b7bfe6100>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.MaxNLocator object at 0x7f3b7c894d90>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f3b7c993370>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f3b7c993370>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.MaxNLocator object at 0x7f3b7c993f40>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f3badf454f0>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f3badf454f0>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.MaxNLocator object at 0x7f3b7bdb4280>
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved

Method cellrank.tl.estimators.GPCCA.compute_macrostates() also computes the coarse-grained transition matrix between the macrostates, see Compute coarse-grained transition matrix.

Total running time of the script: ( 0 minutes 18.495 seconds)

Estimated memory usage: 801 MB

Gallery generated by Sphinx-Gallery