Compute macrostates

This example shows how to compute and plot the macrostates.

For the computation of macrostates, we adapted the popular Generalized Perron Cluster Cluster Analysis [Reuter et al., 2018] [Reuter et al., 2019] method to the single cell context. We provide a scalable implementation which can decompose datasets of 100k+ cells into their dominant dynamical macrostates in just a few minutes. GPCCA relies on the real Schur decomposition to handle non-symmetric transition matrices as they arise from RNA velocity information.

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

Out:

/home/docs/checkouts/readthedocs.org/user_builds/cellrank/checkouts/stable/examples/estimators/compute_macrostates.py:20: DeprecationWarning: `cellrank.tl.transition_matrix` will be removed in version `2.0`. Please use the `cellrank.kernels` or `cellrank.estimators` interface instead.
  k = cr.tl.transition_matrix(

First, we need to compute the Schur vectors. By default, the first 10 vectors are computed.

g.compute_schur(n_components=4)

Out:

DEBUG:root:The Schur vectors aren't D-orthogonal so they are D-orthogonalized.

We can now compute the macrostates of the Markov chain. The first important parameter is the cluster_key, which tries to associate the names of macrostates with the cluster labels. The second import parameter is n_cells, which selects the top cells from each state based on the membership degree. By default, 30 cells are selected.

Lastly, the parameter n_states can also be estimated by using either the eigengap or the minChi criterion from [Reuter et al., 2018].

g.compute_macrostates(n_states=3, cluster_key="clusters")

Out:

INFO:root:Using pre-computed Schur decomposition

After computing the macrostates, we can inspect them as follows. Below we show for each cell the membership degree of the macrostates.

g.macrostates_memberships
EpsilonAlphaBeta
0.1336230.1668880.699488
0.2290940.2798360.491069
0.1698190.1801750.650006
0.1480520.7808450.071103
0.0490440.0424630.908492
0.0337220.0559320.910346
0.0000240.0000970.999878
0.0303600.9484500.021190
0.2061010.1940520.599847
0.2670990.2279820.504919
.........
0.2177890.2011180.581093
0.0333010.0442590.922440
0.9327060.0531250.014169
0.0257310.9683620.005907
0.0000000.0000080.999991
0.2150480.1795700.605382
0.0244580.0571850.918357
0.0673580.0848990.847743
0.1187450.1375990.743656
0.0385570.9532140.008229

2531 cells x 3 lineages



To get the categorical observations, top n_cells for each macrostate, we can inspect the attribute below.

g.macrostates

Out:

index
AAACCTGAGAGGGATA-1-3    NaN
AAACCTGAGGCAATTA-1-3    NaN
AAACCTGGTAAGTGGC-1-3    NaN
AAACCTGTCCCTCTTT-1-3    NaN
AAACGGGAGTAGCGGT-1-3    NaN
                       ...
TTTGGTTTCCTTTCGG-1-3    NaN
TTTGTCAAGAATGTGT-1-3    NaN
TTTGTCAAGTGACATA-1-3    NaN
TTTGTCATCGAATGCT-1-3    NaN
TTTGTCATCTGTTTGT-1-3    NaN
Length: 2531, dtype: category
Categories (3, object): ['Epsilon', 'Alpha', 'Beta']

We can now plot the membership degree, as well as the categorical assignment.

g.plot_macrostates()
g.plot_macrostates(discrete=True)
  • macrostates memberships
  • macrostates

Out:

DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved

Both of these options are shown in the same plot, which is not always desirable. To change this, simply run the following.

g.plot_macrostates(same_plot=False)
macrostates memberships Epsilon, macrostates memberships Alpha, macrostates memberships Beta

Out:

DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f41b849c5e0>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f41b849c5e0>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.MaxNLocator object at 0x7f41b849c550>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f41b8191cd0>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f41b8191cd0>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.MaxNLocator object at 0x7f41b81917c0>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f41b83c14c0>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.AutoLocator object at 0x7f41b83c14c0>
DEBUG:matplotlib.colorbar:locator: <matplotlib.ticker.MaxNLocator object at 0x7f41b83c1d30>
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved
DEBUG:matplotlib.axes._base:top of Axes not in the figure, so title not moved

Method cellrank.tl.estimators.GPCCA.compute_macrostates() also computes the coarse-grained transition matrix between the macrostates, see Compute coarse-grained transition matrix.

Total running time of the script: ( 0 minutes 18.789 seconds)

Estimated memory usage: 709 MB

Gallery generated by Sphinx-Gallery