Compute absorption probabilities

This example shows how to compute and plot the absorption probabilities and the time to absorption.

Absorption probabilities are used in CellRank to define how likely each individual cell is to transition into each of the identified terminal states. In a usual workflow, you would first compute a set of terminal states for your dataset and next ask the question how likely each cell is to develop towards each of these states. CellRank provides an efficient implementation of computing the absorption probabilities that scales to 100k+ cells.

import scvelo as scv

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

We need to compute or set the terminal states. In detail guide for both of our estimators can be found here Compute terminal states using GPCCA.

g.compute_schur(n_components=4)
g.compute_macrostates(cluster_key="clusters")
g.set_terminal_states_from_macrostates(["Alpha", "Beta", "Epsilon"])

cellrank.tl.estimators.BaseEstimator.compute_absorption_probabilities() easily scales to 100k+ cells, thanks to the linear solvers from PETSc.

The computation of absorption probabilities may be restricted to a subset of the identified states via the keys parameter. In our case, we are interested in the absorption probabilities towards each of the terminal states we identified in our data.

g.compute_absorption_probabilities()

Out:

  0%|          | 0/90 [00:00<?, ?/s]
  7%|6         | 6/90 [00:00<00:01, 45.70/s]
 16%|#5        | 14/90 [00:00<00:01, 54.18/s]
 24%|##4       | 22/90 [00:00<00:01, 57.14/s]
 33%|###3      | 30/90 [00:00<00:01, 58.22/s]
 42%|####2     | 38/90 [00:00<00:00, 52.09/s]
 50%|#####     | 45/90 [00:00<00:00, 49.58/s]
 58%|#####7    | 52/90 [00:01<00:00, 48.01/s]
 66%|######5   | 59/90 [00:01<00:00, 47.16/s]
 74%|#######4  | 67/90 [00:01<00:00, 50.08/s]
 83%|########3 | 75/90 [00:01<00:00, 52.86/s]
 92%|#########2| 83/90 [00:01<00:00, 54.95/s]
100%|##########| 90/90 [00:01<00:00, 52.91/s]

The absorption probabilities can be inspected as seen below. Curious reader is encouraged to take a look at some niche tricks for cellrank.tl.Lineage in Lineage tricks.

g.absorption_probabilities
AlphaBetaEpsilon
0.1466070.7511890.102202
0.2466770.5719750.181346
0.1418820.7458250.112291
0.7123660.1170230.170610
0.0368380.9229560.040205
0.0492700.9233600.027369
0.0000790.9998940.000027
0.8976250.0435290.058845
0.1403660.7422590.117373
0.1403330.7421970.117462
.........
0.1403350.7421770.117485
0.0390730.9347560.026171
0.0496530.0341960.916150
0.9494910.0160120.034497
0.0000001.0000000.000000
0.1390620.7341760.126759
0.0498790.9284180.021702
0.0735460.8752730.051180
0.1168690.7963410.086788
0.9232390.0226710.054090

2531 cells x 3 lineages



We can now plot the absorption probabilities. We can use parameters like same_plot or discrete to control whether to plot each lineage in a separate plot or show only the top n_cells - this is set when computing/setting the terminal states. By default, 30 cells are selected from each terminal state.

g.plot_absorption_probabilities()
absorption probabilities (fwd)

Out:

/home/docs/checkouts/readthedocs.org/user_builds/cellrank/envs/stable/lib/python3.8/site-packages/scvelo/plotting/utils.py:115: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
  ax = pl.figure(None, figsize, dpi=dpi).gca(projection=projection)

cellrank.tl.estimators.BaseEstimator.compute_absorption_probabilities() can also be used to compute the mean and the variance of time to absorption to all or just to a subset of terminal states.

This can be specified by supplying time_to_absorption parameter. Below we compute only the mean time to absorption to all terminal states. To compute the mean and the variance only for the “Alpha” absorbing state, one specify the following time_to_absorption={"Alpha": "var"}.

g.compute_absorption_probabilities(time_to_absorption="all")
g.lineage_absorption_times

Out:

  0%|          | 0/90 [00:00<?, ?/s]
  8%|7         | 7/90 [00:00<00:01, 50.04/s]
 17%|#6        | 15/90 [00:00<00:01, 55.95/s]
 26%|##5       | 23/90 [00:00<00:01, 57.72/s]
 34%|###4      | 31/90 [00:00<00:01, 57.70/s]
 43%|####3     | 39/90 [00:00<00:00, 51.80/s]
 51%|#####1    | 46/90 [00:00<00:00, 49.88/s]
 59%|#####8    | 53/90 [00:01<00:00, 48.34/s]
 67%|######6   | 60/90 [00:01<00:00, 47.44/s]
 76%|#######5  | 68/90 [00:01<00:00, 51.16/s]
 84%|########4 | 76/90 [00:01<00:00, 53.25/s]
 93%|#########3| 84/90 [00:01<00:00, 55.27/s]
100%|##########| 90/90 [00:01<00:00, 53.30/s]

  0%|          | 0/1 [00:00<?, ?/s]
100%|##########| 1/1 [00:00<00:00, 25.08/s]
Alpha, Beta, Epsilon mean
index
AAACCTGAGAGGGATA-1-3 19.182785
AAACCTGAGGCAATTA-1-3 21.382957
AAACCTGGTAAGTGGC-1-3 32.705402
AAACCTGTCCCTCTTT-1-3 10.673669
AAACGGGAGTAGCGGT-1-3 14.852581
... ...
TTTGGTTTCCTTTCGG-1-3 11.854834
TTTGTCAAGAATGTGT-1-3 16.565882
TTTGTCAAGTGACATA-1-3 20.929073
TTTGTCATCGAATGCT-1-3 4.049922
TTTGTCATCTGTTTGT-1-3 27.014696

2531 rows × 1 columns



Lastly, we plot the above computed time.

adata.obs["mean_time_to_absorption"] = g.lineage_absorption_times[
    "Alpha, Beta, Epsilon mean"
]
scv.pl.scatter(adata, color="mean_time_to_absorption")
mean time to absorption

Out:

/home/docs/checkouts/readthedocs.org/user_builds/cellrank/envs/stable/lib/python3.8/site-packages/scvelo/plotting/utils.py:115: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
  ax = pl.figure(None, figsize, dpi=dpi).gca(projection=projection)

Total running time of the script: ( 0 minutes 24.732 seconds)

Estimated memory usage: 711 MB

Gallery generated by Sphinx-Gallery