Compute absorption probabilities

This example shows how to compute and plot the absorption probabilities and the time to absorption.

Absorption probabilities are used in CellRank to define how likely each individual cell is to transition into each of the identified terminal states. In a usual workflow, you would first compute a set of terminal states for your dataset and next ask the question how likely each cell is to develop towards each of these states. CellRank provides an efficient implementation of computing the absorption probabilities that scales to 100k+ cells.

import scvelo as scv
import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

Out:

/home/docs/checkouts/readthedocs.org/user_builds/cellrank/checkouts/latest/examples/estimators/compute_abs_probs.py:21: DeprecationWarning: `cellrank.tl.transition_matrix` will be removed in version `2.0`. Please use the `cellrank.kernels` or `cellrank.estimators` interface instead.
  k = cr.tl.transition_matrix(

We need to compute or set the terminal states. In detail guide for both of our estimators can be found here Compute terminal states using GPCCA.

g.compute_schur(n_components=4)
g.compute_macrostates(cluster_key="clusters")
g.set_terminal_states_from_macrostates(["Alpha", "Beta", "Epsilon"])

cellrank.tl.estimators.BaseEstimator.compute_absorption_probabilities() easily scales to 100k+ cells, thanks to the linear solvers from PETSc.

The computation of absorption probabilities may be restricted to a subset of the identified states via the keys parameter. In our case, we are interested in the absorption probabilities towards each of the terminal states we identified in our data.

g.compute_absorption_probabilities()

Out:

  0%|          | 0/3 [00:00<?, ?/s]
100%|##########| 3/3 [00:00<00:00, 35.56/s]

The absorption probabilities can be inspected as seen below. Curious reader is encouraged to take a look at some niche tricks for cellrank.tl.Lineage in Lineage tricks.

g.absorption_probabilities
AlphaBetaEpsilon
0.1466080.7511900.102202
0.2466780.5719760.181346
0.1418830.7458260.112291
0.7123660.1170230.170610
0.0368380.9229560.040205
0.0492700.9233600.027369
0.0000790.9998940.000027
0.8976250.0435290.058845
0.1403670.7422610.117373
0.1403320.7422030.117462
.........
0.1403360.7421800.117485
0.0390730.9347560.026171
0.0496540.0341960.916150
0.9494910.0160120.034497
0.0000001.0000000.000000
0.1390630.7341780.126759
0.0498790.9284180.021702
0.0735460.8752740.051180
0.1168700.7963420.086788
0.9232390.0226710.054090

2531 cells x 3 lineages



We can now plot the absorption probabilities. We can use parameters like same_plot or discrete to control whether to plot each lineage in a separate plot or show only the top n_cells - this is set when computing/setting the terminal states. By default, 30 cells are selected from each terminal state.

g.plot_absorption_probabilities()
absorption probabilities

cellrank.tl.estimators.BaseEstimator.compute_absorption_probabilities() can also be used to compute the mean and the variance of time to absorption to all or just to a subset of terminal states.

This can be specified by supplying time_to_absorption parameter. Below we compute only the mean time to absorption to all terminal states. To compute the mean and the variance only for the “Alpha” absorbing state, one specify the following time_to_absorption={"Alpha": "var"}.

g.compute_absorption_probabilities(time_to_absorption="all")
g.absorption_times

Out:

  0%|          | 0/3 [00:00<?, ?/s]
100%|##########| 3/3 [00:00<00:00, 38.72/s]

  0%|          | 0/1 [00:00<?, ?/s]
100%|##########| 1/1 [00:00<00:00, 26.59/s]
Alpha, Beta, Epsilon mean
index
AAACCTGAGAGGGATA-1-3 19.182785
AAACCTGAGGCAATTA-1-3 21.382957
AAACCTGGTAAGTGGC-1-3 32.705402
AAACCTGTCCCTCTTT-1-3 10.673669
AAACGGGAGTAGCGGT-1-3 14.852581
... ...
TTTGGTTTCCTTTCGG-1-3 11.854834
TTTGTCAAGAATGTGT-1-3 16.565882
TTTGTCAAGTGACATA-1-3 20.929073
TTTGTCATCGAATGCT-1-3 4.049922
TTTGTCATCTGTTTGT-1-3 27.014696

2531 rows × 1 columns



Lastly, we plot the above computed time.

adata.obs["mean_time_to_absorption"] = g.absorption_times["Alpha, Beta, Epsilon mean"]
scv.pl.scatter(adata, color="mean_time_to_absorption")
mean time to absorption

Total running time of the script: ( 0 minutes 22.286 seconds)

Estimated memory usage: 693 MB

Gallery generated by Sphinx-Gallery