Compute absorption probabilities

This example shows how to compute and plot the absorption probabilities and the time to absorption.

Absorption probabilities are used in CellRank to define how likely each individual cell is to transition into each of the identified terminal states. In a usual workflow, you would first compute a set of terminal states for your dataset and next ask the question how likely each cell is to develop towards each of these states. CellRank provides an efficient implementation of computing the absorption probabilities that scales to 100k+ cells.

import cellrank as cr
import scvelo as scv

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

We need to compute or set the terminal states. In detail guide for both of our estimators can be found here Compute terminal states using GPCCA.

g.compute_schur(n_components=4)
g.compute_macrostates(cluster_key="clusters")
g.set_terminal_states_from_macrostates(["Alpha", "Beta", "Epsilon"])

Out:

INFO: Using pre-computed schur decomposition

cellrank.tl.estimators.BaseEstimator.compute_absorption_probabilities() easily scales to 100k+ cells, thanks to the linear solvers from PETSc.

The computation of absorption probabilities may be restricted to a subset of the identified states via the keys parameter. In our case, we are interested in the absorption probabilities towards each of the terminal states we identified in our data.

g.compute_absorption_probabilities()

The absorption probabilities can be inspected as seen below. Curious reader is encouraged to take a look at some niche tricks for cellrank.tl.Lineage in Lineage tricks.

g.absorption_probabilities
AlphaBetaEpsilon
0.1466080.7511900.102202
0.2466780.5719760.181347
0.1418830.7458260.112291
0.7123660.1170230.170610
0.0368380.9229560.040206
0.0492700.9233600.027369
0.0000790.9998940.000027
0.8976250.0435290.058845
0.1403660.7422610.117373
0.1403330.7422050.117462
.........
0.1403350.7421800.117485
0.0390730.9347560.026171
0.0496540.0341960.916150
0.9494910.0160120.034497
0.0000001.0000000.000000
0.1390630.7341780.126759
0.0498790.9284190.021703
0.0735460.8752740.051180
0.1168700.7963420.086788
0.9232390.0226710.054090

2531 cells x 3 lineages



We can now plot the absorption probabilities. We can use parameters like same_plot or discrete to control whether to plot each lineage in a separate plot or show only the top n_cells - this is set when computing/setting the terminal states. By default, 30 cells are selected from each terminal state.

g.plot_absorption_probabilities()
absorption probabilities (fwd)

cellrank.tl.estimators.BaseEstimator.compute_absorption_probabilities() can also be used to compute the mean and the variance of time to absorption to all or just to a subset of terminal states.

This can be specified by supplying time_to_absorption parameter. Below we compute only the mean time to absorption to all terminal states. To compute the mean and the variance only for the “Alpha” absorbing state, one specify the following time_to_absorption={"Alpha": "var"}.

g.compute_absorption_probabilities(time_to_absorption="all")
g.lineage_absorption_times
Alpha, Beta, Epsilon mean
index
AAACCTGAGAGGGATA-1-3 19.182791
AAACCTGAGGCAATTA-1-3 21.382967
AAACCTGGTAAGTGGC-1-3 32.705410
AAACCTGTCCCTCTTT-1-3 10.673672
AAACGGGAGTAGCGGT-1-3 14.852583
... ...
TTTGGTTTCCTTTCGG-1-3 11.854835
TTTGTCAAGAATGTGT-1-3 16.565884
TTTGTCAAGTGACATA-1-3 20.929081
TTTGTCATCGAATGCT-1-3 4.049922
TTTGTCATCTGTTTGT-1-3 27.014730

2531 rows × 1 columns



Lastly, we plot the above computed time.

adata.obs["mean_time_to_absorption"] = g.lineage_absorption_times[
    "Alpha, Beta, Epsilon mean"
]
scv.pl.scatter(adata, color="mean_time_to_absorption")
mean time to absorption

Total running time of the script: ( 0 minutes 24.153 seconds)

Estimated memory usage: 596 MB

Gallery generated by Sphinx-Gallery