Plot potential lineage drivers

This example shows how to compute and plot expression trends for genes which may be involved in lineage decisions.

We identify these by correlating gene expression with absorption probabilities towards a specific terminal state.

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we need to compute the terminal states and the absorption probabilities towards them.

cr.tl.terminal_states(
    adata,
    cluster_key="clusters",
    n_cells=30,
    n_states=3,
    softmax_scale=4,
    show_progress_bar=False,
)
cr.tl.lineages(adata)

Out:

  0%|          | 0/90 [00:00<?, ?/s]
  8%|7         | 7/90 [00:00<00:01, 49.88/s]
 17%|#6        | 15/90 [00:00<00:01, 54.45/s]
 26%|##5       | 23/90 [00:00<00:01, 56.41/s]
 34%|###4      | 31/90 [00:00<00:01, 57.42/s]
 43%|####3     | 39/90 [00:00<00:00, 58.03/s]
 52%|#####2    | 47/90 [00:00<00:00, 58.32/s]
 61%|######1   | 55/90 [00:00<00:00, 58.59/s]
 70%|#######   | 63/90 [00:01<00:00, 56.57/s]
 79%|#######8  | 71/90 [00:01<00:00, 51.90/s]
 87%|########6 | 78/90 [00:01<00:00, 49.51/s]
 94%|#########4| 85/90 [00:01<00:00, 47.87/s]
100%|##########| 90/90 [00:01<00:00, 52.29/s]

Once the lineages have been computed, we can compute the potential driver genes for each of them. It is also possible to restrict this computation to just a few clusters, defined by cluster_key and clusters.

By default we are computing the driver genes for all lineages.

drivers = cr.tl.lineage_drivers(adata, lineages="Alpha")
drivers
Alpha corr Alpha pval Alpha qval Alpha ci low Alpha ci high
index
Gcg 0.866438 0.000000e+00 0.000000e+00 0.856386 0.875834
Irx2 0.519356 4.516765e-184 4.516765e-181 0.490316 0.547245
Peg10 0.515331 1.302802e-180 8.685347e-178 0.486130 0.543383
Wnk3 0.482853 1.564685e-154 7.823426e-152 0.452402 0.512179
Tmsb15l 0.456773 9.187267e-136 3.674907e-133 0.425382 0.487067
... ... ... ... ... ...
Nkx6-1 -0.313466 8.407463e-60 7.643148e-58 -0.348175 -0.277898
Nnat -0.318800 6.122791e-62 6.445044e-60 -0.353373 -0.283358
Gng12 -0.324275 3.500417e-64 4.118138e-62 -0.358705 -0.288964
Pdx1 -0.330743 6.763327e-67 8.454159e-65 -0.365001 -0.295590
Ptma -0.373322 1.357458e-86 2.468105e-84 -0.406373 -0.339295

2000 rows × 5 columns



Finally, we can plot the potential drivers. Below we plot top 3 driver genes for the ‘Alpha’ lineage.

cr.pl.lineage_drivers(adata, lineage="Alpha", n_genes=3)
Gcg qval=0.0000e+00, Irx2 qval=4.5168e-181, Peg10 qval=8.6853e-178

Total running time of the script: ( 0 minutes 23.719 seconds)

Estimated memory usage: 720 MB

Gallery generated by Sphinx-Gallery