Compute potential lineage drivers

This example shows how to compute and plot expression trends for genes which may be involved in lineage decisions.

We identify these by correlating gene expression with absorption probabilities towards a specific terminal state.

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

We need to compute the absorption probabilities. In this example, we’re using cellrank.tl.estimators.GPCCA estimator to estimate the terminal states of the process, but cellrank.tl.estimators.CFLARE can be used as well.

In detail guide for cellrank.tl.estimators.GPCCA estimator can be found in Compute terminal states using GPCCA.

g.compute_schur(n_components=4)
g.compute_macrostates(cluster_key="clusters")
g.set_terminal_states_from_macrostates(["Alpha", "Beta", "Epsilon"])
g.compute_absorption_probabilities()
g.absorption_probabilities

Out:

INFO: Using pre-computed schur decomposition
AlphaBetaEpsilon
0.1466080.7511900.102202
0.2466780.5719760.181347
0.1418830.7458260.112291
0.7123660.1170230.170610
0.0368380.9229560.040206
0.0492700.9233600.027369
0.0000790.9998940.000027
0.8976250.0435290.058845
0.1403660.7422610.117373
0.1403330.7422050.117462
.........
0.1403350.7421800.117485
0.0390730.9347560.026171
0.0496540.0341960.916150
0.9494910.0160120.034497
0.0000001.0000000.000000
0.1390630.7341780.126759
0.0498790.9284190.021703
0.0735460.8752740.051180
0.1168700.7963420.086788
0.9232390.0226710.054090

2531 cells x 3 lineages



To compute the potential driver genes, simply call the cellrank.tl.estimators.BaseEstimator.compute_lineage_drivers() method. By default, the these are computed for all lineages. We can restrict this computation to only a few clusters, using the cluster_key and clusters parameters. We also compute the corrected p-values (qval) and the 95% confidence intervals for the correlations.

g.compute_lineage_drivers(lineages="Alpha")
g.lineage_drivers.sort_values("Alpha corr", ascending=False)
Alpha corr Alpha pval Alpha qval Alpha ci low Alpha ci high
index
Gcg 0.866438 0.000000e+00 0.000000e+00 0.856386 0.875833
Irx2 0.519357 4.506551e-184 4.506551e-181 0.490317 0.547246
Peg10 0.515331 1.302594e-180 8.683959e-178 0.486130 0.543383
Wnk3 0.482853 1.564546e-154 7.822729e-152 0.452402 0.512179
Tmsb15l 0.456772 9.195784e-136 3.678314e-133 0.425381 0.487066
... ... ... ... ... ...
Nkx6-1 -0.313467 8.400535e-60 7.636850e-58 -0.348176 -0.277899
Nnat -0.318801 6.115121e-62 6.436970e-60 -0.353374 -0.283359
Gng12 -0.324276 3.496286e-64 4.113277e-62 -0.358706 -0.288965
Pdx1 -0.330744 6.755869e-67 8.444837e-65 -0.365002 -0.295591
Ptma -0.373322 1.357200e-86 2.467637e-84 -0.406373 -0.339296

2000 rows × 5 columns



Lastly, we plot the top 3 potential driver genes for the “Alpha” lineage.

g.plot_lineage_drivers("Alpha", n_genes=3)
Gcg q-value=0.0000e+00, Irx2 q-value=4.5066e-181, Peg10 q-value=8.6840e-178

Total running time of the script: ( 0 minutes 22.673 seconds)

Estimated memory usage: 597 MB

Gallery generated by Sphinx-Gallery