Compute potential lineage drivers

This example shows how to compute and plot expression trends for genes which may be involved in lineage decisions.

We identify these by correlating gene expression with absorption probabilities towards a specific terminal state.

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

We need to compute the absorption probabilities. In this example, we’re using cellrank.tl.estimators.GPCCA estimator to estimate the terminal states of the process, but cellrank.tl.estimators.CFLARE can be used as well.

In detail guide for cellrank.tl.estimators.GPCCA estimator can be found in Compute terminal states using GPCCA.

g.compute_schur(n_components=4)
g.compute_macrostates(cluster_key="clusters")
g.set_terminal_states_from_macrostates(["Alpha", "Beta", "Epsilon"])
g.compute_absorption_probabilities()
g.absorption_probabilities

Out:

INFO: Using pre-computed schur decomposition
AlphaBetaEpsilon
0.1466080.7511900.102202
0.2466780.5719760.181347
0.1418830.7458260.112291
0.7123660.1170230.170610
0.0368380.9229560.040206
0.0492700.9233600.027369
0.0000790.9998940.000027
0.8976250.0435290.058845
0.1403660.7422610.117373
0.1403330.7422050.117462
.........
0.1403350.7421800.117485
0.0390730.9347560.026171
0.0496540.0341960.916150
0.9494910.0160120.034497
0.0000001.0000000.000000
0.1390630.7341780.126759
0.0498790.9284190.021703
0.0735460.8752740.051180
0.1168700.7963420.086788
0.9232390.0226710.054090

2531 cells x 3 lineages



To compute the potential driver genes, simply call the cellrank.tl.estimators.BaseEstimator.compute_lineage_drivers() method. By default, the these are computed for all lineages. We can restrict this computation to only a few clusters, using the cluster_key and clusters parameters.

g.compute_lineage_drivers()
g.lineage_drivers
Alpha Beta Epsilon
index
Snhg6 -0.041896 0.049168 -0.025309
Ncoa2 0.043024 -0.028959 -0.009451
Stau2 0.010126 -0.041692 0.055179
Uggt1 -0.074903 0.081750 -0.035117
Tmem131 0.023332 -0.017514 -0.002146
... ... ... ...
Tmem27 0.422637 -0.469271 0.211330
Uty 0.008632 -0.000096 -0.011302
Ddx3y 0.009119 -0.011621 0.007022
Eif2s3y -0.016992 0.013547 0.000259
Erdr1 0.039715 -0.038890 0.011286

2000 rows × 3 columns



Lastly, we plot the top 3 potential driver genes for the “Alpha” lineage.

g.plot_lineage_drivers("Alpha", n_genes=3)
Gcg, Irx2, Peg10

Total running time of the script: ( 0 minutes 22.647 seconds)

Estimated memory usage: 597 MB

Gallery generated by Sphinx-Gallery