Note
Click here to download the full example code
Compute potential lineage drivers¶
This example shows how to compute and plot expression trends for genes which may be involved in lineage decisions.
We identify these by correlating gene expression with absorption probabilities towards a specific terminal state.
import cellrank as cr
adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata
Out:
AnnData object with n_obs × n_vars = 2531 × 2000
obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
varm: 'PCs', 'loss'
layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
obsp: 'connectivities', 'distances'
First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA
estimator.
k = cr.tl.transition_matrix(
adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)
We need to compute the absorption probabilities. In this example, we’re using
cellrank.tl.estimators.GPCCA
estimator to estimate the terminal states of the process, but
cellrank.tl.estimators.CFLARE
can be used as well.
In detail guide for cellrank.tl.estimators.GPCCA
estimator can be found in
Compute terminal states using GPCCA.
g.compute_schur(n_components=4)
g.compute_macrostates(cluster_key="clusters")
g.set_terminal_states_from_macrostates(["Alpha", "Beta", "Epsilon"])
g.compute_absorption_probabilities()
g.absorption_probabilities
Out:
INFO: Using pre-computed schur decomposition
To compute the potential driver genes, simply call the
cellrank.tl.estimators.BaseEstimator.compute_lineage_drivers()
method. By default, the these are computed for
all lineages. We can restrict this computation to only a few clusters, using the cluster_key
and clusters
parameters.
We also compute the corrected p-values (qval) and the 95% confidence intervals for the correlations.
g.compute_lineage_drivers(lineages="Alpha")
g.lineage_drivers.sort_values("Alpha corr", ascending=False)
Lastly, we plot the top 3 potential driver genes for the “Alpha” lineage.
g.plot_lineage_drivers("Alpha", n_genes=3)

Total running time of the script: ( 0 minutes 22.673 seconds)
Estimated memory usage: 597 MB