Note
Click here to download the full example code
Compute potential lineage drivers
This example shows how to compute and plot expression trends for genes which may be involved in lineage decisions.
We identify these by correlating gene expression with absorption probabilities towards a specific terminal state.
import cellrank as cr
adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata
Out:
AnnData object with n_obs × n_vars = 2531 × 2000
obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
varm: 'PCs', 'loss'
layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
obsp: 'connectivities', 'distances'
First, we prepare the kernel using the high-level pipeline and the cellrank.tl.estimators.GPCCA
estimator.
k = cr.tl.transition_matrix(
adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)
Out:
/home/docs/checkouts/readthedocs.org/user_builds/cellrank/checkouts/latest/examples/estimators/compute_lineage_drivers.py:17: DeprecationWarning: `cellrank.tl.transition_matrix` will be removed in version `2.0`. Please use the `cellrank.kernels` or `cellrank.estimators` interface instead.
k = cr.tl.transition_matrix(
We need to compute the absorption probabilities. In this example, we’re using
cellrank.tl.estimators.GPCCA
estimator to estimate the terminal states of the process, but
cellrank.tl.estimators.CFLARE
can be used as well.
In detail guide for cellrank.tl.estimators.GPCCA
estimator can be found in
Compute terminal states using GPCCA.
g.compute_schur(n_components=4)
g.compute_macrostates(cluster_key="clusters")
g.set_terminal_states_from_macrostates(["Alpha", "Beta", "Epsilon"])
g.compute_absorption_probabilities()
g.absorption_probabilities
Out:
0%| | 0/3 [00:00<?, ?/s]
100%|##########| 3/3 [00:00<00:00, 34.46/s]
To compute the potential driver genes, simply call the
cellrank.tl.estimators.BaseEstimator.compute_lineage_drivers()
method. By default, the these are computed for
all lineages. We can restrict this computation to only a few clusters, using the cluster_key
and clusters
parameters.
We also compute the corrected p-values (qval) and the 95% confidence intervals for the correlations.
g.compute_lineage_drivers(lineages="Alpha")
g.lineage_drivers.sort_values("Alpha_corr", ascending=False)
Lastly, we plot the top 3 potential driver genes for the “Alpha” lineage.
g.plot_lineage_drivers("Alpha", n_genes=3)

Total running time of the script: ( 0 minutes 17.684 seconds)
Estimated memory usage: 785 MB