Note
Click here to download the full example code
Plot gene trends
This example shows how to plot smoothed gene expression toward specific terminal populations.
By default, we use Generalized Additive Models (GAMs) to fit gene expression values and we specify each cell’s contribution to each lineage via the lineage probabilities.
For models based on sklearn
estimators, see cellrank.ul.models.SKLearnModel
.
import cellrank as cr
adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata
Out:
AnnData object with n_obs × n_vars = 2531 × 2000
obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
varm: 'PCs', 'loss'
layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
obsp: 'connectivities', 'distances'
First, we compute the terminal states and the absorption probabilities towards them. The absorption probabilities will be used as weights in the loss function when fitting the GAMs.
We further set up a model instance that will be used for smoothing.
cr.tl.terminal_states(
adata,
cluster_key="clusters",
weight_connectivities=0.2,
n_states=3,
softmax_scale=4,
show_progress_bar=False,
)
cr.tl.lineages(adata)
model = cr.ul.models.GAM(adata)
Out:
/home/docs/checkouts/readthedocs.org/user_builds/cellrank/checkouts/latest/examples/plotting/plot_gene_trends.py:23: DeprecationWarning: `cellrank.tl.terminal_states` will be removed in version `2.0`. Please use the `cellrank.kernels` or `cellrank.estimators` interface instead.
cr.tl.terminal_states(
/home/docs/checkouts/readthedocs.org/user_builds/cellrank/checkouts/latest/cellrank/tl/_init_term_states.py:158: DeprecationWarning: `cellrank.tl.transition_matrix` will be removed in version `2.0`. Please use the `cellrank.kernels` or `cellrank.estimators` interface instead.
kernel = transition_matrix(
/home/docs/checkouts/readthedocs.org/user_builds/cellrank/checkouts/latest/examples/plotting/plot_gene_trends.py:31: DeprecationWarning: `cellrank.tl.lineages` will be removed in version `2.0`. Please use the `cellrank.kernels` or `cellrank.estimators` interface instead.
cr.tl.lineages(adata)
0%| | 0/3 [00:00<?, ?/s]
100%|##########| 3/3 [00:00<00:00, 36.14/s]
To plot the trends for some genes, run the code below. The parameter data_key
specifies layer in adata.layers
from which we obtain gene expression - in this case, we use moments of gene expression computed by
scvelo.pp.moments()
.
cr.pl.gene_trends(
adata,
model,
["Map2", "Dcx"],
data_key="Ms",
time_key="dpt_pseudotime",
show_progress_bar=False,
)

Out:
0%| | 0/2 [00:00<?, ?gene/s]
50%|##### | 1/2 [00:00<00:00, 5.70gene/s]
100%|##########| 2/2 [00:00<00:00, 6.27gene/s]
100%|##########| 2/2 [00:00<00:00, 6.16gene/s]
We can plot all trends in the same plot and hide the cells, as shown below.
cr.pl.gene_trends(
adata,
model,
["Map2", "Dcx"],
data_key="Ms",
same_plot=True,
hide_cells=True,
time_key="dpt_pseudotime",
show_progress_bar=False,
)

Out:
0%| | 0/2 [00:00<?, ?gene/s]
50%|##### | 1/2 [00:00<00:00, 5.70gene/s]
100%|##########| 2/2 [00:00<00:00, 6.25gene/s]
100%|##########| 2/2 [00:00<00:00, 6.14gene/s]
We can specify specific pseudotime ranges, separately for each lineage - the model is always fitted using cells with the given time range. By default, the start is always 0 and the end is automatically determined based on the absorption probabilities.
cr.pl.gene_trends(
adata,
model,
["Map2", "Dcx"],
data_key="Ms",
lineages=["Alpha", "Beta"],
time_range=[(0.2, 1), (0, 0.8)],
time_key="dpt_pseudotime",
show_progress_bar=False,
)

Out:
0%| | 0/2 [00:00<?, ?gene/s]
100%|##########| 2/2 [00:00<00:00, 10.24gene/s]
100%|##########| 2/2 [00:00<00:00, 10.17gene/s]
We can also return the models, which can be useful to inspect the fitted models more granular or when the
fitting has failed - such models will be returned as cellrank.ul.models.FailedModel
.
Below we show what would happen if a model were to fail for some arbitrary gene and lineage combination.
failed_model = cr.ul.models.FailedModel(model, exc="This is just a dummy example.")
models = cr.pl.gene_trends(
adata,
{
"Map2": {"Alpha": failed_model, "*": model},
"Dcx": {"Beta": failed_model, "*": model},
},
["Map2", "Dcx"],
data_key="Ms",
time_key="dpt_pseudotime",
show_progress_bar=False,
return_models=True,
)
models["Map2"]["Alpha"]

Out:
0%| | 0/2 [00:00<?, ?gene/s]
100%|##########| 2/2 [00:00<00:00, 9.25gene/s]
100%|##########| 2/2 [00:00<00:00, 9.18gene/s]
<FailedModel[origin=GAM[gene=None, lineage=None, model=GammaGAM(callbacks=['deviance', 'diffs'], fit_intercept=True, max_iter=2000, scale=None, terms=s(0), tol=0.0001, verbose=False)]]>
There are many more options as how to customize the plot or how to pass additional arguments to
cellrank.ul.models.BaseModel.prepare()
and we encourage the reader to take a closer look at the documentation of
cellrank.pl.gene_trends()
and the above mentioned method.
Total running time of the script: ( 0 minutes 22.379 seconds)
Estimated memory usage: 793 MB