CellRank
latest

General

  • Installation
  • API
  • Classes
  • Release Notes
  • References

Gallery

  • Examples
    • Estimators
    • Plotting
    • Others
      • Fit models and plot gene trends
      • Lineage tricks
      • Kernel tricks

Tutorials

  • Pancreas Basics
  • Pancreas Advanced
CellRank
  • »
  • Examples »
  • Fit models and plot gene trends
  • Edit on GitHub

Note

Click here to download the full example code

Fit models and plot gene trends¶

This example shows how to prepare, fit and plot various models to gene expression trends.

We will focus mostly on Generalized Additive Models (GAMs) and show how to do this for sklearn estimators towards the end. GAMs are flexible models that are well suited to model non-linear gene trends as they often appear in single-cell data. Further, they have the advantage that it is relatively straightforward to derive a confidence interval around the main trend.

from sklearn.svm import SVR

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we need to compute a set of terminal states and we have to estimate fate probabilities towards these. The fate probabilities will be used as weights when fitting the model - they determine how much each cell contributes to each lineage.

cr.tl.terminal_states(
    adata,
    cluster_key="clusters",
    weight_connectivities=0.2,
    softmax_scale=4,
    show_progress_bar=False,
)
cr.tl.lineages(adata)

Out:

INFO: Using pre-computed schur decomposition

Models in cellrank.ul.models follow similar patterns as sklearn models. We begin by initializing and preparing the model for fitting. cellrank.ul.models.BaseModel.prepare() requires only the gene name and the lineage name and must be called before cellrank.ul.models.BaseModel.fit(). It also includes various useful parameters, such as time_range or weight_threshold, which determine the start and end pseudotime and the minimum required threshold for lineage probabilities, respectively.

model = cr.ul.models.GAM(adata)
model.prepare(
    gene="Pak3",
    lineage="Alpha",
    time_key="dpt_pseudotime",
    data_key="Ms",
    n_test_points=100,
)

Out:

<GAM[gene='Pak3', lineage='Alpha', model=GammaGAM(callbacks=['deviance', 'diffs'], fit_intercept=True, max_iter=2000, scale=None, terms=s(0), tol=0.0001, verbose=False)]>

CellRank allows any pseudotime from adata.obs to be passed via the time_key, e.g. Diffusion Pseudotime (DPT) [Haghverdi16], scvelo’s latent time [Bergen20], Palantir’s pseudotime [Setty19], etc.

Further, CellRank accepts imputed gene expression values stored in adata.layers via the data_key, i.e. you can pass MAGIC [MAGIC18] imputed data, scvelo.pp.moments() [Bergen20] (used below) or any other form of imputation.

Once the model has been prepared, it is ready for fitting and prediction.

y_hat = model.fit().predict()
y_hat

Out:

array([0.04188798, 0.04126436, 0.04068079, 0.0401372 , 0.03963354,
       0.03916981, 0.03874606, 0.03836239, 0.038019  , 0.03771614,
       0.03745415, 0.03723348, 0.03705466, 0.03691835, 0.03682534,
       0.03677654, 0.03677303, 0.03681602, 0.03690694, 0.0370474 ,
       0.03723921, 0.03748443, 0.03778539, 0.03814469, 0.03856524,
       0.03905032, 0.03960357, 0.04022907, 0.04093137, 0.04171554,
       0.04258723, 0.04355275, 0.04461914, 0.04579423, 0.04708616,
       0.04850137, 0.05004637, 0.05172833, 0.05355516, 0.05553548,
       0.05767874, 0.05999519, 0.06249599, 0.06519324, 0.06809998,
       0.07123035, 0.07459959, 0.07822409, 0.0821215 , 0.08631077,
       0.09081224, 0.09564777, 0.10084067, 0.1064159 , 0.11240011,
       0.11882179, 0.12571123, 0.13310066, 0.14102435, 0.1495185 ,
       0.15862171, 0.16837463, 0.17882016, 0.19000345, 0.2019722 ,
       0.21477619, 0.2284676 , 0.24310021, 0.25872743, 0.27540293,
       0.29318136, 0.31211773, 0.3322675 , 0.3536854 , 0.3764255 ,
       0.40054053, 0.42608052, 0.4530935 , 0.48162335, 0.5117095 ,
       0.54338586, 0.5766802 , 0.61161226, 0.6481933 , 0.6864243 ,
       0.7262959 , 0.76778543, 0.810857  , 0.8554596 , 0.9015258 ,
       0.9489717 , 0.9976943 , 1.0475708 , 1.0984584 , 1.1501942 ,
       1.2025931 , 1.2554485 , 1.308532  , 1.3615954 , 1.4143687 ],
      dtype=float32)

Optionally, we can also get the confidence interval. Models which don’t have a method to compute it, such as cellrank.ul.models.SKLearnModel wrapper for some sklearn estimators, can use the default the confidence interval cellrank.ul.models.BaseModel.default_confidence_interval().

conf_int = model.confidence_interval()
conf_int[:5]

Out:

array([[0.03875637, 0.04527263],
       [0.03821021, 0.04456262],
       [0.03770048, 0.0438967 ],
       [0.03722708, 0.04327481],
       [0.03678997, 0.0426969 ]], dtype=float32)

After the prediction and optionally the confidence interval calculation, we can plot the smoothed gene expression.

Cells in this plot have been colored by their fate probability of reaching the Alpha terminal state. We include these probabilities as weights in the loss function when fitting the model. This allows us to weight each cell by its relative contribution to the lineage, without needing to subset cells for each lineage.

model.plot(conf_int=True)
Pak3 @ Alpha

Lastly, wrapping sklearn estimators is fairly simple, we just pass the instance to cellrank.ul.models.SKLearnModel.

svr = SVR()
model = cr.ul.models.SKLearnModel(adata, model=svr)
model

Out:

<SKLearnModel[gene=None, lineage=None, model=SVR()]>

Total running time of the script: ( 0 minutes 39.544 seconds)

Estimated memory usage: 536 MB

Download Python source code: plot_model.py

Download Jupyter notebook: plot_model.ipynb

Gallery generated by Sphinx-Gallery

Next Previous

© Copyright 2021, Marius Lange, Michal Klein.. Revision 62391589.

Read the Docs v: latest
Versions
latest
Downloads
html
On Read the Docs
Project Home
Builds