Fit estimator

This example shows how to fit an estimator in order to compute the lineages.

CellRank is composed of cellrank.tl.kernels and cellrank.tl.estimators. Kernels compute transition matrices on the basis of directed data, given by i.e. RNA velocity [Manno18] [Bergen20] while estimators perform inference making use of kernels.

Here, we show how to fit an estimator, given a kernel. An estimator may be used e.g. to identify initial and terminal states or to compute lineage probabilities towards the terminal states, for each individual cell. The cellrank.tl.estimators.BaseEstimator.fit() method is a combination of individual methods to compute terminal states and absorption probabilities.

To have maximum control over the inference performed, use these individual methods directly.

import cellrank as cr

adata = cr.datasets.pancreas_preprocessed("../example.h5ad")
adata

Out:

AnnData object with n_obs × n_vars = 2531 × 2000
    obs: 'day', 'proliferation', 'G2M_score', 'S_score', 'phase', 'clusters_coarse', 'clusters', 'clusters_fine', 'louvain_Alpha', 'louvain_Beta', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition', 'dpt_pseudotime'
    var: 'highly_variable_genes', 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'fit_r2', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_', 'fit_scaling', 'fit_std_u', 'fit_std_s', 'fit_likelihood', 'fit_u0', 'fit_s0', 'fit_pval_steady', 'fit_steady_u', 'fit_steady_s', 'fit_variance', 'fit_alignment_scaling', 'velocity_genes'
    uns: 'clusters_colors', 'clusters_fine_colors', 'diffmap_evals', 'iroot', 'louvain_Alpha_colors', 'louvain_Beta_colors', 'neighbors', 'pca', 'recover_dynamics', 'velocity_graph', 'velocity_graph_neg', 'velocity_params'
    obsm: 'X_diffmap', 'X_pca', 'X_umap', 'velocity_umap'
    varm: 'PCs', 'loss'
    layers: 'Ms', 'Mu', 'fit_t', 'fit_tau', 'fit_tau_', 'spliced', 'unspliced', 'velocity', 'velocity_u'
    obsp: 'connectivities', 'distances'

First, we prepare the kernel using high-level pipeline and the cellrank.tl.estimators.GPCCA estimator.

k = cr.tl.transition_matrix(
    adata, weight_connectivities=0.2, softmax_scale=4, show_progress_bar=False
)
g = cr.tl.estimators.GPCCA(k)

Afterwards, we simply call cellrank.tl.estimators.BaseEstimator.fit(). It offers a quick and easy way to compute the terminal states and optionally the absorption probabilities by following similar steps as defined in Compute terminal states using GPCCA.

g.fit(n_lineages=3, cluster_key="clusters", compute_absorption_probabilities=True)

Out:

  0%|          | 0/90 [00:00<?, ?/s]
  8%|7         | 7/90 [00:00<00:01, 49.81/s]
 17%|#6        | 15/90 [00:00<00:01, 54.95/s]
 26%|##5       | 23/90 [00:00<00:01, 57.02/s]
 34%|###4      | 31/90 [00:00<00:01, 58.21/s]
 43%|####3     | 39/90 [00:00<00:00, 58.82/s]
 52%|#####2    | 47/90 [00:00<00:00, 59.02/s]
 61%|######1   | 55/90 [00:00<00:00, 59.24/s]
 70%|#######   | 63/90 [00:01<00:00, 57.19/s]
 79%|#######8  | 71/90 [00:01<00:00, 52.36/s]
 87%|########6 | 78/90 [00:01<00:00, 50.02/s]
 94%|#########4| 85/90 [00:01<00:00, 48.43/s]
100%|##########| 90/90 [00:01<00:00, 52.90/s]

In order to verify that the absorption probabilities have been computed, we plot them below.

g.plot_absorption_probabilities()
absorption probabilities (fwd)

Out:

/home/docs/checkouts/readthedocs.org/user_builds/cellrank/envs/latest/lib/python3.8/site-packages/scvelo/plotting/utils.py:115: MatplotlibDeprecationWarning: Calling gca() with keyword arguments was deprecated in Matplotlib 3.4. Starting two minor releases later, gca() will take no keyword arguments. The gca() function should only be used to get the current axes, or if no axes exist, create new axes with default keyword arguments. To create a new axes with non-default arguments, use plt.axes() or plt.subplot().
  ax = pl.figure(None, figsize, dpi=dpi).gca(projection=projection)

Total running time of the script: ( 0 minutes 23.258 seconds)

Estimated memory usage: 711 MB

Gallery generated by Sphinx-Gallery