cellrank.tl.initial_states

cellrank.tl.initial_states(adata, estimator=<class 'cellrank.tl.estimators._gppca.GPCCA'>, mode='deterministic', backward_mode='transpose', n_states=None, cluster_key=None, key=None, show_plots=False, copy=False, return_estimator=False, fit_kwargs=mappingproxy({}), **kwargs)

Find initial states of a dynamic process of single cells based on RNA velocity [Manno18].

The function models dynamic cellular processes as a Markov chain, where the transition matrix is computed based on the velocity vectors of each individual cell. Based on this Markov chain, we provide two estimators to compute initial states, both of which are based on spectral methods.
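To make the Markov chain construction concrete, here is a minimal sketch. It is not CellRank's implementation. It assumes that a cell's transition probabilities come from comparing its velocity vector with the displacement to each neighbor (cosine similarity) and passing the scores through a softmax. The helper names `cosine` and `transition_row` and the default `softmax_scale=4.0` are hypothetical, chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def transition_row(position, velocity, neighbor_positions, softmax_scale=4.0):
    """Return one row of a transition matrix (hypothetical helper).

    Each neighbor is scored by how well the cell's velocity points
    towards it; a softmax turns the scores into probabilities.
    """
    scores = [
        cosine(velocity, [q - p for p, q in zip(position, neighbor)])
        for neighbor in neighbor_positions
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(softmax_scale * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# A cell at the origin moving along +x, with three neighbors:
# one ahead, one behind, one orthogonal to the velocity.
row = transition_row(
    position=[0.0, 0.0],
    velocity=[1.0, 0.0],
    neighbor_positions=[[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]],
)
```

The row sums to one, and the neighbor lying in the direction of the velocity receives the largest probability. Stacking one such row per cell gives a row-stochastic matrix, which is what defines the Markov chain.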

For the estimator cellrank.tl.estimators.GPCCA, cells are fuzzily clustered into macrostates using Generalized Perron Cluster Cluster Analysis [GPCCA18]. In short, this coarse-grains the Markov chain into a set of macrostates that represent the slow time-scale dynamics, i.e. transitions between these macrostates are rare. The most stable macrostates represent initial states, while the others represent intermediate states.

For the estimator cellrank.tl.estimators.CFLARE, cells are filtered into transient and recurrent cells using the left eigenvectors of the transition matrix and then clustered into distinct groups of initial states using its right eigenvectors.
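The role of a left eigenvector in separating transient from recurrent cells can be illustrated on a toy chain. The sketch below is a deliberate simplification and not the CFLARE algorithm: it assumes a single leading left eigenvector (the stationary distribution), approximated by power iteration, and treats states with negligible stationary mass as transient. The function name `leading_left_eigenvector` and the threshold `1e-6` are hypothetical.

```python
def leading_left_eigenvector(P, n_iter=1000):
    """Approximate the stationary distribution pi with pi = pi @ P.

    P is a row-stochastic matrix given as a list of rows.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Toy chain: state 0 leaks into state 1 and is never re-entered,
# while states 1 and 2 exchange probability mass forever.
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.1, 0.9],
]

pi = leading_left_eigenvector(P)

# States that keep non-negligible mass in the long run are recurrent.
recurrent = [j for j, mass in enumerate(pi) if mass > 1e-6]
transient = [j for j, mass in enumerate(pi) if mass <= 1e-6]
```

Here state 0 ends up with essentially zero mass and is flagged as transient, while states 1 and 2 share the stationary mass equally.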

Parameters
  • adata (anndata.AnnData) – Annotated data object.

  • estimator (PropertyMeta) – Estimator class to use to compute the initial states.

  • mode (str) –

    How to compute transition probabilities. Valid options are:

    • 'deterministic' - deterministic computation that doesn't propagate uncertainty.

    • 'monte_carlo' - Monte Carlo average of randomly sampled velocity vectors.

    • 'stochastic' - second-order approximation, only available when jax is installed.

    • 'sampling' - sample a single transition matrix from the velocity distribution.

  • backward_mode (str) –

    How to compute the backward transitions. Valid options are:

    • 'transpose' - compute transitions from neighboring cells j to cell i.

    • 'negate' - negate the velocity vector.

  • n_states (Optional[int]) – If you know how many initial states you are expecting, you can provide this number. Otherwise, an eigengap heuristic is used.

  • cluster_key (Optional[str]) – Key from adata.obs where cluster annotations are stored. These are used to give names to the initial states.

  • key (Optional[str]) – Key in adata.obsp where the transition matrix is saved. If not found, compute a new one using cellrank.tl.transition_matrix().

  • weight_connectivities – Weight given to a transition matrix computed on the basis of the KNN connectivities. Must be in [0, 1]. This can help in situations where we have noisy velocities and want to give some weight to transcriptomic similarity.

  • show_plots (bool) – Whether to show plots of the spectrum and eigenvectors in the embedding.

  • n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.

  • copy (bool) – Whether to return a copy of adata instead of updating the existing object in place.

  • return_estimator (bool) – Whether to return the estimator. Only available when copy=False.

  • fit_kwargs (Mapping) – Keyword arguments for cellrank.tl.BaseEstimator.fit(), such as n_cells.

  • **kwargs – Keyword arguments for cellrank.tl.transition_matrix(), such as weight_connectivities or softmax_scale.
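When n_states is left as None, the number of states is chosen by an eigengap heuristic. The sketch below shows the general idea under stated assumptions: eigenvalues are ranked by magnitude and the count is placed just before the largest drop between consecutive values. CellRank's exact rule may differ, and the function name `eigengap_n_states` and the example spectrum are hypothetical.

```python
def eigengap_n_states(eigenvalues):
    """Pick the number of states from the largest gap in the spectrum.

    Eigenvalues are ranked by magnitude in descending order; the
    returned count is the number of values before the largest gap.
    """
    magnitudes = sorted((abs(v) for v in eigenvalues), reverse=True)
    if len(magnitudes) < 2:
        raise ValueError("need at least two eigenvalues to compute a gap")
    gaps = [
        magnitudes[i] - magnitudes[i + 1]
        for i in range(len(magnitudes) - 1)
    ]
    # Index of the largest gap, converted to a count of eigenvalues.
    return gaps.index(max(gaps)) + 1


# Made-up spectrum: three eigenvalues close to 1, then a clear drop.
spectrum = [1.0, 0.98, 0.95, 0.60, 0.55, 0.50]
n_states = eigengap_n_states(spectrum)
```

For this spectrum the largest gap lies between 0.95 and 0.60, so three states are suggested. If the expected number of initial states is known in advance, passing n_states directly avoids relying on the heuristic.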

Returns

Depending on copy and return_estimator, either updates the existing adata object, returns a copy of it, or returns the estimator.

Marked cells are added to adata.obs['initial_states'].

Return type

anndata.AnnData, cellrank.tl.estimators.BaseEstimator or None
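A typical call might look like the sketch below. It uses only parameters and keys documented above. The wrapper name `find_initial_states` is hypothetical, and cellrank is imported inside the function so that defining it does not require the package to be installed. It assumes adata already contains the velocity information needed to compute a transition matrix.

```python
def find_initial_states(adata, cluster_key=None, n_states=None):
    """Run cellrank.tl.initial_states in place and return the annotation.

    adata is expected to be an anndata.AnnData object; cluster_key and
    n_states are forwarded unchanged.
    """
    import cellrank as cr  # imported lazily; requires cellrank

    # With copy=False (the default) adata is updated in place.
    cr.tl.initial_states(
        adata,
        cluster_key=cluster_key,
        n_states=n_states,
        copy=False,
    )
    # Marked cells are stored under this key, as documented above.
    return adata.obs["initial_states"]
```

Passing return_estimator=True instead would return the fitted estimator, which is only available when copy=False.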