# About CellRank

CellRank is a unified solution for the probabilistic description of cellular dynamics, encompassing various input data modalities and analysis scenarios through one consistent application programming interface (API). If you find CellRank useful for your research, please consider citing it; see citing CellRank.

## Design principles

Our framework is based on three key principles: Robustness, Modularity and Sparsity.

### Robustness

Fate restriction is a gradual, noisy process requiring probabilistic treatment. Therefore, we use Markov chains to
describe stochastic fate transitions, where each cell represents one state in the Markov chain. Markov chains describe
memoryless transitions between system states through a probabilistic framework. Transition probabilities are summarized
in a transition matrix \(T\), where \(T_{ij}\) is the probability of transitioning from state \(i\)
to state \(j\) in one step. Markov chains are established tools in
single-cell genomics and form the basis of many successful pseudotime approaches [Haghverdi *et al.*, 2016, Setty *et al.*, 2019].
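As a minimal sketch of what a transition matrix \(T\) encodes, the following NumPy snippet builds a hypothetical three-state chain (the numbers are illustrative, not derived from any dataset) and propagates a starting distribution forward. It assumes the common row-vector convention, in which the distribution after one step is \(p_1 = p_0 T\).

```python
import numpy as np

# Hypothetical 3-state chain; each state stands in for one cell.
# T[i, j] is the probability of moving from state i to state j in one step.
T = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.0, 1.0],  # state 2 never leaves itself (absorbing)
])

# A valid transition matrix has entries in [0, 1] and rows that sum to one.
assert np.all((T >= 0) & (T <= 1))
assert np.allclose(T.sum(axis=1), 1.0)

# Start with all probability mass on state 0.
p0 = np.array([1.0, 0.0, 0.0])

# Memoryless propagation: the next distribution depends only on the current one.
p1 = p0 @ T  # [0.8, 0.2, 0.0]
p2 = p1 @ T  # [0.66, 0.28, 0.06]

# Ten steps at once via the matrix power T^10.
T_10 = np.linalg.matrix_power(T, 10)
p10 = p0 @ T_10
```

Because every row of \(T\) sums to one, each propagated vector remains a probability distribution.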

By using Markov chains, we assume that cellular state transitions occur gradually and without memory. The former
assumption implies that cells change their molecular state in small steps with many intermediate states which are
captured in the data. This is a reasonable assumption for most biological systems. The latter assumption implies that a
state transition depends only on the current molecular state and not on the history of states. This assumption is valid
as CellRank describes average cellular dynamics rather than the dynamics of any individual cell. Both assumptions underlie
many successful earlier trajectory inference approaches [Haghverdi *et al.*, 2016, Setty *et al.*, 2019, Wolf *et al.*, 2019].

### Modularity

A typical CellRank workflow consists of two steps: **(i)** estimating cell-cell transition probabilities to set up a
Markov transition matrix \(T\), and **(ii)** analyzing it using various tools to derive biological insights.
Decoupling these two steps yields a powerful and flexible modeling framework as many analysis steps are independent
of the construction of the transition matrix. For example, whether we use RNA velocity or a pseudotime to derive
directed transition probabilities does not change how initial and terminal states are inferred or fate probabilities
estimated. The general structure of the framework, corresponding to steps **(i)** and **(ii)**, is given by:

- Kernels take multi-view single-cell input data and estimate a matrix of cell-cell transition probabilities \(T\). Row \(i\) in matrix \(T\) contains the transition probabilities from cell \(i\) towards putative descendants. Therefore, all entries in the matrix are between 0 and 1, and rows sum to one.
- Estimators take a cell-cell transition matrix \(T\) computed using any kernel and apply concepts from the theory of Markov chains to identify initial, terminal, and intermediate macrostates and compute fate probabilities.
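The decoupling of steps **(i)** and **(ii)** can be illustrated with a toy sketch in plain NumPy. The functions `toy_kernel` and `toy_estimator` are hypothetical stand-ins and are not part of the CellRank API. The kernel keeps only edges that point forward in a made-up pseudotime. The estimator uses absorption probabilities, a standard Markov-chain quantity, as a simple proxy for fate probabilities. It does not implement GPCCA.

```python
import numpy as np


def toy_kernel(similarity, pseudotime):
    """Step (i), hypothetical kernel: keep only edges pointing forward in pseudotime."""
    # forward[i, j] is True when cell j lies later in pseudotime than cell i.
    forward = pseudotime[None, :] > pseudotime[:, None]
    W = np.where(forward, similarity, 0.0)
    # Cells without any forward edge get a self-loop so every row can be normalized.
    idx = np.flatnonzero(W.sum(axis=1) == 0)
    W[idx, idx] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def toy_estimator(T):
    """Step (ii), hypothetical estimator: absorption probabilities towards absorbing states."""
    terminal = np.flatnonzero(np.isclose(np.diag(T), 1.0))
    transient = np.setdiff1d(np.arange(T.shape[0]), terminal)
    Q = T[np.ix_(transient, transient)]  # transient -> transient
    R = T[np.ix_(transient, terminal)]   # transient -> terminal
    fate = np.zeros((T.shape[0], terminal.size))
    # Absorption probabilities solve (I - Q) X = R.
    fate[transient] = np.linalg.solve(np.eye(transient.size) - Q, R)
    fate[terminal] = np.eye(terminal.size)
    return terminal, fate


# Made-up data: cell 0 is the root, cell 1 a branch point,
# cells 3 and 4 are the ends of two branches.
similarity = np.zeros((5, 5))
for i, j, w in [(0, 1, 1.0), (1, 2, 3.0), (1, 4, 1.0), (2, 3, 1.0)]:
    similarity[i, j] = similarity[j, i] = w
pseudotime = np.array([0.0, 1.0, 2.0, 3.0, 2.0])

T = toy_kernel(similarity, pseudotime)  # step (i)
terminal, fate = toy_estimator(T)       # step (ii)
```

In this sketch `toy_estimator` only ever sees \(T\). Replacing `toy_kernel` with any other function that returns a row-stochastic matrix would leave step (ii) unchanged.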

Our main (and recommended!) estimator is based on *Generalized Perron Cluster Cluster Analysis* (GPCCA)
[Reuter *et al.*, 2019, Reuter *et al.*, 2018], a method originally developed to study molecular dynamics. CellRank uses a
robust implementation of GPCCA through the pyGPCCA package. Please don’t forget to cite both CellRank and GPCCA when
using the `cellrank.estimators.GPCCA` estimator; see citing CellRank.

We use fate probabilities to visualize trajectory-specific gene expression trends, infer putative
driver genes, arrange cells in a circular embedding [Velten *et al.*, 2017], visualize cascades of gene
activation along a trajectory, and cluster expression trends.

### Sparsity

All CellRank kernels yield sparse transition matrices \(T\). Further, the `cellrank.estimators.GPCCA`
estimator exploits sparsity in all major computations. Sparsity allows CellRank to scale to large datasets.
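To give a rough sense of why sparsity matters for scaling, the sketch below builds a row-stochastic matrix in which each cell has only a handful of nonzero transitions and stores it with `scipy.sparse`. The sizes and the random neighbor assignment are assumptions chosen purely for illustration. This is not how CellRank kernels construct \(T\).

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n_cells, k = 2_000, 10  # hypothetical dataset size and neighbors per cell

# Give each cell k distinct random "neighbors" with positive weights
# (a stand-in for a k-nearest-neighbor graph).
rows = np.repeat(np.arange(n_cells), k)
cols = np.concatenate(
    [rng.choice(n_cells, size=k, replace=False) for _ in range(n_cells)]
)
weights = rng.random(n_cells * k) + 1e-3

W = sparse.csr_matrix((weights, (rows, cols)), shape=(n_cells, n_cells))

# Row-normalize so that each row sums to one.
row_sums = np.asarray(W.sum(axis=1)).ravel()
T = (sparse.diags(1.0 / row_sums) @ W).tocsr()

# Only n_cells * k entries are stored instead of n_cells ** 2.
density = T.nnz / n_cells**2

# One propagation step costs O(nnz) rather than O(n_cells ** 2).
p0 = np.full(n_cells, 1.0 / n_cells)
p1 = T.T @ p0
```

With these assumed sizes, the sparse matrix stores 20,000 values, whereas a dense array of the same shape would hold 4,000,000.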

## Why is it called “CellRank”?

CellRank **does not** rank cells. We gave the package this name because, just like Google’s original PageRank
algorithm, it works with Markov chains to aggregate relationships between individual objects (cells vs. websites)
to learn about more global properties of the underlying dynamics (initial & terminal states and fate probabilities vs.
website relevance).