cellrank.external.kernels.WOTKernel#

class cellrank.external.kernels.WOTKernel(adata, time_key, backward=False, **kwargs)[source]#

Waddington optimal transport kernel from [Schiebinger et al., 2019].

This class requires the wot package, which can be installed as pip install git+https://github.com/broadinstitute/wot.

Parameters:
  • adata (anndata.AnnData) – Annotated data object.

  • backward (bool) – Direction of the process.

  • time_key (str) – Key in anndata.AnnData.obs where experimental time is stored. The experimental time can be of either of a numeric or an ordered categorical type.

  • kwargs (Any) – Keyword arguments for the parent class.

Examples

Workflow:

# import packages
import scanpy as sc
import cellrank as cr
from cellrank.external.kernels import WOTKernel

# load the data
adata = cr.datasets.lung()

# filter, normalize and annotate highly variable genes
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)

# estimate proliferation and apoptosis from gene sets (see. e.g. WOT tutorial for example lists)
proliferation_genes = ...
apoptosis_genes = ...
sc.tl.score_genes(adata, gene_list=proliferation_genes, score_name='proliferation')
sc.tl.score_genes(adata, gene_list=apoptosis_genes, score_name='apoptosis')

# initialize kernel, estimate initial growth rate based on scores from above
ot = WOTKernel(adata, time_key='day')
ot.compute_initial_growth_rates(proliferation_key='proliferation',
                                apoptosis_key='apoptosis',
                                key_added='initial_growth_rates')

# compute transport maps, aggregate into one single transition matrix
ot.compute_transition_matrix(growth_rate_key='initial_growth_rates', growth_iters=3)

Attributes table#

adata

Annotated data object.

backward

None.

growth_rates

Estimated cell growth rates for each growth rate iteration.

kernels

Underlying base kernels.

params

Parameters which are used to compute the transition matrix.

shape

(n_cells, n_cells).

transition_matrix

Row-normalized transition matrix.

Methods table#

compute_initial_growth_rates([...])

Estimate initial growth rates using a birth-death process as described in [Schiebinger et al., 2019].

compute_transition_matrix([cost_matrices, ...])

Compute transition matrix using Waddington OT [Schiebinger et al., 2019].

copy([deep])

Not implemented.

from_adata(adata, key[, copy])

Read kernel object saved using write_to_adata().

plot_projection([basis, key_added, ...])

Plot transition_matrix as a stream or a grid plot.

plot_random_walks([n_sims, max_iter, seed, ...])

Plot random walks in an embedding.

plot_single_flow(cluster, cluster_key, time_key)

Visualize outgoing flow from a cluster of cells [Mittnenzweig et al., 2021].

read(fname[, adata, copy])

De-serialize self from a file.

write(fname[, write_adata, ext])

Serialize self to a file.

write_to_adata([key, copy])

Write the transition matrix and parameters used for computation to the underlying adata object.

Attributes#

adata#

WOTKernel.adata#

Annotated data object.

Return type:

AnnData

backward#

WOTKernel.backward#

None.

Return type:

Optional[bool]

growth_rates#

WOTKernel.growth_rates#

Estimated cell growth rates for each growth rate iteration.

Return type:

Optional[DataFrame]

kernels#

WOTKernel.kernels#

Underlying base kernels.

Return type:

Tuple[KernelExpression, ...]

params#

WOTKernel.params#

Parameters which are used to compute the transition matrix.

Return type:

Dict[str, Any]

shape#

WOTKernel.shape#

(n_cells, n_cells).

Return type:

Tuple[int, int]

transition_matrix#

WOTKernel.transition_matrix#

Row-normalized transition matrix.

Return type:

Union[ndarray, csr_matrix]

Methods#

compute_initial_growth_rates#

WOTKernel.compute_initial_growth_rates(proliferation_key=None, apoptosis_key=None, organism=None, beta_min=0.3, beta_max=1.7, delta_min=0.3, delta_max=1.7, key_added=None, **kwargs)[source]#

Estimate initial growth rates using a birth-death process as described in [Schiebinger et al., 2019].

The doubling time is defined as \(\frac{\ln 2}{\beta - \delta}\) (similarly defined for half-time). The logistic function is used to transform the birth/death rate scores and to smoothly interpolate between specified minimum and maximum birth/death rates.

Parameters:
  • proliferation_key (Optional[str]) – Key in anndata.AnnData.obs where the birth rate score is saved.

  • apoptosis_key (Optional[str]) – Key in anndata.AnnData.obs where the death rate score is saved.

  • organism (Optional[Literal[‘human’, ‘mouse’]]) – Organism for which to calculate the birth/death scores, if they cannot be found in adata. In this case, scanpy.tl.score_genes() is used to calculate the scores based on an organism-dependent set of marker genes for proliferation and apoptosis.

  • beta_min (float) – Minimum birth rate.

  • beta_max (float) – Maximum birth rate.

  • delta_min (float) – Minimum death rate.

  • delta_max (float) – Maximum death rate.

  • key_added (Optional[str]) – Key in adata .obs where to add the estimated growth rates. If None, just return them.

  • kwargs (Any) – Keyword arguments for scanpy.tl.score_genes(). Only used when proliferation_key or apoptosis_key cannot be found in adata.AnnData.obs.

Return type:

Optional[Series]

Returns:

pandas.Series The estimated initial growth rates if key_added = None, otherwise None.

Notes

If you don’t have access to proliferation/apoptosis gene sets, you can use the ones defined in cellrank for a specific organism. Alternatively, you can also use WOT without an estimate of initial growth rates. In that case, make sure to use several iterations in cellrank.external.kernels.WOTKernel.compute_transition_matrix() by increasing the growth_iters parameter. A value around 3 works well in most cases.

The markers used here were taken from the following sources:

For more information about WOT, see the official tutorial.

compute_transition_matrix#

WOTKernel.compute_transition_matrix(cost_matrices=None, lambda1=1, lambda2=50, epsilon=0.05, growth_iters=1, solver='duality_gap', growth_rate_key=None, use_highly_variable=True, threshold='auto', self_transitions=SelfTransitions.CONNECTIVITIES, conn_weight=None, conn_kwargs=mappingproxy({}), **kwargs)[source]#

Compute transition matrix using Waddington OT [Schiebinger et al., 2019].

Computes transport maps linking together pairs of time points for time-series single cell data using unbalanced optimal transport, taking into account cell birth and death rates. From the sequence of transition maps linking pairs of sequential time points, we construct one large transition matrix which contains the normalized transport maps as blocks on the 1st upper diagonal.

Parameters:
  • cost_matrices (Union[str, Mapping[Tuple[Union[float, int], Union[float, int]], ndarray], None]) – Cost matrices for each consecutive pair of time points. If a str, it specifies a key in anndata.AnnData.layers or anndata.AnnData.obsm. containing cell features that are used to compute cost matrices. If None, use WOT’s default, i.e. compute distances in PCA space derived from anndata.AnnData.X for each time point pair separately.

  • lambda1 (float) – Regularization parameter for the marginal constraint on \(p\), the transport map row sums. Smaller value is useful when precise information about the growth rate is not present.

  • lambda2 (float) – Regularization parameter for the marginal constraint on \(q\), the transport map column sums.

  • epsilon (float) – Entropy regularization parameter. Larger value gives more entropic descendant distributions.

  • growth_iters (int) – Number of iterations for growth rate estimates. If growth rates are not known, consider using more iterations.

  • solver (Literal[‘fixed_iters’, ‘duality_gap’]) – Which solver to use.

  • growth_rate_key (Optional[str]) – Key in anndata.AnnData.obs where initial cell growth rates are stored. See compute_initial_growth_rates() on how to estimate them.

  • use_highly_variable (Union[str, bool, None]) – Key in anndata.AnnData.var where highly variable genes are stored. If True, use ‘highly_variable’. If None, use all genes.

  • threshold (Union[int, float, Literal[‘auto’, ‘auto_local’], None]) –

    How to remove small non-zero values from the transition matrix. Valid options are:

    • ’auto’ - find the maximum threshold value which will not remove every non-zero value from any row.

    • ’auto_local’ - same as above, but done for each transport separately.

    • float - value in [0, 100] corresponding to a percentage of non-zeros to remove in each transport map.

    Rows where all values are removed will have uniform distribution and a warning will be issued.

  • copy – Whether to return a copy of the tmaps or modify in-place.

  • self_transitions (Union[Literal[‘uniform’, ‘diagonal’, ‘connectivities’, ‘all’], Sequence[Union[float, int]]]) –

    How to define transitions within the diagonal blocks that correspond to transitions within the same time point. Valid options are:

    • ’uniform’ - row-normalized matrix of 1s for transitions. Only applied to the last time point.

    • ’diagonal’ - diagonal matrix with 1s on the diagonal. Only applied to the last time point.

    • ’connectivities’ - use transition matrix from cellrank.kernels.ConnectivityKernel. Only applied to the last time point.

    • typing.Sequence - sequence of source time points defining which blocks should be weighted by connectivities. Always applied to the last time point.

    • ’all’ - same as above, but for all time points.

  • conn_weight (Optional[float]) – Weight of connectivities self transitions. Only used when self_transitions = 'all' or a sequence of source time points is passed.

  • conn_kwargs (Mapping[str, Any]) – Keyword arguments for scanpy.pp.neighbors() when using self_transitions use cellrank.kernels.ConnectivityKernel. Can contain ‘density_normalize’ for cellrank.kernels.ConnectivityKernel.compute_transition_matrix().

  • kwargs (Any) – Additional keyword arguments for optimal transport configuration.

Return type:

WOTKernel

Returns:

Self and updates the following attributes:

Also modifies anndata.AnnData.obs with the following key:

  • ’estimated_growth_rates’ - the final estimated growth rates.

Notes

For more information about WOT, see the official tutorial.

copy#

WOTKernel.copy(deep=False)#

Not implemented.

Return type:

ErroredKernel

from_adata#

classmethod WOTKernel.from_adata(adata, key, copy=False)#

Read kernel object saved using write_to_adata().

Parameters:
Return type:

Kernel

Returns:

The kernel with explicitly initialized properties:

plot_projection#

WOTKernel.plot_projection(basis='umap', key_added=None, recompute=False, stream=True, connectivities=None, **kwargs)#

Plot transition_matrix as a stream or a grid plot.

Parameters:
Return type:

None

Returns:

Nothing, just plots and modifies anndata.AnnData.obsm with a key based on key_added.

plot_random_walks#

WOTKernel.plot_random_walks(n_sims=100, max_iter=0.25, seed=None, successive_hits=0, start_ixs=None, stop_ixs=None, basis='umap', cmap='gnuplot', linewidth=1.0, linealpha=0.3, ixs_legend_loc=None, n_jobs=None, backend='loky', show_progress_bar=True, figsize=None, dpi=None, save=None, **kwargs)#

Plot random walks in an embedding.

This method simulates random walks on the Markov chain defined though the corresponding transition matrix. The method is intended to give qualitative rather than quantitative insights into the transition matrix. Random walks are simulated by iteratively choosing the next cell based on the current cell’s transition probabilities.

Parameters:
  • n_sims (int) – Number of random walks to simulate.

  • max_iter (Union[int, float]) – Maximum number of steps of a random walk. If a float, it can be specified as a fraction of the number of cells.

  • seed (Optional[int]) – Random seed.

  • successive_hits (int) – Number of successive hits in the stop_ixs required to stop prematurely.

  • start_ixs (Union[Sequence[str], Mapping[str, Union[str, Sequence[str], Tuple[float, float]]], None]) –

    Cells from which to sample the starting points. If None, use all cells. Can be specified as:

    For example {'dpt_pseudotime': [0, 0.1]} means that starting points for random walks will be sampled uniformly from cells whose pseudotime is in [0, 0.1].

  • stop_ixs (Union[Sequence[str], Mapping[str, Union[str, Sequence[str], Tuple[float, float]]], None]) –

    Cells which when hit, the random walk is terminated. If None, terminate after max_iters. Can be specified as:

    For example {'clusters': ['Alpha', 'Beta']} and successive_hits = 3 means that the random walk will stop prematurely after cells in the above specified clusters have been visited successively 3 times in a row.

  • basis (str) – Basis in anndata.AnnData.obsm to use as an embedding.

  • cmap (Union[str, LinearSegmentedColormap]) – Colormap for the random walk lines.

  • linewidth (float) – Width of the random walk lines.

  • linealpha (float) – Alpha value of the random walk lines.

  • ixs_legend_loc (Optional[str]) – Legend location for the start/top indices.

  • show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.

  • n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.

  • backend (str) – Which backend to use for parallelization. See joblib.Parallel for valid options.

  • figsize (Optional[Tuple[float, float]]) – Size of the figure.

  • dpi (Optional[int]) – Dots per inch.

  • save (Union[str, Path, None]) – Filename where to save the plot.

  • kwargs (Any) – Keyword arguments for scvelo.pl.scatter().

Return type:

None

Returns:

Nothing, just plots the figure. Optionally saves it based on save. For each random walk, the first/last cell is marked by the start/end colors of cmap.

plot_single_flow#

WOTKernel.plot_single_flow(cluster, cluster_key, time_key, clusters=None, time_points=None, min_flow=0, remove_empty_clusters=True, ascending=False, legend_loc='upper right out', alpha=0.8, xticks_step_size=1, figsize=None, dpi=None, save=None, show=True)#

Visualize outgoing flow from a cluster of cells [Mittnenzweig et al., 2021].

Parameters:
  • cluster (str) – Cluster for which to visualize outgoing flow.

  • cluster_key (str) – Key in anndata.AnnData.obs where clustering is stored.

  • time_key (str) – Key in anndata.AnnData.obs where experimental time is stored.

  • clusters (Optional[Sequence[Any]]) – Visualize flow only for these clusters. If None, use all clusters.

  • time_points (Optional[Sequence[Union[float, int]]]) – Visualize flow only for these time points. If None, use all time points.

  • min_flow (float) – Only show flow edges with flow greater than this value. Flow values are always in [0, 1].

  • remove_empty_clusters (bool) – Whether to remove clusters with no incoming flow edges.

  • ascending (Optional[bool]) – Whether to sort the cluster by ascending or descending incoming flow. If None, use the order as in defined by clusters.

  • alpha (Optional[float]) – Alpha value for cell proportions.

  • xticks_step_size (Optional[int]) – Show only every other n-th tick on the x-axis. If None, don’t show any ticks.

  • legend_loc (Optional[str]) – Position of the legend. If None, do not show the legend.

  • figsize (Optional[Tuple[float, float]]) – Size of the figure.

  • dpi (Optional[int]) – Dots per inch.

  • save (Union[str, Path, None]) – Filename where to save the plot.

  • show (bool) – If False, return matplotlib.pyplot.Axes.

Return type:

Optional[Axes]

Returns:

The axes object, if show = False. Nothing, just plots the figure. Optionally saves it based on save.

Notes

This function is a Python re-implementation of the following original R function with some minor stylistic differences. This function will not recreate the results from [Mittnenzweig et al., 2021], because there, the Metacell model [Baran et al., 2019] was used to compute the flow, whereas here the transition matrix is used.

read#

static WOTKernel.read(fname, adata=None, copy=False)#

De-serialize self from a file.

Parameters:
  • fname (Union[str, Path]) – Filename from which to read the object.

  • adata (Optional[AnnData]) – anndata.AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.

  • copy (bool) – Whether to copy adata before assigning it or not. If adata is a view, it is always copied.

Return type:

IOMixin

Returns:

The de-serialized object.

write#

WOTKernel.write(fname, write_adata=True, ext='pickle')#

Serialize self to a file.

Parameters:
  • fname (Union[str, Path]) – Filename where to save the object.

  • write_adata (bool) – Whether to save adata object or not, if present.

  • ext (Optional[str]) – Filename extension to use. If None, don’t append any extension.

Return type:

None

Returns:

Nothing, just writes itself to a file using pickle.

write_to_adata#

WOTKernel.write_to_adata(key=None, copy=False)#

Write the transition matrix and parameters used for computation to the underlying adata object.

Parameters:

key (Optional[str]) – Key used when writing transition matrix to adata. If None, the key will be determined automatically.

Return type:

None

Returns:

Updates the adata with the following fields:

  • .obsp['{key}'] - the transition matrix.

  • .uns['{key}_params'] - parameters used for the calculation.