cellrank.kernels.CytoTRACEKernel
- class cellrank.kernels.CytoTRACEKernel(adata, backward=False, **kwargs)
Kernel which computes directed transition probabilities using the CytoTRACE score [Gulati et al., 2020].
See also
See CellRank Meets CytoTRACE on how to compute the transition_matrix based on the CytoTRACE score.
The k-NN graph contains information about the (undirected) connectivities among cells, reflecting their similarity. CytoTRACE can be used to estimate cellular plasticity and in turn, a pseudotemporal ordering of cells from more plastic to less plastic states. It relies on the assumption that differentiated cells express, on average, fewer genes than naive cells.
This kernel internally uses the PseudotimeKernel to direct the k-NN graph on the basis of the CytoTRACE pseudotime. Optionally, we apply a density correction as described in [Coifman et al., 2005], where we use the implementation of [Haghverdi et al., 2016].
- Parameters:
adata (AnnData) – Annotated data object.
backward (bool) – Direction of the process.
kwargs (Any) – Keyword arguments for the PseudotimeKernel.
Examples
import scanpy as sc
import scvelo as scv
import cellrank as cr

adata = cr.datasets.pancreas()

# Basic preprocessing with scanpy.
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)

# CytoTRACE by default uses imputed data - a simple way to compute
# k-NN imputed data is to use scVelo's moments function.
# However, note that this function expects `spliced` counts because
# it's designed for RNA velocity, so we're using a simple hack here:
if 'spliced' not in adata.layers or 'unspliced' not in adata.layers:
    adata.layers['spliced'] = adata.X
    adata.layers['unspliced'] = adata.X
scv.pp.moments(adata)

# Both methods return the kernel itself, so the calls can be chained.
ctk = cr.kernels.CytoTRACEKernel(adata)
ctk = ctk.compute_cytotrace().compute_transition_matrix()
Attributes table
adata | Annotated data object.
backward | Direction of the process.
connectivities | Underlying connectivity matrix.
kernels | Underlying base kernels.
params | Parameters which are used to compute the transition matrix.
pseudotime | Pseudotemporal ordering of cells.
shape | (n_cells, n_cells).
transition_matrix | Row-normalized transition matrix.
Methods table
cbc | Compute cross-boundary correctness score between source and target cluster.
compute_cytotrace | Re-implementation of the CytoTRACE algorithm [Gulati et al., 2020] to estimate cellular plasticity.
compute_transition_matrix | Compute transition matrix based on k-NN graph and pseudotemporal ordering.
copy | Return a copy of self.
from_adata | Read the kernel saved using write_to_adata().
plot_projection | Plot transition_matrix as a stream or a grid plot.
plot_random_walks | Plot random walks in an embedding.
plot_single_flow | Visualize outgoing flow from a cluster of cells [Mittnenzweig et al., 2021].
read | De-serialize self from a file.
write | Serialize self to a file.
write_to_adata | Write the transition matrix and parameters used for computation to the underlying adata object.
Attributes
adata
- CytoTRACEKernel.adata
Annotated data object.
backward
- CytoTRACEKernel.backward
Direction of the process.
connectivities
- CytoTRACEKernel.connectivities
Underlying connectivity matrix.
kernels
- CytoTRACEKernel.kernels
Underlying base kernels.
params
- CytoTRACEKernel.params
Parameters which are used to compute the transition matrix.
pseudotime
- CytoTRACEKernel.pseudotime
Pseudotemporal ordering of cells.
shape
- CytoTRACEKernel.shape
(n_cells, n_cells).
transition_matrix
- CytoTRACEKernel.transition_matrix
Row-normalized transition matrix.
Methods
cbc
- CytoTRACEKernel.cbc(source, target, cluster_key, rep, graph_key='distances')
Compute cross-boundary correctness score between source and target cluster.
- Parameters:
- Return type:
- Returns: Cross-boundary correctness score for each observation.
compute_cytotrace
- CytoTRACEKernel.compute_cytotrace(layer='Ms', aggregation='mean', use_raw=False, n_genes=200)
Re-implementation of the CytoTRACE algorithm [Gulati et al., 2020] to estimate cellular plasticity.
Computes the number of genes expressed per cell and ranks genes according to their correlation with this measure. Next, it selects the top-correlating genes and aggregates their (imputed) expression to obtain the CytoTRACE score. A high score stands for high differentiation potential (naive, plastic cells) and a low score stands for low differentiation potential (mature, differentiated cells).
- Parameters:
layer (Optional[str]) – Key in layers or 'X' for X from where to get the expression.
aggregation (Literal['mean', 'median', 'hmean', 'gmean']) – How to aggregate expression of the top-correlating genes. Valid options are:
'mean' - arithmetic mean.
'median' - median.
'hmean' - harmonic mean.
'gmean' - geometric mean.
use_raw (bool) – Whether to use raw to compute the number of genes expressed per cell (#genes/cell) and the correlation of gene expression across cells with #genes/cell.
n_genes (int) – Number of top positively correlated genes to compute the CytoTRACE score.
- Return type:
- Returns: Nothing, just modifies obs with the following keys:
'ct_score' - the normalized CytoTRACE score.
'ct_pseudotime' - associated pseudotime, essentially 1 - CytoTRACE score.
'ct_num_exp_genes' - the number of genes expressed per cell, basis of the CytoTRACE score.
It also modifies var with the following keys:
'ct_gene_corr' - the correlation as specified above.
'ct_correlates' - indication of the genes used to compute the CytoTRACE score, i.e. the ones that correlated positively with 'ct_num_exp_genes'.
Notes
This will not exactly reproduce the results of the original CytoTRACE algorithm [Gulati et al., 2020] because we allow for any normalization and imputation techniques whereas CytoTRACE has built-in specific methods for that.
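The steps described for compute_cytotrace can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the function name cytotrace_sketch, the helper pearson, the min-max normalization of the score and the use of a single expression matrix for both counting and aggregation are all assumptions made for this example, and only the arithmetic mean aggregation is shown.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def cytotrace_sketch(expr, n_genes=200):
    """Hypothetical helper: expr is a list of cells, each a list of gene expression values."""
    n_vars = len(expr[0])
    # 1. Number of genes expressed per cell.
    num_exp_genes = [sum(1 for v in row if v > 0) for row in expr]
    # 2. Correlation of every gene with that per-cell count.
    gene_corr = [pearson([row[g] for row in expr], num_exp_genes) for g in range(n_vars)]
    # 3. Keep the top positively correlated genes.
    positive = [g for g in range(n_vars) if gene_corr[g] > 0]
    top_genes = sorted(positive, key=lambda g: gene_corr[g], reverse=True)[:n_genes]
    if not top_genes:
        raise ValueError("No gene correlates positively with the number of expressed genes.")
    # 4. Aggregate their expression per cell (arithmetic mean only in this sketch).
    raw = [statistics.fmean([row[g] for g in top_genes]) for row in expr]
    # 5. Normalize to [0, 1] (min-max scaling is an assumption of this sketch).
    lo, hi = min(raw), max(raw)
    score = [(r - lo) / (hi - lo) if hi > lo else 0.0 for r in raw]
    return {
        "ct_num_exp_genes": num_exp_genes,
        "ct_gene_corr": gene_corr,
        "top_genes": top_genes,
        "ct_score": score,
        "ct_pseudotime": [1.0 - s for s in score],
    }


# Toy data: 3 cells x 4 genes; the last gene is never expressed.
expr = [
    [4, 3, 2, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
]
result = cytotrace_sketch(expr, n_genes=2)
```

In this toy example the first cell expresses the most genes, so it receives the highest score and the earliest pseudotime.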
compute_transition_matrix
- CytoTRACEKernel.compute_transition_matrix(threshold_scheme='hard', frac_to_keep=0.3, b=10.0, nu=0.5, check_irreducibility=False, n_jobs=None, backend='loky', show_progress_bar=True, **kwargs)
Compute transition matrix based on k-NN graph and pseudotemporal ordering.
Depending on the choice of the threshold_scheme, it is based on ideas by either Palantir [Setty et al., 2019] or VIA [Stassen et al., 2021].
- Parameters:
threshold_scheme (Union[Literal['soft', 'hard'], Callable[[float, ndarray, ndarray], ndarray]]) – Which method to use when biasing the graph. Valid options are:
'hard' - based on Palantir [Setty et al., 2019] which removes some edges that point against the direction of increasing pseudotime. To avoid disconnecting the graph, it does not remove all edges that point against the direction of increasing pseudotime, but keeps the ones that point to cells inside a close radius. This radius is chosen according to the local cell density.
'soft' - based on VIA [Stassen et al., 2021] which down-weights edges that point against the direction of increasing pseudotime. Essentially, the further “behind” a query cell is in pseudotime with respect to the current reference cell, the more its graph connectivity is penalized.
callable - any function conforming to the signature of cellrank.kernels.utils.ThresholdSchemeABC.__call__().
frac_to_keep (float) – Fraction of the closest neighbors (according to graph connectivities) to keep, no matter whether they lie in the pseudotemporal past or future. This is done to ensure that the graph remains connected. Only used when threshold_scheme = 'hard'. Must be in \([0, 1]\).
b (float) – The growth rate of the generalized logistic function. Only used when threshold_scheme = 'soft'.
nu (float) – Affects near which asymptote maximum growth occurs. Only used when threshold_scheme = 'soft'.
check_irreducibility (bool) – Optional check for irreducibility of the final transition matrix.
show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend (Literal['loky', 'multiprocessing', 'threading']) – Which backend to use for parallelization. See Parallel for valid options.
kwargs (Any) – Keyword arguments for threshold_scheme.
- Return type:
- Returns: Returns self and updates transition_matrix and params.
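To make the two threshold schemes more concrete, the sketch below biases the connectivities of a single cell's neighbors. It is a simplified illustration, not the library's code: the function names hard_threshold and soft_threshold, the argument names, the rounding of frac_to_keep and the exact form of the generalized logistic weight are assumptions. Only the argument order (cell pseudotime, neighbor pseudotimes, neighbor connectivities) follows the callable signature given above.

```python
import math


def hard_threshold(cell_pt, neigh_pt, neigh_conn, frac_to_keep=0.3):
    """Drop edges pointing to the pseudotemporal past, except for the closest neighbors."""
    n = len(neigh_conn)
    # How many of the closest neighbors are always kept (rounding up is an assumption).
    n_keep = min(n, math.ceil(frac_to_keep * n))
    # Rank neighbors by connectivity, strongest first.
    order = sorted(range(n), key=lambda i: neigh_conn[i], reverse=True)
    always_kept = set(order[:n_keep])
    return [
        conn if (i in always_kept or neigh_pt[i] >= cell_pt) else 0.0
        for i, conn in enumerate(neigh_conn)
    ]


def soft_threshold(cell_pt, neigh_pt, neigh_conn, b=10.0, nu=0.5):
    """Down-weight edges pointing to the past with a generalized logistic weight."""
    biased = []
    for pt, conn in zip(neigh_pt, neigh_conn):
        dt = cell_pt - pt  # positive when the neighbor lies in the past
        if dt > 0:
            # Assumed weight: shrinks towards 0 as the neighbor lies further behind.
            weight = 2.0 / (1.0 + math.exp(b * dt)) ** (1.0 / nu)
        else:
            weight = 1.0  # edges to the present or future are left untouched
        biased.append(conn * weight)
    return biased


# One reference cell at pseudotime 0.5 with four neighbors.
neigh_pt = [0.1, 0.4, 0.6, 0.9]
neigh_conn = [0.9, 0.2, 0.5, 0.4]
hard = hard_threshold(0.5, neigh_pt, neigh_conn, frac_to_keep=0.25)
soft = soft_threshold(0.5, neigh_pt, neigh_conn)
```

In the hard variant, the strongest neighbor survives even though it lies in the past, which mirrors the stated purpose of frac_to_keep. In the soft variant, no edge is removed, but the penalty grows with the pseudotime gap.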
copy
- CytoTRACEKernel.copy(*, deep=False)
Return a copy of self.
- Parameters:
deep (bool) – Whether to use deepcopy().
- Return type:
- Returns: Copy of self.
from_adata
- classmethod CytoTRACEKernel.from_adata(adata, key, copy=False)
Read the kernel saved using write_to_adata().
- Parameters:
adata (AnnData) – Annotated data object.
key (str) – Key in obsp where the transition matrix is stored. The parameters should be stored in adata.uns['{key}_params'].
copy (bool) – Whether to copy the transition matrix.
- Return type:
- Returns: The kernel with explicitly initialized properties:
transition_matrix - the transition matrix.
params - parameters used for computation.
plot_projection
- CytoTRACEKernel.plot_projection(basis='umap', key_added=None, recompute=False, stream=True, connectivities=None, **kwargs)
Plot transition_matrix as a stream or a grid plot.
- Parameters:
key_added (Optional[str]) – If not None, save the result to adata.obsm['{key_added}']. Otherwise, save the result to 'T_fwd_{basis}' or 'T_bwd_{basis}', depending on the direction.
recompute (bool) – Whether to recompute the projection if it already exists.
stream (bool) – If True, use velocity_embedding_stream(). Otherwise, use velocity_embedding_grid().
connectivities (Optional[spmatrix]) – Connectivity matrix to use for projection. If None, use ones from the underlying kernel, if possible.
kwargs (Any) – Keyword arguments for the above-mentioned plotting function.
- Return type:
- Returns: Nothing, just plots and modifies obsm with a key based on the key_added.
plot_random_walks
- CytoTRACEKernel.plot_random_walks(n_sims=100, max_iter=0.25, seed=None, successive_hits=0, start_ixs=None, stop_ixs=None, basis='umap', cmap='gnuplot', linewidth=1.0, linealpha=0.3, ixs_legend_loc=None, n_jobs=None, backend='loky', show_progress_bar=True, figsize=None, dpi=None, save=None, **kwargs)
Plot random walks in an embedding.
This method simulates random walks on the Markov chain defined through the corresponding transition matrix. The method is intended to give qualitative rather than quantitative insights into the transition matrix. Random walks are simulated by iteratively choosing the next cell based on the current cell’s transition probabilities.
- Parameters:
n_sims (int) – Number of random walks to simulate.
max_iter (Union[int, float]) – Maximum number of steps of a random walk. If a float, it can be specified as a fraction of the number of cells.
successive_hits (int) – Number of successive hits in the stop_ixs required to stop prematurely.
start_ixs (Union[Sequence[str], Mapping[str, Union[str, Sequence[str], tuple[float, float]]], None]) – Cells from which to sample the starting points. If None, use all cells. Can be specified as:
dict - dictionary with 1 key in obs with values corresponding to either 1 or more clusters (if the column is categorical) or a tuple specifying \([min, max]\) interval from which to select the indices.
For example {'dpt_pseudotime': [0, 0.1]} means that starting points for random walks will be sampled uniformly from cells whose pseudotime is in \([0, 0.1]\).
stop_ixs (Union[Sequence[str], Mapping[str, Union[str, Sequence[str], tuple[float, float]]], None]) – Cells which, when hit, terminate the random walk. If None, terminate after max_iter. Can be specified as:
dict - dictionary with 1 key in obs with values corresponding to either 1 or more clusters (if the column is categorical) or a tuple specifying \([min, max]\) interval from which to select the indices.
For example {'clusters': ['Alpha', 'Beta']} and successive_hits = 3 means that the random walk will stop prematurely after cells in the above specified clusters have been visited 3 times in a row.
cmap (Union[str, LinearSegmentedColormap]) – Colormap for the random walk lines.
linewidth (float) – Width of the random walk lines.
linealpha (float) – Alpha value of the random walk lines.
ixs_legend_loc (Optional[str]) – Legend location for the start/stop indices.
show_progress_bar (bool) – Whether to show a progress bar. Disabling it may slightly improve performance.
n_jobs (Optional[int]) – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.
backend (str) – Which backend to use for parallelization. See Parallel for valid options.
figsize (Optional[tuple[float, float]]) – Size of the figure.
save (Union[Path, str, None]) – Filename where to save the plot.
- Return type:
- Returns: Nothing, just plots the figure. Optionally saves it based on save. For each random walk, the first/last cell is marked by the start/end colors of cmap.
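The simulation behind plot_random_walks can be illustrated with a short, self-contained sketch. It is not the library's implementation: the function name simulate_random_walk, the dense list-of-lists transition matrix, the truncation used for a fractional max_iter and the treatment of successive_hits = 0 as "stop on the first hit" are assumptions made for this example.

```python
import random


def simulate_random_walk(T, start, max_iter=0.25, stop_ixs=(), successive_hits=0, seed=None):
    """Hypothetical helper: walk on the Markov chain given by the row-normalized matrix T."""
    rng = random.Random(seed)
    n_cells = len(T)
    # A float max_iter is read as a fraction of the number of cells.
    n_steps = int(max_iter * n_cells) if isinstance(max_iter, float) else max_iter
    stop = set(stop_ixs)
    path, hits = [start], 0
    for _ in range(n_steps):
        current = path[-1]
        # Choose the next cell according to the current cell's transition probabilities.
        nxt = rng.choices(range(n_cells), weights=T[current], k=1)[0]
        path.append(nxt)
        # Count consecutive visits to the stopping set.
        hits = hits + 1 if nxt in stop else 0
        if stop and hits >= max(successive_hits, 1):
            break
    return path


# A deterministic 3-state chain: 0 -> 1 -> 2, where state 2 is absorbing.
T = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
walk_three_hits = simulate_random_walk(T, start=0, max_iter=10, stop_ixs=[2], successive_hits=3)
walk_first_hit = simulate_random_walk(T, start=0, max_iter=10, stop_ixs=[2])
walk_fraction = simulate_random_walk(T, start=0, max_iter=1.0)
```

With successive_hits=3 the walk only stops after three consecutive visits to the stopping set, whereas max_iter=1.0 on three cells allows three steps in total.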
plot_single_flow
- CytoTRACEKernel.plot_single_flow(cluster, cluster_key, time_key, clusters=None, time_points=None, min_flow=0, remove_empty_clusters=True, ascending=False, legend_loc='upper right out', alpha=0.8, xticks_step_size=1, figsize=None, dpi=None, save=None, show=True)
Visualize outgoing flow from a cluster of cells [Mittnenzweig et al., 2021].
- Parameters:
cluster (str) – Cluster for which to visualize outgoing flow.
time_key (str) – Key in obs where experimental time is stored.
clusters (Optional[Sequence[Any]]) – Visualize flow only for these clusters. If None, use all clusters.
time_points (Optional[Sequence[Union[float, int]]]) – Visualize flow only for these time points. If None, use all time points.
min_flow (float) – Only show flow edges with flow greater than this value. Flow values are always in \([0, 1]\).
remove_empty_clusters (bool) – Whether to remove clusters with no incoming flow edges.
ascending (Optional[bool]) – Whether to sort the clusters by ascending or descending incoming flow. If None, use the order as defined by clusters.
xticks_step_size (Optional[int]) – Show only every n-th tick on the x-axis. If None, don’t show any ticks.
legend_loc (Optional[str]) – Position of the legend. If None, do not show the legend.
figsize (Optional[tuple[float, float]]) – Size of the figure.
dpi – Dots per inch.
save (Union[Path, str, None]) – Filename where to save the plot.
- Return type:
- Returns: The axes object, if show = False. Otherwise nothing, just plots the figure. Optionally saves it based on save.
Notes
This function is a Python re-implementation of the original R function with some minor stylistic differences. It will not recreate the results from [Mittnenzweig et al., 2021], because there, the Metacell model [Baran et al., 2019] was used to compute the flow, whereas here the transition matrix is used.
read
- static CytoTRACEKernel.read(fname, adata=None, copy=False)
De-serialize self from a file.
- Parameters:
fname (Union[str, Path]) – Path from which to read the object.
adata (Optional[AnnData]) – AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.
copy (bool) – Whether to copy adata before assigning it. If adata is a view, it is always copied.
- Return type: IOMixin
- Returns: The de-serialized object.
write
write_to_adata
- CytoTRACEKernel.write_to_adata(key=None, copy=False)
Write the transition matrix and parameters used for computation to the underlying adata object.
- Parameters:
- Return type:
- Returns: Updates the adata with the following fields:
obsp['{key}'] - the transition matrix.
uns['{key}_params'] - parameters used for the calculation.