cellrank.models.SKLearnModel¶

class cellrank.models.SKLearnModel(adata, model, weight_name=None, ignore_raise=False)[source]¶

Wrapper around BaseEstimator.

Parameters:

adata (AnnData) – Annotated data object.
model (BaseEstimator) – Instance of the underlying sklearn estimator, such as SVR.
weight_name (str | None) – Name of the weight argument when fitting the model. If None, determine it automatically. If an empty str, no weights will be used.
ignore_raise (bool) – Do not raise an exception if the weight argument is not found when fitting the model. This is useful when the weight argument is passed through **kwargs and cannot be determined from the signature.
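
The parameter descriptions above do not spell out how the automatic lookup works. The sketch below shows one way a wrapper could search an estimator's fit signature for a weight keyword. The helper name find_weight_name, the candidate names and the toy fit functions are illustrative assumptions, not CellRank's actual implementation.

```python
import inspect

# Hypothetical candidate names; the real lookup may use a different list.
CANDIDATES = ("sample_weight", "weights", "w")


def find_weight_name(fit_fn, weight_name=None, ignore_raise=False):
    """Illustrative lookup of the weight keyword accepted by ``fit_fn``."""
    if weight_name == "":
        return None  # empty string: fit without weights
    params = inspect.signature(fit_fn).parameters
    if weight_name is None:
        # Automatic mode: take the first candidate present in the signature.
        for name in CANDIDATES:
            if name in params:
                return name
        if ignore_raise:
            return None
        raise ValueError("could not determine the weight argument")
    if weight_name in params or ignore_raise:
        # With ignore_raise, trust the caller: the name may live in **kwargs.
        return weight_name
    raise ValueError(f"{weight_name!r} is not an argument of the fit function")


def fit_explicit(x, y, sample_weight=None):
    """Toy fit function exposing its weight argument."""


def fit_kwargs(x, y, **kwargs):
    """Toy fit function hiding its weight argument in **kwargs."""


print(find_weight_name(fit_explicit))
print(find_weight_name(fit_kwargs, "sample_weight", ignore_raise=True))
```

In the second call the signature only shows **kwargs, so the name cannot be verified; this is the situation ignore_raise is described as covering.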

Attributes table¶

`adata`	Annotated data object.
`conf_int`	Array of shape `(n_samples, 2)` containing the lower and upper bound of the confidence interval.
`model`	The underlying `BaseEstimator`.
`prepared`	Whether the model is prepared for fitting.
`shape`	Number of cells in `adata`.
`signal`	The `Signal` this model was prepared with, or `None`.
`w`	Filtered weights of shape `(n_filtered_cells,)` used for fitting.
`w_all`	Unfiltered weights of shape `(n_cells,)`.
`x`	Filtered independent variables of shape `(n_filtered_cells, 1)` used for fitting.
`x_all`	Unfiltered independent variables of shape `(n_cells, 1)`.
`x_hat`	Filtered independent variables used when calculating default confidence interval, usually same as `x`.
`x_test`	Independent variables of shape `(n_samples, 1)` used for prediction.
`y`	Filtered dependent variables of shape `(n_filtered_cells, 1)` used for fitting.
`y_all`	Unfiltered dependent variables of shape `(n_cells, 1)`.
`y_hat`	Filtered dependent variables used when calculating default confidence interval, usually same as `y`.
`y_test`	Prediction values of shape `(n_samples,)` for `x_test`.

Methods table¶

`confidence_interval`([x_test])	Calculate the confidence interval.
`copy`()	Return a copy of self.
`default_confidence_interval`([x_test])	Calculate the confidence interval, if the underlying `model` has no method for it.
`fit`([x, y, w])	Fit the model.
`plot`([figsize, same_plot, hide_cells, perc, ...])	Plot the smoothed gene expression.
`predict`([x_test, key_added])	Run the prediction.
`prepare`([signal, lineage, time_key, ...])	Prepare the model to be ready for fitting.
`read`(fname[, adata, copy])	De-serialize self from a file.
`write`(fname[, write_adata])	Serialize self to a file using `pickle`.

Attributes¶

adata¶

SKLearnModel.adata¶: Annotated data object.

conf_int¶

SKLearnModel.conf_int¶: Array of shape (n_samples, 2) containing the lower and upper bound of the confidence interval.

model¶

SKLearnModel.model¶: The underlying BaseEstimator.

prepared¶

SKLearnModel.prepared¶: Whether the model is prepared for fitting.

shape¶

SKLearnModel.shape¶: Number of cells in adata.

signal¶

SKLearnModel.signal¶: The Signal this model was prepared with, or None.

w¶

SKLearnModel.w¶: Filtered weights of shape (n_filtered_cells,) used for fitting.

w_all¶

SKLearnModel.w_all¶: Unfiltered weights of shape (n_cells,).

x¶

SKLearnModel.x¶: Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

x_all¶

SKLearnModel.x_all¶: Unfiltered independent variables of shape (n_cells, 1).

x_hat¶

SKLearnModel.x_hat¶: Filtered independent variables used when calculating default confidence interval, usually same as x.

x_test¶

SKLearnModel.x_test¶: Independent variables of shape (n_samples, 1) used for prediction.

y¶

SKLearnModel.y¶: Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

y_all¶

SKLearnModel.y_all¶: Unfiltered dependent variables of shape (n_cells, 1).

y_hat¶

SKLearnModel.y_hat¶: Filtered dependent variables used when calculating default confidence interval, usually same as y.

y_test¶

SKLearnModel.y_test¶: Prediction values of shape (n_samples,) for x_test.

Methods¶

confidence_interval¶

SKLearnModel.confidence_interval(x_test=None, **kwargs)[source]¶

Calculate the confidence interval.

Uses default_confidence_interval() if the underlying model has no method for confidence interval calculation.

Parameters:

x_test (ndarray | None) – Array of shape (n_samples,) used for confidence interval calculation. If None, use x_test.
kwargs (Any) – Keyword arguments for underlying model’s confidence method or for default_confidence_interval().

Return type:

ndarray

Returns:

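Returns the confidence interval and updates the following fields:

conf_int - Array of shape (n_samples, 2) containing the lower and upper bound of the confidence interval.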

copy¶

SKLearnModel.copy()[source]¶

Return a copy of self.

Return type:

SKLearnModel

default_confidence_interval¶

SKLearnModel.default_confidence_interval(x_test=None, **kwargs)¶

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from [DeSalvo, 1970], eq. 5.

Parameters:

x_test (ndarray | None) – Array of shape (n_samples,) used for confidence interval calculation. If None, use x_test.
kwargs (Any) – Keyword arguments for underlying model’s confidence method or for default_confidence_interval().

Return type:

ndarray

Returns:

Returns the confidence interval and updates the following fields:

conf_int - Array of shape (n_samples, 2) containing the lower and upper bound of the confidence interval.
x_hat - Filtered independent variables used when calculating default confidence interval, usually same as x.
y_hat - Filtered dependent variables used when calculating default confidence interval, usually same as y.
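
The cited equation is not reproduced on this page. As a rough illustration of the quantities involved, the sketch below computes a textbook standard-error band for a fitted curve from the filtered data (x, y), the fitted values on the training points (y_hat) and the predictions at the test points. Whether it matches eq. 5 of [DeSalvo, 1970] term for term is an assumption, and the function name prediction_band is hypothetical.

```python
import math


def prediction_band(x, y, y_fit, x_test, y_test):
    """Textbook standard-error band around predictions (illustrative only)."""
    n = len(x)
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = math.sqrt(sum((yf - yi) ** 2 for yf, yi in zip(y_fit, y)) / (n - 2))
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    band = []
    for xt, yt in zip(x_test, y_test):
        # The band widens as the test point moves away from the mean of x.
        half = sigma * math.sqrt(1 + 1 / n + (xt - x_mean) ** 2 / sxx)
        band.append((yt - half, yt + half))
    return band  # one (lower, upper) pair per test point


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.0]
y_fit = [0.0, 1.0, 2.0, 3.0, 4.0]  # fitted values on the training points
for lower, upper in prediction_band(x, y, y_fit, x_test=[2.0, 10.0], y_test=[2.0, 10.0]):
    print(round(lower, 3), round(upper, 3))
```

The result has one (lower, upper) pair per test point, mirroring the (n_samples, 2) layout of conf_int.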

fit¶

SKLearnModel.fit(x=None, y=None, w=None, **kwargs)[source]¶

Fit the model.

Parameters:

x (ndarray | None) – Independent variables, array of shape (n_samples, 1). If None, use x.
y (ndarray | None) – Dependent variables, array of shape (n_samples, 1). If None, use y.
w (ndarray | None) – Optional weights of x, array of shape (n_samples,). If None, use w.
kwargs (Any) – Keyword arguments for underlying model’s fitting function.

Return type:

SKLearnModel

Returns:

Fits the model and returns self.
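
The sketch below illustrates how weights might be forwarded to an underlying estimator under a configurable keyword, which is the role weight_name plays in the constructor. WeightedMean and fit_with_weights are hypothetical stand-ins written for this example; they are not part of CellRank or scikit-learn.

```python
class WeightedMean:
    """Toy estimator: predicts the weighted mean of y."""

    def fit(self, x, y, sample_weight=None):
        w = [1.0] * len(y) if sample_weight is None else sample_weight
        self.mean_ = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        return self

    def predict(self, x):
        return [self.mean_ for _ in x]


def fit_with_weights(model, x, y, w=None, weight_name="sample_weight", **kwargs):
    """Pass ``w`` to ``model.fit`` under the keyword ``weight_name``, if any."""
    if weight_name and w is not None:
        kwargs[weight_name] = w
    model.fit(x, y, **kwargs)
    return model  # returning the model allows call chaining


x = [[0.0], [1.0], [2.0]]
y = [1.0, 2.0, 6.0]
w = [1.0, 1.0, 2.0]
weighted = fit_with_weights(WeightedMean(), x, y, w=w)
unweighted = fit_with_weights(WeightedMean(), x, y, w=w, weight_name="")
print(weighted.mean_, unweighted.mean_)
```

With an empty weight_name the weights are dropped, so the second fit reduces to the plain mean.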

plot¶

SKLearnModel.plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, fate_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)¶

Plot the smoothed gene expression.

Parameters:

figsize (tuple[float, float]) – Size of the figure.
same_plot (bool) – Whether to plot all trends in the same plot.
hide_cells (bool) – Whether to hide the cells.
perc (tuple[float, float]) – Percentile by which to clip the fate probabilities.
fate_prob_cmap (ListedColormap) – Colormap to use when coloring in the fate probabilities.
cell_color (str | None) – Key in obs or var_names used for coloring the cells.
lineage_color (str) – Color for the lineage.
alpha (float) – Alpha value in \([0, 1]\) for the transparency of cells.
lineage_alpha (float) – Alpha value in \([0, 1]\) for the transparency of the lineage confidence intervals.
title (str | None) – Title of the plot.
size (int) – Size of the points.
lw (float) – Line width for the smoothed values.
cbar (bool) – Whether to show the colorbar.
margins (float) – Margins around the plot.
xlabel (str) – Label on the x-axis.
ylabel (str) – Label on the y-axis.
conf_int (bool) – Whether to show the confidence interval.
lineage_probability (bool) – Whether to show smoothed lineage probability as a dashed line. Note that this will require 1 additional model fit.
lineage_probability_conf_int (bool | float) – Whether to compute and show smoothed lineage probability confidence interval.
lineage_probability_color (str | None) – Color to use when plotting the smoothed lineage_probability. If None, it’s the same as lineage_color. Only used when lineage_probability = True.
obs_legend_loc (str | None) – Location of the legend when cell_color corresponds to a categorical variable.
dpi (int) – Dots per inch.
fig (Figure) – Figure to use. If None, create a new one.
ax (Axes) – Ax to use. If None, create a new one.
return_fig (bool) – If True, return the figure object.
save (str | None) – Filename where to save the plot. If None, just shows the plots.
kwargs (Any) – Keyword arguments for legend().

Return type:

Figure | None

Returns:

Nothing, just plots the figure, or the figure object if return_fig is True. Optionally saves it based on save.

predict¶

SKLearnModel.predict(x_test=None, key_added='_x_test', **kwargs)[source]¶

Run the prediction.

Parameters:

x_test (ndarray | None) – Array of shape (n_samples,) used for prediction. If None, use x_test.
key_added (str) – Attribute name where to save the x_test for later use. If None, don’t save it.
kwargs (Any) – Keyword arguments for underlying model’s prediction method.

Return type:

ndarray

Returns:

Returns the prediction and updates the following fields:

y_test - Prediction values of shape (n_samples,) for x_test.

prepare¶

SKLearnModel.prepare(signal=<object object>, lineage=<object object>, time_key=<object object>, backward=False, time_range=None, data_key=<object object>, use_raw=<object object>, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200, *, gene=<object object>)¶

Prepare the model to be ready for fitting.

Parameters:

signal (str | Signal) –
The observation-aligned quantity to fit along the trajectory. Either a Signal (Gene, Obs or Obsm) or, as a shorthand, a gene name in var_names (equivalent to Gene).

Added in version 2.3.
lineage (str | None) – Name of the lineage. If None, all weights will be set to \(1\).
time_key (str) – Key in obs where the pseudotime is stored.
backward (bool) – Direction of the process.
time_range (float | tuple[float, float] | None) –
Specify start and end times:
- tuple - it specifies the minimum and maximum pseudotime. Both values can be None, in which case the minimum is the earliest pseudotime and the maximum is automatically determined.
- float - it specifies the maximum pseudotime.
data_key (str | None) –

Deprecated since version 2.4: Pass a Signal via signal instead, e.g. Gene(name, layer=...) or Obs(name).
use_raw (bool) –

Deprecated since version 2.4: Pass Gene(name, use_raw=True) via signal instead.
gene (str | Signal) –

Deprecated since version 2.4: Renamed to signal, which also accepts Signal objects.
threshold (float | None) – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.
weight_threshold (float | tuple[float, float]) – If a float, set all weights below weight_threshold to weight_threshold. If a tuple, set all weights below the first value to the second value.
filter_cells (float | None) – Filter out all cells with expression values lower than this threshold.
n_test_points (int) – Number of test points. If None, use the original points based on threshold.

Return type:

BaseModel

Returns:

Returns self and updates the following fields:

x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.
y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.
w - Filtered weights of shape (n_filtered_cells,) used for fitting.
x_all - Unfiltered independent variables of shape (n_cells, 1).
y_all - Unfiltered dependent variables of shape (n_cells, 1).
w_all - Unfiltered weights of shape (n_cells,).
x_test - Independent variables of shape (n_samples, 1) used for prediction.
prepared - Whether the model is prepared for fitting.
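
The weight_threshold and time_range rules described in the parameters can be sketched in plain Python as follows. The helpers clip_weights and resolve_time_range are hypothetical, and auto_max stands in for the automatically determined endpoint, whose computation is not described on this page. Treating a single float as leaving the minimum at the earliest pseudotime is likewise an assumption.

```python
def clip_weights(w, weight_threshold=(0.01, 0.01)):
    """Raise weights below the threshold to a floor value."""
    if not isinstance(weight_threshold, tuple):
        # A single float acts as both the threshold and the floor.
        weight_threshold = (weight_threshold, weight_threshold)
    threshold, floor = weight_threshold
    return [floor if wi < threshold else wi for wi in w]


def resolve_time_range(times, time_range=None, auto_max=None):
    """Turn ``time_range`` into explicit ``(t_min, t_max)`` bounds."""
    if auto_max is None:
        auto_max = max(times)  # assumption made for this sketch only
    if time_range is None:
        return min(times), auto_max
    if isinstance(time_range, (int, float)):
        # A single number only sets the maximum pseudotime.
        return min(times), float(time_range)
    t_min, t_max = time_range
    return (
        min(times) if t_min is None else t_min,
        auto_max if t_max is None else t_max,
    )


times = [0.0, 0.4, 1.0]
print(clip_weights([0.0, 0.005, 0.5], 0.01))
print(clip_weights([0.0, 0.02, 0.5], (0.05, 0.0)))
print(resolve_time_range(times, 0.8))
print(resolve_time_range(times, (0.2, None), auto_max=0.9))
```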

read¶

static SKLearnModel.read(fname, adata=None, copy=False)¶

De-serialize self from a file.

Parameters:

fname (str | Path) – Path from which to read the object.
adata (AnnData | None) – AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.
copy (bool) – Whether to copy adata before assigning it. If adata is a view, it is always copied.

Return type:

IOMixin

Returns:

The de-serialized object.

write¶

SKLearnModel.write(fname, write_adata=True)¶

Serialize self to a file using pickle.

Parameters:

fname (str | Path) – Path where to save the object.
write_adata (bool) – Whether to save adata object.

Return type:

None

Returns:

Nothing, just writes itself to a file.
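
The interplay between write_adata in write() and adata in read() can be illustrated with a toy class. Saveable below is a hypothetical stand-in that mimics the documented semantics with the standard pickle module; it is not CellRank's serialization code.

```python
import os
import pickle
import tempfile


class Saveable:
    """Toy object holding a large ``adata``-like payload plus fitted state."""

    def __init__(self, adata, state):
        self.adata = adata
        self.state = state

    def write(self, fname, write_adata=True):
        payload = dict(self.__dict__)
        if not write_adata:
            payload["adata"] = None  # leave the large object out of the file
        with open(fname, "wb") as fout:
            pickle.dump(payload, fout)

    @staticmethod
    def read(fname, adata=None):
        with open(fname, "rb") as fin:
            payload = pickle.load(fin)
        obj = Saveable.__new__(Saveable)
        obj.__dict__.update(payload)
        if obj.adata is None and adata is not None:
            obj.adata = adata  # re-attach only when it was saved without it
        return obj


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pickle")
    original = Saveable(adata={"n_obs": 3}, state=[1, 2, 3])
    original.write(path, write_adata=False)
    restored = Saveable.read(path, adata={"n_obs": 3})

print(restored.state, restored.adata)
```

Because the format is pickle, files should only be loaded from trusted sources.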