cellrank.models.SKLearnModel

class cellrank.models.SKLearnModel(adata, model, weight_name=None, ignore_raise=False)

Wrapper around BaseEstimator.

Parameters:
  • adata (AnnData) – Annotated data object.

  • model (BaseEstimator) – Instance of the underlying sklearn estimator, such as SVR.

  • weight_name (Optional[str]) – Name of the weight argument used when fitting the model. If None, it is determined automatically. If an empty str, no weights will be used.

  • ignore_raise (bool) – Do not raise an exception if the weight argument is not found when fitting the model. This is useful when the weight argument is passed through **kwargs and cannot be determined from the signature.
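The interplay of weight_name and ignore_raise can be illustrated with a small, self-contained sketch. It is not CellRank's implementation: the helper resolve_weight_name, the candidate names in CANDIDATES, the two toy estimators, and the fallback to "no weights" when nothing is detected are all assumptions made for illustration.

```python
import inspect

# Hypothetical list of argument names to look for; the names CellRank
# actually searches for may differ.
CANDIDATES = ("w", "weights", "sample_weight", "sample_weights")


class WeightedEstimator:
    """Toy estimator whose fit() exposes an explicit weight argument."""

    def fit(self, X, y, sample_weight=None):
        return self


class KwargsEstimator:
    """Toy estimator that only accepts weights through **kwargs."""

    def fit(self, X, y, **kwargs):
        return self


def resolve_weight_name(estimator, weight_name=None, ignore_raise=False):
    """Return the weight argument name to use, or "" for no weights."""
    params = inspect.signature(estimator.fit).parameters
    if weight_name is None:
        # Automatic detection from the signature of fit().
        # Assumption: fall back to "no weights" if nothing matches.
        return next((name for name in CANDIDATES if name in params), "")
    if weight_name == "":
        # Empty string: weights are explicitly disabled.
        return ""
    if weight_name not in params and not ignore_raise:
        raise ValueError(f"Argument {weight_name!r} not found in fit().")
    # Either found in the signature, or trusted because of ignore_raise.
    return weight_name


auto = resolve_weight_name(WeightedEstimator())
hidden = resolve_weight_name(KwargsEstimator(), "my_w", ignore_raise=True)
```

In this sketch, ignore_raise=True is what lets a weight argument that is only reachable through **kwargs pass the signature check.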

Attributes table

  • adata – Annotated data object.

  • conf_int – Array of shape (n_samples, 2) containing the lower and upper bound of the confidence interval.

  • model – The underlying BaseEstimator.

  • prepared – Whether the model is prepared for fitting.

  • shape – Number of cells in adata.

  • w – Filtered weights of shape (n_filtered_cells,) used for fitting.

  • w_all – Unfiltered weights of shape (n_cells,).

  • x – Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

  • x_all – Unfiltered independent variables of shape (n_cells, 1).

  • x_hat – Filtered independent variables used when calculating default confidence interval, usually same as x.

  • x_test – Independent variables of shape (n_samples, 1) used for prediction.

  • y – Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

  • y_all – Unfiltered dependent variables of shape (n_cells, 1).

  • y_hat – Filtered dependent variables used when calculating default confidence interval, usually same as y.

  • y_test – Prediction values of shape (n_samples,) for x_test.

Methods table

  • confidence_interval([x_test]) – Calculate the confidence interval.

  • copy() – Return a copy of self.

  • default_confidence_interval([x_test]) – Calculate the confidence interval, if the underlying model has no method for it.

  • fit([x, y, w]) – Fit the model.

  • plot([figsize, same_plot, hide_cells, perc, ...]) – Plot the smoothed gene expression.

  • predict([x_test, key_added]) – Run the prediction.

  • prepare(gene, lineage, time_key[, backward, ...]) – Prepare the model to be ready for fitting.

  • read(fname[, adata, copy]) – De-serialize self from a file.

  • write(fname[, write_adata]) – Serialize self to a file using pickle.

Attributes

adata

SKLearnModel.adata

Annotated data object.

conf_int

SKLearnModel.conf_int

Array of shape (n_samples, 2) containing the lower and upper bound of the confidence interval.

model

SKLearnModel.model

The underlying BaseEstimator.

prepared

SKLearnModel.prepared

Whether the model is prepared for fitting.

shape

SKLearnModel.shape

Number of cells in adata.

w

SKLearnModel.w

Filtered weights of shape (n_filtered_cells,) used for fitting.

w_all

SKLearnModel.w_all

Unfiltered weights of shape (n_cells,).

x

SKLearnModel.x

Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

x_all

SKLearnModel.x_all

Unfiltered independent variables of shape (n_cells, 1).

x_hat

SKLearnModel.x_hat

Filtered independent variables used when calculating default confidence interval, usually same as x.

x_test

SKLearnModel.x_test

Independent variables of shape (n_samples, 1) used for prediction.

y

SKLearnModel.y

Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

y_all

SKLearnModel.y_all

Unfiltered dependent variables of shape (n_cells, 1).

y_hat

SKLearnModel.y_hat

Filtered dependent variables used when calculating default confidence interval, usually same as y.

y_test

SKLearnModel.y_test

Prediction values of shape (n_samples,) for x_test.

Methods

confidence_interval

SKLearnModel.confidence_interval(x_test=None, **kwargs)

Calculate the confidence interval.

Falls back to default_confidence_interval() if the underlying model has no method for calculating the confidence interval.

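Parameters:
  • x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.

Return type:

ndarray

Returns:

Returns and updates the following fields:

  • conf_int - Array of shape (n_samples, 2) containing the lower and upper bound of the confidence interval.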

copy

SKLearnModel.copy()

Return a copy of self.

Return type:

SKLearnModel

default_confidence_interval

SKLearnModel.default_confidence_interval(x_test=None, **kwargs)

Calculate the confidence interval, if the underlying model has no method for it.

This formula is taken from [DeSalvo, 1970], eq. 5.

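Parameters:
  • x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.

Return type:

ndarray

Returns:

Returns and updates the following fields:

  • conf_int - Array of shape (n_samples, 2) containing the lower and upper bound of the confidence interval.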

Also updates the following fields:

  • x_hat - Filtered independent variables used when calculating default confidence interval, usually same as x.

  • y_hat - Filtered dependent variables used when calculating default confidence interval, usually same as y.

fit

SKLearnModel.fit(x=None, y=None, w=None, **kwargs)

Fit the model.

Parameters:
  • x (Optional[ndarray]) – Independent variables, array of shape (n_samples, 1). If None, use x.

  • y (Optional[ndarray]) – Dependent variables, array of shape (n_samples, 1). If None, use y.

  • w (Optional[ndarray]) – Optional weights of x, array of shape (n_samples,). If None, use w.

  • kwargs (Any) – Keyword arguments for underlying model’s fitting function.

Return type:

SKLearnModel

Returns:

Fits the model and returns self.
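fit() sits between prepare() and predict(). A minimal sketch of that sequence, using only the calls and default arguments documented on this page, might look as follows. The wrapper smooth_gene_trend is a hypothetical name, SVR is just the estimator mentioned in the class parameters, and the sketch assumes that adata already contains the pseudotime under time_key and the lineage information that prepare() needs.

```python
def smooth_gene_trend(adata, gene, lineage, time_key):
    """Hypothetical helper: prepare, fit and predict a single gene trend."""
    # Imports are local so that defining this helper does not require
    # scikit-learn or CellRank to be installed.
    from sklearn.svm import SVR
    from cellrank.models import SKLearnModel

    model = SKLearnModel(adata, SVR())

    # Populates x, y, w (and the *_all / x_test fields) for this gene.
    model.prepare(gene, lineage, time_key)

    # With no arguments, fit() uses the prepared x, y and w.
    model.fit()

    # With no arguments, predict() uses the prepared x_test.
    y_test = model.predict()

    # Uses the model's own method if available, otherwise the default one.
    conf_int = model.confidence_interval()

    return model, y_test, conf_int
```

The returned model can then be passed on to plot(), for example with conf_int=True to draw the interval computed above.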

plot

SKLearnModel.plot(figsize=(8, 5), same_plot=False, hide_cells=False, perc=None, fate_prob_cmap=<matplotlib.colors.ListedColormap object>, cell_color=None, lineage_color='black', alpha=0.8, lineage_alpha=0.2, title=None, size=15, lw=2, cbar=True, margins=0.015, xlabel='pseudotime', ylabel='expression', conf_int=True, lineage_probability=False, lineage_probability_conf_int=False, lineage_probability_color=None, obs_legend_loc='best', dpi=None, fig=None, ax=None, return_fig=False, save=None, **kwargs)

Plot the smoothed gene expression.

Parameters:
  • figsize (Tuple[float, float]) – Size of the figure.

  • same_plot (bool) – Whether to plot all trends in the same plot.

  • hide_cells (bool) – Whether to hide the cells.

  • perc (Optional[Tuple[float, float]]) – Percentile by which to clip the fate probabilities.

  • fate_prob_cmap (ListedColormap) – Colormap to use when coloring in the fate probabilities.

  • cell_color (Optional[str]) – Key in obs or var_names used for coloring the cells.

  • lineage_color (str) – Color for the lineage.

  • alpha (float) – Alpha value in \([0, 1]\) for the transparency of cells.

  • lineage_alpha (float) – Alpha value in \([0, 1]\) for the transparency of the lineage confidence intervals.

  • title (Optional[str]) – Title of the plot.

  • size (int) – Size of the points.

  • lw (float) – Line width for the smoothed values.

  • cbar (bool) – Whether to show the colorbar.

  • margins (float) – Margins around the plot.

  • xlabel (str) – Label on the x-axis.

  • ylabel (str) – Label on the y-axis.

  • conf_int (bool) – Whether to show the confidence interval.

  • lineage_probability (bool) – Whether to show smoothed lineage probability as a dashed line. Note that this will require 1 additional model fit.

  • lineage_probability_conf_int (Union[bool, float]) – Whether to compute and show smoothed lineage probability confidence interval.

  • lineage_probability_color (Optional[str]) – Color to use when plotting the smoothed lineage_probability. If None, it’s the same as lineage_color. Only used when lineage_probability = True.

  • obs_legend_loc (Optional[str]) – Location of the legend when cell_color corresponds to a categorical variable.

  • dpi (Optional[int]) – Dots per inch.

  • fig (Optional[Figure]) – Figure to use. If None, create a new one.

  • ax (Optional[Axes]) – Ax to use. If None, create a new one.

  • return_fig (bool) – If True, return the figure object.

  • save (Optional[str]) – Filename where to save the plot. If None, just shows the plots.

  • kwargs (Any) – Keyword arguments for legend().

Return type:

Optional[Figure]

Returns:

The figure if return_fig = True, otherwise nothing; just plots the figure. Optionally saves it based on save.

predict

SKLearnModel.predict(x_test=None, key_added='_x_test', **kwargs)

Run the prediction.

Parameters:
  • x_test (Optional[ndarray]) – Array of shape (n_samples,) used for prediction. If None, use x_test.

  • key_added (str) – Attribute name where to save the x_test for later use. If None, don’t save it.

  • kwargs – Keyword arguments for underlying model’s prediction method.

Return type:

ndarray

Returns:

Returns and updates the following fields:

  • y_test - Prediction values of shape (n_samples,) for x_test.

prepare

SKLearnModel.prepare(gene, lineage, time_key, backward=False, time_range=None, data_key='X', use_raw=False, threshold=None, weight_threshold=(0.01, 0.01), filter_cells=None, n_test_points=200)

Prepare the model to be ready for fitting.

Parameters:
  • gene (str) – Gene in var_names.

  • lineage (Optional[str]) – Name of the lineage. If None, all weights will be set to \(1\).

  • time_key (str) – Key in obs where the pseudotime is stored.

  • backward (bool) – Direction of the process.

  • time_range (Union[float, Tuple[float, float], None]) –

    Specify start and end times:

    • tuple - it specifies the minimum and maximum pseudotime. Both values can be None, in which case the minimum is the earliest pseudotime and the maximum is automatically determined.

    • float - it specifies the maximum pseudotime.

  • data_key (Optional[str]) – Key in layers or 'X' for X. If use_raw = True, it’s always set to 'X'.

  • use_raw (bool) – Whether to access raw.

  • threshold (Optional[float]) – Consider only cells with weights > threshold when estimating the test endpoint. If None, use the median of the weights.

  • weight_threshold (Union[float, Tuple[float, float]]) – Set all weights below weight_threshold to weight_threshold if a float, or to the second value, if a tuple.

  • filter_cells (Optional[float]) – Filter out all cells with expression values lower than this threshold.

  • n_test_points (int) – Number of test points. If None, use the original points based on threshold.

Return type:

BaseModel

Returns:

Nothing, just updates the following fields:

  • x - Filtered independent variables of shape (n_filtered_cells, 1) used for fitting.

  • y - Filtered dependent variables of shape (n_filtered_cells, 1) used for fitting.

  • w - Filtered weights of shape (n_filtered_cells,) used for fitting.

  • x_all - Unfiltered independent variables of shape (n_cells, 1).

  • y_all - Unfiltered dependent variables of shape (n_cells, 1).

  • w_all - Unfiltered weights of shape (n_cells,).

  • x_test - Independent variables of shape (n_samples, 1) used for prediction.

  • prepared - Whether the model is prepared for fitting.
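The weight_threshold and time_range parameters of prepare() each accept either a single float or a tuple. The sketch below shows one way to read those descriptions in plain Python. The helpers clip_weights and normalize_time_range are hypothetical and are not part of CellRank; they only mirror the parameter descriptions above.

```python
def clip_weights(weights, weight_threshold=(0.01, 0.01)):
    """Raise (or reset) small weights, following the weight_threshold rule."""
    if isinstance(weight_threshold, (int, float)):
        # A single float is both the cutoff and the replacement value.
        threshold, fill = weight_threshold, weight_threshold
    else:
        # A tuple is (cutoff, replacement value).
        threshold, fill = weight_threshold
    return [fill if w < threshold else w for w in weights]


def normalize_time_range(time_range, earliest):
    """Turn a time_range argument into a (start, end) pair.

    An end of None means "determine the maximum automatically".
    """
    if time_range is None:
        return earliest, None
    if isinstance(time_range, (int, float)):
        # A single float only specifies the maximum pseudotime.
        return earliest, float(time_range)
    start, end = time_range
    return (earliest if start is None else start), end


clipped = clip_weights([0.0, 0.005, 0.5], 0.01)
zeroed = clip_weights([0.05, 0.2], (0.1, 0.0))
window = normalize_time_range((None, 0.5), earliest=0.1)
```

With a tuple such as (0.1, 0.0), weights below 0.1 are set to 0.0 rather than raised to the cutoff, which is the difference between the two forms.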

read

static SKLearnModel.read(fname, adata=None, copy=False)

De-serialize self from a file.

Parameters:
  • fname (Union[str, Path]) – Path from which to read the object.

  • adata (Optional[AnnData]) – AnnData object to assign to the saved object. Only used when the saved object has adata and it was saved without it.

  • copy (bool) – Whether to copy adata before assigning it. If adata is a view, it is always copied.

Return type:

IOMixin

Returns:

The de-serialized object.

write

SKLearnModel.write(fname, write_adata=True)

Serialize self to a file using pickle.

Parameters:
  • fname (Union[str, Path]) – Path where to save the object.

  • write_adata (bool) – Whether to save adata object.

Return type:

None

Returns:

Nothing, just writes itself to a file.
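The relationship between write_adata in write() and the adata argument of read() can be sketched with the standard pickle module. TinyModel below is a hypothetical stand-in, not CellRank's class: it pickles its attribute dictionary instead of the object itself, and uses a plain dict in place of an AnnData object.

```python
import copy as copy_module
import pickle
import tempfile
from pathlib import Path


class TinyModel:
    """Hypothetical stand-in for a model that holds a large data object."""

    def __init__(self, adata, params):
        self.adata = adata
        self.params = params

    def write(self, fname, write_adata=True):
        # Copy the attribute dict so the live object is left untouched.
        state = dict(self.__dict__)
        if not write_adata:
            # Drop the data object to keep the file small.
            state["adata"] = None
        with open(fname, "wb") as fout:
            pickle.dump(state, fout)

    @staticmethod
    def read(fname, adata=None, copy=False):
        with open(fname, "rb") as fin:
            state = pickle.load(fin)
        # Re-attach data only if the object was saved without it.
        if state["adata"] is None and adata is not None:
            state["adata"] = copy_module.deepcopy(adata) if copy else adata
        obj = TinyModel.__new__(TinyModel)
        obj.__dict__.update(state)
        return obj


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pickle"
    data = {"n_cells": 3}
    TinyModel(data, params={"C": 1.0}).write(path, write_adata=False)
    restored = TinyModel.read(path, adata=data)
    restored_copy = TinyModel.read(path, adata=data, copy=True)
```

Saving without the data object and passing it back at read time avoids storing the same dataset once per model, at the cost of having to keep the dataset available separately.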