cellrank.tl.estimators.CFLARE.compute_lineage_drivers

CFLARE.compute_lineage_drivers(lineages=None, method='fischer', cluster_key=None, clusters=None, layer='X', use_raw=False, confidence_level=0.95, n_perms=1000, seed=None, return_drivers=True, **kwargs)

Compute driver genes per lineage.

Correlates gene expression with lineage probabilities, for a given lineage and set of clusters. Often, it makes sense to restrict this to a set of clusters which are relevant for the specified lineages.

Parameters
  • lineages (Union[str, Sequence, None]) – Either a set of lineage names from absorption_probabilities.names or None, in which case all lineages are considered.

  • method (str) –

    Mode to use when calculating p-values and confidence intervals. Can be one of:

    • ’fischer’ - use the Fisher transformation [Fischer21].

    • ’perm_test’ - use permutation test.

  • cluster_key (Optional[str]) – Key from adata.obs to obtain cluster annotations. These annotations are used to resolve clusters.

  • clusters (Union[str, Sequence, None]) – Restrict the correlations to these clusters.

  • layer (str) – Key from adata.layers.

  • use_raw (bool) – Whether or not to use adata.raw to correlate gene expression. If using a layer other than .X, this must be set to False.

  • confidence_level (float) – Confidence level for the confidence interval calculation. Must be in [0, 1].

  • n_perms (int) – Number of permutations to use when method='perm_test'.

  • seed (Optional[int]) – Random seed when method='perm_test'.

  • return_drivers (bool) – Whether to return the drivers. The returned table also includes the lower and upper bounds of the confidence interval at confidence_level.

  • show_progress_bar – Whether to show a progress bar. Disabling it may slightly improve performance.

  • n_jobs – Number of parallel jobs. If -1, use all available cores. If None or 1, the execution is sequential.

  • backend – Which backend to use for parallelization. See joblib.Parallel for valid options.
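
The two method options can be illustrated with a minimal, self-contained sketch. It is not CellRank's implementation: the helper names fisher_corr_test and perm_corr_pvalue, the toy data, and the add-one correction in the permutation p-value are assumptions made for illustration only. The sketch shows how a correlation between one gene's expression and one lineage's probabilities can be turned into a p-value and a confidence interval.

```python
import numpy as np
from scipy.stats import norm


def fisher_corr_test(x, y, confidence_level=0.95):
    """Pearson correlation with a Fisher-transformation p-value and CI.

    Hypothetical helper for illustration; not part of CellRank.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = np.corrcoef(x, y)[0, 1]
    z = np.arctanh(r)                   # Fisher z-transform of r
    se = 1.0 / np.sqrt(n - 3)           # approximate standard error of z
    pval = 2.0 * norm.sf(abs(z) / se)   # two-sided test of H0: rho = 0
    half_width = norm.ppf(0.5 + confidence_level / 2.0) * se
    # Transform the interval back to the correlation scale.
    ci_low, ci_high = np.tanh(z - half_width), np.tanh(z + half_width)
    return r, pval, ci_low, ci_high


def perm_corr_pvalue(x, y, n_perms=1000, seed=None):
    """Two-sided permutation p-value for the Pearson correlation.

    Hypothetical helper for illustration; not part of CellRank.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    # Shuffling y breaks any association with x, giving a null distribution.
    r_null = np.array(
        [np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perms)]
    )
    # Add-one correction (an assumption here) keeps the p-value above zero.
    pval = (1 + np.sum(np.abs(r_null) >= abs(r_obs))) / (n_perms + 1)
    return r_obs, pval


# Toy data: 200 cells, one gene whose expression rises with lineage probability.
rng = np.random.default_rng(0)
lineage_prob = rng.uniform(size=200)
expression = 2.0 * lineage_prob + rng.normal(scale=0.5, size=200)

r, pval, ci_low, ci_high = fisher_corr_test(expression, lineage_prob)
r_perm, pval_perm = perm_corr_pvalue(expression, lineage_prob, n_perms=200, seed=0)
```

The Fisher variant is closed-form and cheap, while the permutation variant makes fewer distributional assumptions at the cost of n_perms extra correlation computations per gene, which is presumably why n_jobs and backend are exposed.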

Return type

Optional[DataFrame]

Returns

  • pandas.DataFrame

    DataFrame of shape (n_genes, n_lineages * 5) containing the following columns, one of each per lineage:

    • {lineage} corr - correlation between the gene expression and absorption probabilities.

    • {lineage} pval - calculated p-values for the two-sided test.

    • {lineage} qval - corrected p-values using Benjamini-Hochberg method at level 0.05.

    • {lineage} ci low - lower bound of the confidence_level correlation confidence interval.

    • {lineage} ci high - upper bound of the confidence_level correlation confidence interval.

    Only if return_drivers=True.
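
As a rough illustration of the table layout and of the Benjamini-Hochberg correction behind the qval columns, the sketch below builds a toy frame with the documented column names. The gene names, the lineage names "Alpha" and "Beta", the random values, and the helper benjamini_hochberg are all hypothetical; the real method may compute the correction differently.

```python
import numpy as np
import pandas as pd


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    # Scale the i-th smallest p-value by n / i.
    scaled = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    qvals = np.empty(n)
    qvals[order] = np.clip(adjusted, 0.0, 1.0)
    return qvals


genes = ["gene_a", "gene_b", "gene_c", "gene_d"]   # made-up gene names
lineages = ["Alpha", "Beta"]                       # made-up lineage names
rng = np.random.default_rng(0)

columns = {}
for lineage in lineages:
    pval = rng.uniform(0.0, 1.0, size=len(genes))  # toy p-values
    columns[f"{lineage} corr"] = rng.uniform(-1.0, 1.0, size=len(genes))
    columns[f"{lineage} pval"] = pval
    columns[f"{lineage} qval"] = benjamini_hochberg(pval)
    # Placeholders; real bounds would come from the chosen method.
    columns[f"{lineage} ci low"] = np.full(len(genes), np.nan)
    columns[f"{lineage} ci high"] = np.full(len(genes), np.nan)

drivers = pd.DataFrame(columns, index=genes)

# Small check of the correction on hand-picked p-values.
example_qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With a frame of this shape, candidate drivers for one lineage can be ranked with, for example, drivers.sort_values("Alpha corr", ascending=False).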

  • None

    Updates adata.var or adata.raw.var, depending on use_raw, with:

    • '{direction} {lineage} corr' - the potential lineage drivers.

    • '{direction} {lineage} qval' - the corrected p-values.

References

Fischer21

Fisher, R. A. (1921). On the “probable error” of a coefficient of correlation deduced from a small sample. Metron 1, 3–32.