Tools

UniCell Deconvolve - Cell Type Deconvolution For Transcriptomic Data.

ucdeconvolve.tl.base(data: Union[AnnData, DataFrame], token: Optional[str] = None, split: bool = True, sort: bool = True, propagate: bool = True, return_results: bool = False, key_added: str = 'ucdbase', use_raw: Union[bool, Tuple[bool, bool]] = True, verbosity: Optional[int] = None) → Optional[AnnData]

UniCell Deconvolve: Base

Predicts cell type fractions for provided transcriptomic data.

Parameters:
  • data – Transcriptomic data (obs x genes) for which to predict cell type fractions. Can be either a dataframe or an annotated dataset. In either case, the data is converted to an annotated dataset object before proceeding.

  • token – UCDeconvolve API access token. If None, defaults to settings parameter.

  • split – Whether to split the underlying data into three categories: primary, cancer, and cell_line. Helps with downstream interpretability. Default is True.

  • sort – Sort columns of results by mean predictions. Default True.

  • propagate – Whether to perform belief propagation and pass predictions up a cell-type hierarchy. Helpful in interpreting overall deconvolution results. Default is True.

  • return_results – Whether to return the predictions dict from the function. Defaults to False, because all results are written to the anndata object that was passed in, or to the one created when a dataframe is passed in (which is then returned by default). The underlying anndata is also returned if the input is a view, as copying can destroy context internally.

  • use_raw – Use counts in ‘adata.raw’. Default True, as by convention in single cell analysis, log1p scaled counts before HVG filter are kept here.

  • verbosity – Level of verbosity for function information. Default is taken from package, set to ‘logging.DEBUG’ for more detailed information.

Returns:

adata_mixture_orig – The anndata object with results appended. Returned only if return_results is True or if the original input was a dataframe.

Return type:

anndata.AnnData
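A call to this function can be sketched as follows. This is a minimal illustration based only on the signature above, with every documented default written out explicitly; the wrapper name run_base is hypothetical, and the package is imported inside the function so the snippet loads even where ucdeconvolve is not installed.

```python
from typing import Any, Optional


def run_base(data: Any, token: Optional[str] = None) -> Any:
    """Hypothetical wrapper: call ucdeconvolve.tl.base with its documented defaults."""
    # Lazy import, so defining this helper does not require the package.
    import ucdeconvolve as ucd

    return ucd.tl.base(
        data,                  # AnnData or DataFrame, obs x genes
        token=token,           # None falls back to the settings parameter
        split=True,            # split into primary / cancer / cell_line
        sort=True,             # sort result columns by mean prediction
        propagate=True,        # pass predictions up the cell-type hierarchy
        return_results=False,
        key_added="ucdbase",
        use_raw=True,          # read counts from adata.raw
    )
```

Per the parameter notes above, passing an AnnData with return_results=False means the results are written to that object, so the return value may be None; passing a DataFrame returns the AnnData that was created from it.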

ucdeconvolve.tl.explain(data: AnnData, celltypes: Union[str, List[str], Dict[Union[int, str], str]], groupby: Optional[str] = None, group_n: int = 16, group_frac: Optional[float] = None, token: Optional[str] = None, return_results: bool = False, key_added: str = 'ucdexplain', use_raw: Union[bool, Tuple[bool, bool]] = True, verbosity: Optional[int] = None) → Optional[AnnData]

UniCell Deconvolve: Explain

Explains cell type fraction prediction for provided transcriptomic data.

Parameters:
  • data – Transcriptomic data (obs x genes) for which to explain cell type fraction predictions. Can be either a dataframe or an annotated dataset. In either case, the data is converted to an annotated dataset object before proceeding.

  • celltypes – Name of the cell type(s) to get explanations for. If a single string is passed, this celltype is used for all samples. If a list of strings is passed, the list must be the same length as the dataset, and each entry specifies which celltype to explain for the corresponding sample. If a dictionary is passed, each key should correspond to a value in the ‘adata.obs’ column defined by ‘groupby’, allowing celltype explanations to be generated specifically for different clusters or conditions.

  • groupby – Groupby key in ‘adata.obs’ to arrange search for celltypes. If celltypes is given as a dict, this must be defined.

  • group_n – The number of samples to subsample from each group for explanations, as this is an expensive operation and most cells in a cluster will yield similar results.

  • token – UCDeconvolve API access token. If None, defaults to settings parameter.

  • return_results – Whether to return the predictions dict from the function. Defaults to False, because all results are written to the anndata object that was passed in, or to the one created when a dataframe is passed in (which is then returned by default). The underlying anndata is also returned if the input is a view, as copying can destroy context internally.

  • use_raw – Use counts in ‘adata.raw’. Default True, as by convention in single cell analysis, log1p scaled counts before HVG filter are kept here.

  • verbosity – Level of verbosity for function information. Default is taken from package, set to ‘logging.DEBUG’ for more detailed information.

Returns:

adata_mixture_orig – The anndata object with results appended. Returned only if return_results is True or if the original input was a dataframe.

Return type:

anndata.AnnData
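The three accepted forms of celltypes can be checked before making a request. The helper below is a hypothetical pre-flight check that mirrors the rules stated in the parameter description; it is not the library's own validation, and the cell type and cluster names used with it are placeholders.

```python
from typing import Dict, List, Optional, Union

CellTypes = Union[str, List[str], Dict[Union[int, str], str]]


def check_celltypes(
    celltypes: CellTypes, n_obs: int, groupby: Optional[str] = None
) -> str:
    """Return which documented form `celltypes` takes, or raise on a mismatch."""
    if isinstance(celltypes, str):
        # One celltype applied to every sample.
        return "single"
    if isinstance(celltypes, dict):
        # Keys refer to groups in adata.obs[groupby], so groupby is required.
        if groupby is None:
            raise ValueError("a dict of celltypes requires `groupby` to be set")
        return "per-group"
    if isinstance(celltypes, list):
        # One entry per sample in the dataset.
        if len(celltypes) != n_obs:
            raise ValueError(
                f"expected {n_obs} celltypes (one per sample), got {len(celltypes)}"
            )
        return "per-sample"
    raise TypeError("celltypes must be a str, a list of str, or a dict")
```

For example, check_celltypes({"0": "monocyte", "1": "b cell"}, n_obs=100, groupby="leiden") passes, while the same dict without groupby raises a ValueError. The same celltypes and groupby values would then be passed on to ucdeconvolve.tl.explain.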

ucdeconvolve.tl.select(data: Union[AnnData, DataFrame], reference: Union[AnnData, DataFrame, List[str], str], token: Optional[str] = None, reference_key: str = 'celltype', ignore_categories: Optional[Iterable[str]] = None, method: str = 'both', return_results: bool = False, key_added: str = 'ucdselect', use_raw: Union[bool, Tuple[bool, bool]] = True, verbosity: Optional[int] = None) → Optional[AnnData]

UniCell Deconvolve: Select

Predicts cell type fractions for provided transcriptomic data using a user-specified reference. Leverages transfer learning from base UCD model embeddings.

Parameters:
  • data – Transcriptomic data (obs x genes) for which to predict cell type fractions. Can be either a dataframe or an annotated dataset. In either case, the data is converted to an annotated dataset object before proceeding.

  • reference

    Transcriptomic data (obs x genes) to be used as a reference. Can be either a dataframe or an annotated dataset. If a dataframe is passed, row indices should correspond to the reference categories. If a list of strings is passed, the strings should correspond to reference profiles from the unicell cell type registry; any other names will throw an error. If a single string is passed, a pre-built reference is looked up in the ucd backend.

    Currently valid prebuilt references include:

    allen-mouse-cortex : Mouse whole-brain cortex (44 cell types)
    enge2017-human-pancreas : Human pancreas (6 cell types)
    lee-human-pbmc-covid : Human PBMC (24 cell types)

  • token – UCDeconvolve API access token. If None, defaults to settings parameter.

  • reference_key – The key in reference.obs (or the index, if reference is a dataframe) used to perform the grouping operation.

  • method – The method used for building a reference matrix. Must be one of “embeddings”, “features”, or “both” (the default). If “embeddings”, the UCD base model is queried to return an embedding vector representing celltype mixtures, which is used to generate representations for transfer learning. If “features”, the model uses the features in the reference matrix directly, similar to other available methods. Using “both” is recommended in all cases.

  • return_results – Whether to return the predictions dict from the function. Defaults to False, because all results are written to the anndata object that was passed in, or to the one created when a dataframe is passed in (which is then returned by default). The underlying anndata is also returned if the input is a view, as copying can destroy context internally.

  • ignore_categories – Categories in ‘reference.obs[‘reference_key’]’ to ignore. Default is None.

  • use_raw – Use counts in ‘adata.raw’. Default True, as by convention in single cell analysis, log1p scaled counts before HVG filter are kept here. Note that if a tuple is passed, it will selectively apply use_raw to DATA and then REF in that order.

  • verbosity – Logging verbosity, if None defaults to settings value.

Returns:

adata_mixture_orig – The anndata object with results appended. Returned only if return_results is True or if the original input was a dataframe.

Return type:

anndata.AnnData
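A call using one of the prebuilt references listed above might look like the sketch below. It relies only on the documented signature; the wrapper name run_select is hypothetical, and the package is imported inside the function so the snippet loads without ucdeconvolve installed.

```python
from typing import Any, Optional, Tuple, Union


def run_select(
    data: Any,
    reference: Any = "allen-mouse-cortex",
    token: Optional[str] = None,
    use_raw: Union[bool, Tuple[bool, bool]] = True,
) -> Any:
    """Hypothetical wrapper: call ucdeconvolve.tl.select with documented defaults."""
    # Lazy import, so defining this helper does not require the package.
    import ucdeconvolve as ucd

    return ucd.tl.select(
        data,                      # AnnData or DataFrame, obs x genes
        reference,                 # prebuilt name, list of names, AnnData, or DataFrame
        token=token,               # None falls back to the settings parameter
        reference_key="celltype",  # grouping key in reference.obs
        ignore_categories=None,
        method="both",             # recommended in all cases
        return_results=False,
        key_added="ucdselect",
        use_raw=use_raw,           # a tuple applies to (data, reference) in that order
    )
```

When both data and reference are AnnData objects, passing use_raw=(True, False) would read counts from ‘adata.raw’ for the data but not for the reference, following the ordering described for use_raw above.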