harpy.tb.score_genes_iter

harpy.tb.score_genes_iter(sdata, labels_name, table_name, output_table_name, marker_genes, delimiter=',', min_score='Zero', min_score_p=25, scaling='Nmarkers', scale_score_p=1, n_iter=5, calculate_umap=False, calculate_neighbors=False, neigbors_kwargs=mappingproxy({}), umap_kwargs=mappingproxy({}), output_dir=None, key_added='annotation', unknown_celltype_key='unknown_celltype', cleanliness_key='Cleanliness', overwrite=False, **kwargs)

Iterative annotation algorithm.

For each cell, a score is calculated for each cell type.

In the 0-th iteration this is:

First, the mean expression is subtracted from the expression levels. The score for each cell type is then the sum of these normalized expression values over that cell type's markers in the cell.

And in following iterations:

Expression levels are normalized by subtracting the mean over all cell types assigned in iteration i-1. The score for each cell type is then the sum of these normalized expression values over that cell type's markers in the cell.

Function expects scaled data (obtained through e.g. scale()).
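
The two-stage normalization described above can be illustrated with a small NumPy sketch. This is not the library's implementation: the function names are hypothetical, "the mean over all celltypes" is assumed here to be the unweighted mean of the per-cell-type mean expression, and the min_score thresholding, score scaling, and unknown-cell-type handling are left out.

```python
import numpy as np


def score_cells(expr, markers, centre):
    """Subtract a per-gene centre, then sum over each cell type's markers.

    expr:    (n_cells, n_genes) scaled expression matrix
    markers: (n_genes, n_types) one-hot marker matrix
    centre:  (n_genes,) value subtracted from every cell before scoring
    """
    return (expr - centre) @ markers


def iterative_annotation(expr, markers, n_iter=5):
    # Iteration 0: centre on the mean expression over all cells.
    centre = expr.mean(axis=0)
    labels = score_cells(expr, markers, centre).argmax(axis=1)

    for _ in range(n_iter):
        # Assumption: average the per-cell-type means from the previous
        # iteration, so that abundant cell types do not dominate the centre.
        present = np.unique(labels)
        type_means = np.stack([expr[labels == t].mean(axis=0) for t in present])
        centre = type_means.mean(axis=0)
        labels = score_cells(expr, markers, centre).argmax(axis=1)

    return labels


# Toy data: 4 cells, 3 genes, each gene marking exactly one cell type.
expr = np.array(
    [
        [2.0, 0.1, 0.0],
        [1.8, 0.0, 0.2],
        [0.1, 2.1, 0.0],
        [0.0, 0.2, 1.9],
    ]
)
markers = np.eye(3)

labels = iterative_annotation(expr, markers)
print(labels.tolist())  # [0, 0, 1, 2]
```

Centring on per-cell-type means rather than on the global mean is what makes the later iterations less sensitive to how many cells each type contributes.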

Parameters:
  • sdata (SpatialData) – The SpatialData object.

  • labels_name (list[str]) – The labels element(s) of sdata used to select the cells via the region key in sdata.tables[table_name].obs. Note that if output_table_name is equal to table_name and overwrite is True, cells in sdata.tables[table_name] that are linked (via the region key) to labels elements not listed in labels_name will be removed from sdata.tables[table_name]. If a list of labels elements is provided, they will be scored together (e.g. multiple samples).

  • table_name (str) – The table element in sdata on which to perform annotation. We assume the data is already preprocessed by e.g. preprocess_transcriptomics(). All features should have approximately the same variance.

  • output_table_name (str) – The table element in sdata to which the annotation results will be written.

  • marker_genes (str | Path | DataFrame) – Path to the CSV file containing the marker genes or a pandas dataframe. It should be a one-hot encoded matrix with cell types listed in the first row, and marker genes in the first column.

  • delimiter (str (default: ',')) – Delimiter used in the CSV file.

  • min_score (Literal['Zero', 'Quantile', None] (default: 'Zero')) – Min score method. Choose from one of these options: “Zero”, “Quantile”, None.

  • min_score_p (float (default: 25)) – Min score percentile. Ignored if min_score is not set to “Quantile”.

  • scaling (Literal['MinMax', 'ZeroMax', 'Nmarkers', 'Robust', 'Rank'] (default: 'Nmarkers')) – Scaling method. Choose from one of these options: “MinMax”, “ZeroMax”, “Nmarkers”, “Robust”, “Rank”.

  • scale_score_p (float (default: 1)) – Scale score percentile.

  • n_iter (int (default: 5)) – Number of iterations.

  • calculate_umap (default: False) – If True, calculates a UMAP via umap() to visualize the annotations obtained in each iteration. If False and neither ‘umap’ nor ‘X_umap’ is in .obsm, no UMAP is plotted.

  • calculate_neighbors (default: False) – If True, calculates neighbors via neighbors(). Ignored if calculate_umap is set to False.

  • umap_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Keyword arguments passed to umap(). Ignored if calculate_umap is False.

  • neigbors_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Keyword arguments passed to neighbors(). Ignored if calculate_umap is False, or if calculate_neighbors is set to False and “neighbors” is already in .uns.keys().

  • output_dir (default: None) – If specified, figures with umaps will be saved in this directory after each iteration. If None, the plots will be displayed directly without saving.

  • key_added (str (default: 'annotation')) – The column name in the .obs attribute of the anndata.AnnData table where the predicted cell type will be saved.

  • unknown_celltype_key (str (default: 'unknown_celltype')) – The name reserved for cells that could not be assigned a specific cell type.

  • cleanliness_key (str (default: 'Cleanliness')) – The column name in the .obs attribute of the anndata.AnnData where we will store a score for the cleanliness of the predicted cell type.

  • overwrite (bool (default: False)) – If True, overwrites the output_table_name if it already exists in sdata.
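
The marker_genes layout described in the parameter list (cell types in the first row, marker genes in the first column, one-hot entries in between) can be sketched with the standard library alone. The cell types and genes below are hypothetical, and the label of the top-left cell ("gene") is an assumption, since the description above does not specify it.

```python
import csv
import io

# Hypothetical cell types (first row) and their marker genes (first column).
cell_types = ["Hepatocyte", "Kupffer", "Endothelial"]
markers = {
    "Alb": ["Hepatocyte"],
    "Cyp2e1": ["Hepatocyte"],
    "Clec4f": ["Kupffer"],
    "Pecam1": ["Endothelial"],
}

buffer = io.StringIO()
# The delimiter here should match the `delimiter` argument (default ',').
writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
writer.writerow(["gene", *cell_types])  # top-left label is an assumption
for gene, types in markers.items():
    # 1 if the gene is a marker for the cell type, 0 otherwise.
    writer.writerow([gene, *(int(ct in types) for ct in cell_types)])

marker_csv = buffer.getvalue()
print(marker_csv)
```

Writing the same text to a file gives a path that could be passed as marker_genes; a pandas DataFrame with the same rows and columns is the other accepted form.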

Return type:

tuple[SpatialData, list[str], list[str]]

Returns:

tuple:

  • Updated sdata.

  • List of strings with all cell types that are scored (and are not in the del_celltypes list).

  • List of strings with all cell types, some of which may not be scored because their corresponding transcripts do not appear in the region of interest. _UNKNOWN_CELLTYPE_KEY is also added if it is detected.
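
As a usage sketch, the three returned values can be unpacked as below. The wrapper annotate() and the output table name are hypothetical, the keyword names are taken from the signature above, and the example assumes that importing harpy makes harpy.tb available. It has not been run against a real SpatialData object.

```python
def annotate(sdata, labels_name, table_name, marker_genes):
    """Run iterative annotation and list the cell types that were not scored."""
    # Imported lazily so the sketch can be loaded without harpy installed.
    import harpy

    sdata, scored, all_celltypes = harpy.tb.score_genes_iter(
        sdata,
        labels_name=labels_name,
        table_name=table_name,
        output_table_name=f"{table_name}_annotated",  # hypothetical name
        marker_genes=marker_genes,
        n_iter=5,
        key_added="annotation",
        overwrite=True,
    )
    # Entries in the second list but not in the first: cell types whose
    # markers were absent from the region, plus the unknown-cell-type key
    # if it was detected.
    not_scored = sorted(set(all_celltypes) - set(scored))
    return sdata, scored, not_scored
```

Writing to a separate output table leaves the input table untouched, which avoids the cell removal described under labels_name.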

Notes

The deprecated keyword argument celltype_column is still accepted for backward compatibility; use key_added instead.

See also

harpy.tb.score_genes

Score genes using score_genes().