harpy.tb.score_genes_iter
- harpy.tb.score_genes_iter(sdata, labels_name, table_name, output_table_name, marker_genes, delimiter=',', min_score='Zero', min_score_p=25, scaling='Nmarkers', scale_score_p=1, n_iter=5, calculate_umap=False, calculate_neighbors=False, neigbors_kwargs=mappingproxy({}), umap_kwargs=mappingproxy({}), output_dir=None, key_added='annotation', unknown_celltype_key='unknown_celltype', cleanliness_key='Cleanliness', overwrite=False, **kwargs)
Iterative annotation algorithm.
For each cell, a score is calculated for each cell type.
In iteration 0:
First, the mean expression is subtracted from the expression levels. The score for each cell type is then the sum of these normalized expression values over that cell type's markers in the cell.
In each following iteration i:
Expression levels are normalized by subtracting the mean over all cell types assigned in iteration i-1. The score for each cell type is again the sum of these normalized expression values over its markers in the cell.
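The two normalization steps above can be illustrated with a small NumPy sketch. This is not harpy's implementation: the helper `score_cells`, the toy matrix `X`, and `marker_mask` are hypothetical, the "mean over all cell types" is interpreted here as the average of the per-cell-type mean expression vectors, and the `min_score`, `scaling`, and unknown-cell-type handling are left out entirely.

```python
import numpy as np


def score_cells(X, marker_mask, mean_vector):
    """Sum mean-centered expression over each cell type's markers.

    X: (n_cells, n_genes) scaled expression (hypothetical toy data).
    marker_mask: (n_celltypes, n_genes) one-hot marker matrix.
    mean_vector: (n_genes,) vector subtracted from every cell.
    """
    centered = X - mean_vector       # broadcast over cells
    return centered @ marker_mask.T  # (n_cells, n_celltypes)


# Toy data: 6 cells, 4 genes, 2 cell types with 2 markers each.
X = np.array(
    [
        [2.0, 2.0, 0.0, 0.0],
        [1.5, 2.5, 0.0, 0.0],
        [0.0, 0.0, 2.0, 2.0],
        [0.0, 0.0, 3.0, 1.0],
        [2.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 2.0],
    ]
)
marker_mask = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])

# Iteration 0: subtract the mean expression over all cells.
scores = score_cells(X, marker_mask, X.mean(axis=0))
labels = scores.argmax(axis=1)

# Following iterations: subtract the mean of the per-cell-type means,
# using the assignment from the previous iteration (an interpretation).
n_iter = 3
for _ in range(n_iter):
    assigned = np.unique(labels)
    type_means = np.stack([X[labels == t].mean(axis=0) for t in assigned])
    scores = score_cells(X, marker_mask, type_means.mean(axis=0))
    labels = scores.argmax(axis=1)
```

Averaging per-cell-type means rather than per-cell values keeps an abundant cell type from dominating the subtracted baseline, which is one plausible reason for re-normalizing after each assignment.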
The function expects scaled data (obtained through e.g. scale()).
- Parameters:
sdata (SpatialData) – The SpatialData object.
labels_name (list[str]) – The labels element(s) of sdata used to select the cells via the region key in sdata.tables[table_name].obs. Note that if output_table_name is equal to table_name and overwrite is True, cells in sdata.tables[table_name] linked to other labels_name (via the region key) will be removed from sdata.tables[table_name]. If a list of labels elements is provided, they will therefore be scored together (e.g. multiple samples).
table_name (str) – The table element in sdata on which to perform annotation. We assume the data is already preprocessed by e.g. preprocess_transcriptomics(). Features should all have approximately the same variance.
output_table_name (str) – The output table element in sdata to which the table element with the annotation results will be written.
marker_genes (str | Path | DataFrame) – Path to the CSV file containing the marker genes, or a pandas dataframe. It should be a one-hot encoded matrix with cell types listed in the first row and marker genes in the first column.
delimiter (str (default: ',')) – Delimiter used in the CSV file.
min_score (Literal['Zero', 'Quantile', None] (default: 'Zero')) – Min score method. Choose from one of these options: “Zero”, “Quantile”, None.
min_score_p (float (default: 25)) – Min score percentile. Ignored if min_score is not set to “Quantile”.
scaling (Literal['MinMax', 'ZeroMax', 'Nmarkers', 'Robust', 'Rank'] (default: 'Nmarkers')) – Scaling method. Choose from one of these options: “MinMax”, “ZeroMax”, “Nmarkers”, “Robust”, “Rank”.
scale_score_p (float (default: 1)) – Scale score percentile.
n_iter (int (default: 5)) – Number of iterations.
calculate_umap (default: False) – If True, calculates a UMAP via umap() for visualization of the obtained annotations per iteration. If False and ‘umap’ or ‘X_umap’ is not in .obsm, then no UMAP will be plotted.
calculate_neighbors (default: False) – If True, calculates neighbors via neighbors(). Ignored if calculate_umap is set to False.
umap_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Keyword arguments passed to umap(). Ignored if calculate_umap is False.
neigbors_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Keyword arguments passed to neighbors(). Ignored if calculate_umap is False, or if calculate_neighbors is set to False and “neighbors” is already in .uns.keys().
output_dir (default: None) – If specified, figures with UMAPs will be saved in this directory after each iteration. If None, the plots will be displayed directly without saving.
key_added (str (default: 'annotation')) – The column name in the .obs attribute of the anndata.AnnData table where the predicted cell type will be saved.
unknown_celltype_key (str (default: 'unknown_celltype')) – The name reserved for cells that could not be assigned a specific cell type.
cleanliness_key (str (default: 'Cleanliness')) – The column name in the .obs attribute of the anndata.AnnData table where a score for the cleanliness of the predicted cell type will be stored.
overwrite (bool (default: False)) – If True, overwrites the output_table_name if it already exists in sdata.
- Return type:
tuple[SpatialData, list[str], list[str]]
- Returns:
A tuple containing:
The updated sdata.
A list of strings with all cell types that are scored (but are not in the del_celltypes list).
A list of strings with all cell types, some of which may not be scored because their corresponding transcripts do not appear in the region of interest. _UNKNOWN_CELLTYPE_KEY is also added if it is detected.
Notes
The deprecated keyword argument celltype_column is still accepted for backward compatibility; use key_added instead.
See also
harpy.tb.score_genes: score genes using score_genes().
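Examples
The marker_genes layout described under Parameters (cell types in the first row, marker genes in the first column, one-hot entries) can be sketched with pandas as follows. The gene and cell type names are hypothetical placeholders, and placing the genes in the DataFrame index is an assumption about how a DataFrame (rather than a CSV path) should be laid out; check it against your harpy version before relying on it.

```python
import io

import pandas as pd

# Hypothetical marker lists; replace with markers for your own tissue.
markers = {
    "T_cell": ["Cd3e", "Cd4"],
    "B_cell": ["Cd79a", "Ms4a1"],
}

# Rows are marker genes, columns are cell types, entries are 0 or 1.
genes = sorted({gene for gene_list in markers.values() for gene in gene_list})
one_hot = pd.DataFrame(0, index=genes, columns=list(markers))
for celltype, gene_list in markers.items():
    one_hot.loc[gene_list, celltype] = 1
one_hot.index.name = "gene"

# Write with the same delimiter that would be passed as `delimiter`.
# An in-memory buffer stands in for a CSV file on disk.
buffer = io.StringIO()
one_hot.to_csv(buffer, sep=",")
buffer.seek(0)

# Reading it back shows the on-disk layout: header row = cell types,
# first column = marker genes.
round_trip = pd.read_csv(buffer, sep=",", index_col=0)
```

Either the path of a CSV file written this way or a DataFrame could then be supplied as marker_genes, subject to the layout assumption stated above.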