harpy.tb.score_genes
- harpy.tb.score_genes(sdata, labels_name, table_name, output_table_name, marker_genes, delimiter=',', row_norm=False, repl_columns=None, del_celltypes=None, input_dict=False, key_added='annotation', unknown_celltype_key='unknown_celltype', cleanliness_key='Cleanliness', overwrite=False, **kwargs)
The function loads marker genes from a CSV file (or a DataFrame) and, for each cell type, scores every cell against that cell type's markers using scanpy’s score_genes() function. Each cell is then annotated with the cell type that obtains the maximum score. Marker genes can be provided as a one-hot encoded matrix, with cell types listed in the first row and marker genes in the first column, or in dictionary format. The function also allows column (cell type) names to be replaced and specific cell types to be excluded from assignment.
- Parameters:
sdata (SpatialData) – The SpatialData object.
labels_name (list[str]) – The labels element(s) of sdata used to select the cells via the region key in sdata.tables[table_name].obs. If a list of labels elements is provided, they are scored together (e.g. multiple samples). Note that if output_table_name is equal to table_name and overwrite is True, cells in sdata.tables[table_name] linked to other labels elements (via the region key) will be removed from sdata.tables[table_name].
table_name (str) – The table element in sdata on which to perform the annotation.
output_table_name (str) – The table element in sdata to which the annotated table will be written.
marker_genes (str | Path | DataFrame) – Path to a CSV file, or a DataFrame, containing the marker genes. By default it should be a one-hot encoded matrix with cell types listed in the first row and marker genes in the first column (see input_dict for the alternative dictionary layout).
delimiter (default: ',') – Delimiter used in the CSV file, if marker_genes is provided as a CSV file.
row_norm (bool (default: False)) – Whether to apply row normalization.
repl_columns (dict[str, str] | None (default: None)) – Dictionary of cell types to be renamed. The keys are the original cell type names and the values are their replacements. This parameter is deprecated and will be removed in a future version.
del_celltypes (list[str] | None (default: None)) – List of cell types to remove from the candidate cell types. Cells are still scored for these cell types, but are never assigned one of them.
input_dict (bool (default: False)) – If True, the marker gene CSV file is read as a dictionary: the first column holds the cell type names and the subsequent columns hold the marker genes for each cell type. This parameter is deprecated and will be removed in a future version.
key_added (str (default: 'annotation')) – The column name in the .obs attribute of the anndata.AnnData table where the predicted cell type will be saved.
unknown_celltype_key (str (default: 'unknown_celltype')) – The name reserved for cells that could not be assigned a specific cell type.
cleanliness_key (str (default: 'Cleanliness')) – The column name in the .obs attribute of the anndata.AnnData table where a cleanliness score for the predicted cell type is stored.
overwrite (bool (default: False)) – If True, overwrites output_table_name if it already exists in sdata.
**kwargs (Any) – Additional keyword arguments passed to scanpy.tl.score_genes().
- Return type:
tuple[SpatialData, list[str], list[str]]
- Returns:
A tuple with three elements:
The updated sdata, with an extra column key_added in sdata.tables[output_table_name].obs.
A list of strings with all cell types that are scored (excluding those in del_celltypes).
A list of strings with all cell types, some of which may not have been scored because their corresponding transcripts do not appear in the region of interest. unknown_celltype_key is also added if it is detected.
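To make the two accepted marker-gene layouts concrete, the sketch below writes the same toy markers once in the one-hot layout (cell types in the header row, genes in the first column) and once in the deprecated dictionary layout used with input_dict=True, and converts both to a mapping from cell type to marker genes. The gene and cell type names and the helpers markers_from_one_hot and markers_from_dict_rows are hypothetical; they only illustrate the file formats and are not part of harpy's API. Whether harpy expects genes as the index or as an ordinary first column when a DataFrame is passed is not stated here, so check that before passing a DataFrame.

```python
from __future__ import annotations

import csv
import io

import pandas as pd

# One-hot layout: header row = cell types, first column = marker genes.
# Toy data; a 1 marks a gene as a marker of that cell type.
ONE_HOT_CSV = """,B cell,T cell,Macrophage
CD19,1,0,0
MS4A1,1,0,0
CD3E,0,1,0
CD68,0,0,1
"""

# Dictionary layout (input_dict=True, deprecated): one row per cell type,
# first column = cell type name, remaining columns = its marker genes.
DICT_CSV = """B cell,CD19,MS4A1
T cell,CD3E
Macrophage,CD68
"""


def markers_from_one_hot(text: str, delimiter: str = ",") -> dict[str, list[str]]:
    """Hypothetical helper: map each cell type to the genes flagged with a non-zero value."""
    df = pd.read_csv(io.StringIO(text), index_col=0, sep=delimiter)
    return {celltype: df.index[df[celltype] > 0].tolist() for celltype in df.columns}


def markers_from_dict_rows(text: str, delimiter: str = ",") -> dict[str, list[str]]:
    """Hypothetical helper: rows can differ in length, so read them with the csv module."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip blank rows and drop empty trailing cells such as "T cell,CD3E,".
    return {row[0]: [gene for gene in row[1:] if gene] for row in rows if row}


one_hot_markers = markers_from_one_hot(ONE_HOT_CSV)
dict_markers = markers_from_dict_rows(DICT_CSV)
```

Both layouts yield {'B cell': ['CD19', 'MS4A1'], 'T cell': ['CD3E'], 'Macrophage': ['CD68']}. The one-hot layout fits a rectangular table, which is why a DataFrame reader suffices; the dictionary layout has ragged rows and is read row by row.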
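The second and third returned lists differ in two ways: cell types in del_celltypes are left out of the second list, and cell types whose marker transcripts do not appear in the region of interest cannot be scored. The sketch below illustrates that relationship under an explicit assumption: a cell type counts as scorable when at least one of its marker genes is among the genes detected in the region of interest. split_celltypes is a hypothetical helper, harpy's exact criterion may differ, and the sketch does not model the unknown_celltype_key entry.

```python
from __future__ import annotations


def split_celltypes(
    markers: dict[str, list[str]],
    detected_genes: set[str],
    del_celltypes: list[str] | None = None,
) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (scored candidate cell types, all cell types).

    ASSUMPTION: a cell type is scorable if at least one of its marker genes
    has transcripts in the region of interest (i.e. is in detected_genes).
    """
    excluded = set(del_celltypes or [])
    all_celltypes = list(markers)
    scored_candidates = [
        celltype
        for celltype, genes in markers.items()
        if celltype not in excluded and any(gene in detected_genes for gene in genes)
    ]
    return scored_candidates, all_celltypes


markers = {
    "B cell": ["CD19", "MS4A1"],
    "T cell": ["CD3E"],
    "Macrophage": ["CD68"],  # CD68 is not detected below, so this type is not scored
    "Doublet": ["CD19", "CD3E"],  # excluded from candidates via del_celltypes
}
scored, all_types = split_celltypes(markers, {"CD19", "CD3E", "EPCAM"}, del_celltypes=["Doublet"])
```

Here scored is ['B cell', 'T cell'], while all_types still lists all four cell types, including the unscored Macrophage and the excluded Doublet.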
Notes
The cell type unknown_celltype_key is reserved for cells that could not be assigned a specific cell type. The deprecated keyword argument celltype_column is still accepted for backward compatibility; use key_added instead.
See also
harpy.tb.score_genes_iter – iterative scoring algorithm.
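The annotation step summarized at the top of this page (score every cell type, then keep the highest-scoring one) can be sketched with pandas. This is a minimal illustration, not harpy's implementation: assign_celltypes is a hypothetical helper that takes a precomputed cells × cell types score table (for example, one column per cell type produced with scanpy.tl.score_genes), and the rule that a cell whose best remaining score is not positive gets unknown_celltype_key is an assumption made for the example. The cleanliness score is not reproduced, since its computation is not described here.

```python
from __future__ import annotations

import pandas as pd


def assign_celltypes(
    scores: pd.DataFrame,
    del_celltypes: list[str] | None = None,
    unknown_celltype_key: str = "unknown_celltype",
) -> pd.Series:
    """Hypothetical helper: label each cell (row) with its highest-scoring cell type (column)."""
    # Cell types in del_celltypes keep their scores but are not assignment candidates.
    candidates = scores.drop(columns=del_celltypes or [], errors="ignore")
    best_celltype = candidates.idxmax(axis=1)
    best_score = candidates.max(axis=1)
    # ASSUMPTION: a cell whose best candidate score is not positive is labelled unknown.
    return best_celltype.where(best_score > 0, unknown_celltype_key)


# Toy scores for three cells; in practice these would come from score_genes().
scores = pd.DataFrame(
    {
        "B cell": [0.8, -0.1, 0.2],
        "T cell": [0.3, -0.2, 0.9],
        "Doublet": [0.1, -0.3, 1.5],
    },
    index=["cell_1", "cell_2", "cell_3"],
)
labels = assign_celltypes(scores, del_celltypes=["Doublet"])
```

With del_celltypes=["Doublet"], the labels are B cell, unknown_celltype and T cell. Without it, cell_3 would be labelled Doublet, because its Doublet score (1.5) is the highest; this is the sense in which excluded cell types are still scored but never assigned.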