harpy.tb.score_genes

harpy.tb.score_genes(sdata, labels_name, table_name, output_table_name, marker_genes, delimiter=',', row_norm=False, repl_columns=None, del_celltypes=None, input_dict=False, key_added='annotation', unknown_celltype_key='unknown_celltype', cleanliness_key='Cleanliness', overwrite=False, **kwargs)

Loads marker genes from a CSV file or a DataFrame and scores every cell for each cell type with scanpy’s score_genes() function.

Each cell is annotated with the cell type that has the maximum score obtained through score_genes(). Marker genes can be provided either as a one-hot encoded matrix, with cell types listed in the first row and marker genes in the first column, or in dictionary format. The function also supports renaming columns (cell types) via repl_columns and excluding specific cell types from assignment via del_celltypes.
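To make the one-hot layout concrete, here is a minimal sketch that parses such a matrix with the Python standard library and converts it to the equivalent dictionary form. The gene and cell type names are invented for illustration, and the helper one_hot_to_dict is hypothetical; it is not part of harpy and does not claim to mirror how harpy reads the file internally.

```python
import csv
import io

# Hypothetical one-hot marker matrix: cell types in the first row,
# marker genes in the first column; 1 means "gene is a marker for this cell type".
MARKER_CSV = """gene,Hepatocyte,Kupffer,Endothelial
Alb,1,0,0
Apoa1,1,0,0
Clec4f,0,1,0
Pecam1,0,0,1
"""


def one_hot_to_dict(text, delimiter=","):
    """Convert a one-hot marker matrix into {cell_type: [marker genes]}."""
    # Skip empty rows so a trailing newline does not cause an IndexError.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    cell_types = header[1:]  # first cell of the header labels the gene column
    markers = {cell_type: [] for cell_type in cell_types}
    for row in body:
        gene, flags = row[0], row[1:]
        for cell_type, flag in zip(cell_types, flags):
            if flag.strip() == "1":
                markers[cell_type].append(gene)
    return markers


marker_dict = one_hot_to_dict(MARKER_CSV)
print(marker_dict)
```

The same text could be saved to a file and its path passed as marker_genes, with delimiter set to match the file.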

Parameters:
  • sdata (SpatialData) – The SpatialData object.

  • labels_name (list[str]) – The labels element(s) of sdata used to select the cells via the region key in sdata.tables[table_name].obs. Note that if output_table_name is equal to table_name and overwrite is True, cells in sdata.tables[table_name] that are linked to other labels elements (via the region key) will be removed from sdata.tables[table_name]. If a list of labels elements is provided, their cells are scored together (e.g. multiple samples).

  • table_name (str) – The table element in sdata on which to perform the annotation.

  • output_table_name (str) – The table element in sdata to which the annotated table will be written.

  • marker_genes (str | Path | DataFrame) – Path to a CSV file, or a DataFrame containing the marker genes. It should be a one-hot encoded matrix with cell types listed in the first row, and marker genes in the first column.

  • delimiter (default: ',') – Delimiter used in the CSV file, if marker_genes is provided as a CSV.

  • row_norm (bool (default: False)) – Whether row normalization is applied.

  • repl_columns (dict[str, str] | None (default: None)) – Dictionary containing cell types to be replaced. The keys are the original cell type names and the values are their replacements. This parameter is deprecated, and will be removed in a future version.

  • del_celltypes (list[str] | None (default: None)) – List of cell types to be deleted from the list of possible cell type candidates. Cells are scored for these cell types, but will not be assigned a cell type from this list.

  • input_dict (bool (default: False)) – If True, the marker gene list from the CSV file is treated as a dictionary, with the first column holding the cell type names and the subsequent columns holding the marker genes for those cell types. This parameter is deprecated, and will be removed in a future version.

  • key_added (str (default: 'annotation')) – The column name in the .obs attribute of the anndata.AnnData table where the predicted cell type will be saved.

  • unknown_celltype_key (str (default: 'unknown_celltype')) – The name reserved for cells that could not be assigned a specific cell type.

  • cleanliness_key (str (default: 'Cleanliness')) – The column name in the .obs attribute of the anndata.AnnData table where a score for the cleanliness of the predicted cell type will be saved.

  • overwrite (bool (default: False)) – If True, overwrites the output_table_name if it already exists in sdata.

  • **kwargs (Any) – Additional keyword arguments passed to score_genes().

Return type:

tuple[SpatialData, list[str], list[str]]

Returns:

A tuple containing:

  • The updated sdata, in which sdata.tables[output_table_name].obs has an extra column key_added.

  • A list of strings with all cell types that were scored (excluding those in the del_celltypes list).

  • A list of strings with all cell types, some of which may not have been scored because their corresponding transcripts do not appear in the region of interest. unknown_celltype_key is also added if it is detected.
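As an illustration of the call and its three return values, a hedged usage sketch follows. The element names "segmentation_mask", "table_transcriptomics" and "table_annotated" are placeholders, the wrapper annotate_cells is hypothetical, and the sketch assumes the package is imported as harpy; substitute the names that exist in your own SpatialData object.

```python
def annotate_cells(sdata, marker_csv_path):
    """Score and annotate cells; all element names below are placeholders."""
    # Imported lazily so this sketch can be defined without harpy installed.
    import harpy as hp

    sdata, scored_celltypes, all_celltypes = hp.tb.score_genes(
        sdata,
        labels_name=["segmentation_mask"],    # hypothetical labels element
        table_name="table_transcriptomics",   # hypothetical input table
        output_table_name="table_annotated",  # hypothetical output table
        marker_genes=marker_csv_path,         # one-hot encoded CSV
        delimiter=",",
        key_added="annotation",
        overwrite=True,
    )
    # The predicted cell type is stored in the .obs column named by key_added.
    annotation = sdata.tables["table_annotated"].obs["annotation"]
    return annotation, scored_celltypes, all_celltypes
```

Writing to a separate output_table_name leaves the input table untouched, which avoids the removal of cells linked to other labels elements described under labels_name.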

Notes

The cell type unknown_celltype_key is reserved for cells that could not be assigned a specific cell type. The deprecated keyword argument celltype_column is still accepted for backward compatibility; use key_added instead.
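The interplay between maximum-score assignment, del_celltypes and unknown_celltype_key can be sketched in plain Python. This is not harpy's implementation: in particular, the rule that a cell becomes unknown when none of its candidate scores is positive is an assumption made for this example, and the helper assign_celltypes and the scores are invented.

```python
def assign_celltypes(scores, del_celltypes=None, unknown_celltype_key="unknown_celltype"):
    """Assign each cell the candidate cell type with the highest score.

    scores: list of {cell_type: score} dicts, one per cell.
    """
    excluded = set(del_celltypes or [])
    annotations = []
    for cell_scores in scores:
        # Excluded cell types were scored, but they are not valid candidates.
        candidates = {ct: s for ct, s in cell_scores.items() if ct not in excluded}
        best = max(candidates, key=candidates.get) if candidates else None
        # ASSUMPTION (not confirmed by the documentation): a cell with no
        # positive candidate score is treated as unassignable.
        if best is None or candidates[best] <= 0:
            annotations.append(unknown_celltype_key)
        else:
            annotations.append(best)
    return annotations


# Invented scores for three cells.
scores = [
    {"Hepatocyte": 0.8, "Kupffer": 0.1, "Endothelial": -0.2},
    {"Hepatocyte": -0.1, "Kupffer": -0.3, "Endothelial": -0.05},
    {"Hepatocyte": 0.2, "Kupffer": 0.9, "Endothelial": 0.1},
]
labels = assign_celltypes(scores, del_celltypes=["Kupffer"])
print(labels)
```

In this sketch the third cell scores highest for Kupffer, but because that cell type is excluded the cell falls back to its best remaining candidate.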

See also

harpy.tb.score_genes_iter

iterative scoring algorithm.