harpy.tb.cluster_cleanliness

harpy.tb.cluster_cleanliness(sdata, labels_name, table_name, output_table_name, celltypes, celltype_indexes=None, colors=None, celltype_column='annotation', unknown_celltype_key='unknown_celltype', cleanliness_key='Cleanliness', overwrite=False)

Recalculates cell type annotations, for example after correcting the list of cell types or after manually updating the scores assigned per cell type (e.g. via correct_marker_genes).

Cell types can also be grouped together via the celltype_indexes parameter. Returns a SpatialData object alongside a dictionary mapping cell types to colors.

Deprecated since version 0.3.0: harpy.tb.cluster_cleanliness is deprecated and may be removed in a future release.

Parameters:
  • sdata (SpatialData) – Data containing spatial information.

  • labels_name (list[str]) – The labels element(s) of sdata used to select cells via the region key in sdata.tables[table_name].obs. If a list of labels elements is provided, their cells are scored together (e.g. multiple samples). Note that if output_table_name is equal to table_name and overwrite is True, cells in sdata.tables[table_name] that are linked (via the region key) to labels elements not listed in labels_name will be removed from sdata.tables[table_name].

  • table_name (str) – The table element in sdata on which to perform the cleaning.

  • output_table_name (str) – The table element in sdata to which the table with the cleaned annotations will be written.

  • celltypes (list[str]) – List of cell types to use for annotation. This can be a subset of the cell types available in the .obs attribute of the corresponding table.

  • celltype_indexes (dict[str, int] | None (default: None)) – Dictionary with cell types as keys and indexes as values. Cell types at the provided indexes will be grouped together under the new cell type given as the key. E.g. with celltype_indexes = {"fibroblast": [4,5,23,25], "stellate": [28,29,30]}, the cell types at indexes 4, 5, 23 and 25 in the provided list of celltypes (after an alphabetical sort) will be grouped together as "fibroblast".

  • colors (list[str] | None (default: None)) – List of colors to be used for visualizing different cell types. If not provided, a default colormap will be generated.

  • celltype_column (str (default: 'annotation')) – The column name in the .obs attribute of the anndata.AnnData table where the predicted cell type is stored (obtained through score_genes() or score_genes_iter()).

  • unknown_celltype_key (str (default: 'unknown_celltype')) – The name reserved for cells that could not be assigned a specific cell type.

  • cleanliness_key (str (default: 'Cleanliness')) – The column name in the .obs attribute of the anndata.AnnData table in which the cleanliness score of the predicted cell type will be stored.

  • overwrite (bool (default: False)) – If True, overwrites the output_table_name if it already exists in sdata.
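The index-based grouping described for celltype_indexes can be sketched in plain Python. This is an illustration only, not harpy's implementation: the helper group_celltypes and the cell type names are hypothetical, and zero-based indexing into the alphabetically sorted list is an assumption.

```python
# Illustrative sketch only; not harpy's implementation.
# Assumption: indexes are zero-based positions in the alphabetically sorted list.

def group_celltypes(celltypes, celltype_indexes):
    """Map each cell type to its group name, or to itself if it is not grouped."""
    ordered = sorted(celltypes)  # indexes refer to the sorted list, not the input order
    mapping = {name: name for name in ordered}
    for group, indexes in celltype_indexes.items():
        for i in indexes:
            mapping[ordered[i]] = group
    return mapping


# Hypothetical cell type names, deliberately given out of alphabetical order.
celltypes = ["stellate_b", "fibro_a", "hepatocyte", "fibro_b", "stellate_a"]
# After sorting: fibro_a, fibro_b, hepatocyte, stellate_a, stellate_b
celltype_indexes = {"fibroblast": [0, 1], "stellate": [3, 4]}

mapping = group_celltypes(celltypes, celltype_indexes)
for original, grouped in mapping.items():
    print(f"{original} -> {grouped}")
```

Because the indexes refer to the sorted list, it is worth printing sorted(celltypes) before choosing them.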

Return type:

tuple[SpatialData, dict | None]

Returns:

A tuple containing:

  • Updated SpatialData object after the cleanliness analysis.

  • Dictionary with cell types as keys and their corresponding colors as values.
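A call might look like the following minimal sketch, which uses only the parameters documented above. The wrapper recompute_cleanliness and both table names are hypothetical, and sdata is assumed to already contain a table with predicted cell types in its annotation column.

```python
def recompute_cleanliness(sdata, labels_name, celltypes):
    """Call harpy.tb.cluster_cleanliness using the documented keyword arguments."""
    import harpy  # imported lazily so the sketch can be read without harpy installed

    sdata, color_dict = harpy.tb.cluster_cleanliness(
        sdata,
        labels_name=labels_name,
        table_name="table_transcriptomics",  # hypothetical input table name
        output_table_name="table_transcriptomics_clean",  # hypothetical output table name
        celltypes=celltypes,
        celltype_column="annotation",  # documented default
        cleanliness_key="Cleanliness",  # documented default
        overwrite=True,
    )
    # Per the documented return type, color_dict may be None.
    return sdata, color_dict
```

Writing to an output table name that differs from table_name sidesteps the case described under labels_name, where cells linked to other labels elements are removed from the input table.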

See also

harpy.tb.score_genes

Score genes using score_genes().