harpy.tb.cell_clustering_preprocess

harpy.tb.cell_clustering_preprocess(sdata, cells_labels_name, cluster_labels_name, output_table_name, q=0.999, chunks=None, region_key='fov_labels', instance_key='cell_ID', cell_index_name='cells', instance_size_key='shapeSize', raw_counts_key='raw_counts', overwrite=False)

Preprocesses spatial data for cell clustering.

This function prepares a SpatialData object for cell clustering by integrating cell segmentation masks (obtained via e.g. harpy.im.segment) and SOM pixel/meta clusters (obtained via e.g. harpy.im.flowsom). For each cell in cells_labels_name, it calculates the count of each cluster (clusters provided via cluster_labels_name), normalized by cell size and, if q is provided, by quantile normalization. The results are stored in the sdata object as a table element of shape (#cells, #clusters), named output_table_name.
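The counting step described above can be illustrated on plain nested lists. This is a minimal sketch of the idea only, not harpy's implementation, which operates on the labels elements of a SpatialData object. The helper name cluster_counts_per_cell is hypothetical, and treating label 0 as background in both masks is an assumption.

```python
from collections import Counter


def cluster_counts_per_cell(cell_mask, cluster_mask):
    """Count cluster pixels per cell and divide by the cell size.

    Hypothetical helper for illustration. Label 0 is assumed to be
    background in both masks and is ignored.
    """
    counts = {}        # cell id -> Counter mapping cluster id -> pixel count
    sizes = Counter()  # cell id -> number of pixels in the cell
    for cell_row, cluster_row in zip(cell_mask, cluster_mask):
        for cell_id, cluster_id in zip(cell_row, cluster_row):
            if cell_id == 0:
                continue
            sizes[cell_id] += 1
            if cluster_id != 0:
                counts.setdefault(cell_id, Counter())[cluster_id] += 1

    cell_ids = sorted(sizes)
    cluster_ids = sorted({c for row in cluster_mask for c in row if c != 0})
    # One row per cell, one column per cluster: shape (#cells, #clusters).
    table = [
        [counts.get(cell, Counter())[k] / sizes[cell] for k in cluster_ids]
        for cell in cell_ids
    ]
    return cell_ids, cluster_ids, dict(sizes), table


cell_mask = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
cluster_mask = [
    [1, 1, 2, 2],
    [1, 2, 2, 2],
    [1, 1, 1, 1],
]
cell_ids, cluster_ids, sizes, table = cluster_counts_per_cell(cell_mask, cluster_mask)
```

Here cell 1 covers four pixels, three in cluster 1 and one in cluster 2, so its row holds the fractions 3/4 and 1/4. Cell 2 covers three pixels, one in cluster 1 and two in cluster 2.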

Parameters:
  • sdata (SpatialData) – The input SpatialData object containing the spatial proteomics data.

  • cells_labels_name (str | Iterable[str]) – The labels element(s) in sdata that contain cell segmentation masks. These masks should be previously generated using harpy.im.segment.

  • cluster_labels_name (str | Iterable[str]) – The labels element(s) in sdata that contain metacluster or cluster masks. These should be derived from harpy.im.flowsom.

  • output_table_name (str) – The name of the table element within sdata where the preprocessed data will be stored.

  • q (float | None (default: 0.999)) – Quantile used for normalization. If specified, each pixel SOM/meta cluster column in output_table_name is normalized by this quantile. Values are multiplied by 100 after normalization.

  • chunks (str | int | tuple[int, ...] | None (default: None)) – Chunk sizes for processing the data. If provided as a tuple, it should give the chunk size for each dimension, in the order (z), y, x.

  • instance_key (str (default: 'cell_ID')) – Instance key. The name of the column in AnnData table .obs that will hold the instance ids.

  • region_key (str (default: 'fov_labels')) – Region key. The name of the column in AnnData table .obs that will hold the name of the element(s) that are annotated by the resulting table.

  • cell_index_name (str (default: 'cells')) – The name of the index of the resulting AnnData table.

  • instance_size_key (str (default: 'shapeSize')) – The key in the AnnData table .obs that will hold the size of the instances (obtained from cells_labels_name).

  • raw_counts_key (str (default: 'raw_counts')) – Name of the AnnData layer where the non-preprocessed counts will be stored.

  • overwrite (bool (default: False)) – If True, overwrites the existing data in the specified output_table_name if it already exists.
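The effect of q can be sketched as a column-wise rescaling. The example below is an illustration under assumptions, not harpy's code: it assumes that "normalized by this quantile" means dividing each cluster column by its q-quantile, that the quantile uses linear interpolation, and that a column whose quantile is zero is set to zero. Both function names are hypothetical.

```python
def quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def quantile_normalize_columns(table, q=0.999):
    """Divide each column by its q-quantile, then multiply by 100.

    Assumption: columns with a zero quantile are mapped to 0.0 to avoid
    division by zero.
    """
    n_cols = len(table[0])
    scales = [quantile([row[j] for row in table], q) for j in range(n_cols)]
    return [
        [100 * row[j] / scales[j] if scales[j] > 0 else 0.0 for j in range(n_cols)]
        for row in table
    ]


# Rows are cells, columns are clusters (size-normalized counts).
size_normalized = [
    [0.0, 0.25],
    [0.5, 0.5],
    [1.0, 1.0],
]
normalized = quantile_normalize_columns(size_normalized, q=0.5)
```

With q=0.5 both columns have a median of 0.5, so a value equal to the median maps to 100 and a value of twice the median maps to 200. With the default q=0.999 the scale is close to the column maximum, which keeps a few extreme cells from dominating the range.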

Return type:

SpatialData

Returns:

The input sdata with a table element added (output_table_name).

See also

harpy.im.flowsom

flowsom pixel clustering.

harpy.tb.flowsom

flowsom cell clustering.