harpy.tb.flowsom


harpy.tb.flowsom(sdata, cells_labels_name, cluster_labels_name, output_table_name, q=0.999, chunks=None, n_clusters=20, index_names_var=None, index_positions_var=None, random_state=100, region_key='fov_labels', instance_key='cell_ID', cell_index_name='cells', instance_size_key='shapeSize', raw_counts_key='raw_counts', overwrite=False, **kwargs)

Run FlowSOM cell clustering on pixel-cluster-derived cell features.

Prepare the data obtained from pixel clustering for cell clustering (see cell_clustering_preprocess()) and execute FlowSOM on the resulting table element (output_table_name) of the SpatialData object.

This function applies the FlowSOM clustering algorithm (via flowsom.FlowSOM) to data contained in a SpatialData object. The algorithm trains a self-organizing map (SOM) and then groups the SOM nodes into n_clusters metaclusters. The clustering results are added to a table element of sdata.

Typically, one would first process sdata via pixel_clustering_preprocess() and the pixel clustering flowsom() (listed under See also) before using this function.

Parameters:
  • sdata (SpatialData) – The input SpatialData object.

  • cells_labels_name (str | Iterable[str]) – The labels element(s) in sdata that contain cell segmentation masks. These masks should be previously generated using segment(). If a list of labels elements is provided, they will be clustered together (e.g. multiple samples).

  • cluster_labels_name (str | Iterable[str]) – The labels element(s) in sdata that contain metacluster or SOM cluster masks. These should be obtained via the pixel clustering flowsom() (listed under See also).

  • output_table_name (str) – The output table element in sdata where results of the clustering and metaclustering will be stored.

  • q (float | None (default: 0.999)) – Quantile used for normalization. If specified, each pixel SOM/meta cluster column in output_table_name is normalized by this quantile prior to FlowSOM clustering. Values are multiplied by 100 after normalization.

  • chunks (str | int | tuple[int, ...] | None (default: None)) – Chunk sizes for processing the data. If provided as a tuple, it should give the chunk size for each dimension, in the order (z), y, x.

  • n_clusters (int (default: 20)) – The number of metaclusters to form from the self-organizing maps.

  • index_names_var (Iterable[str] | None (default: None)) – Specifies the variable names to be used from sdata.tables[output_table_name].var for clustering. If None, index_positions_var is used instead, provided it is not None.

  • index_positions_var (Iterable[int] | None (default: None)) – Specifies the positions of variables to be used from sdata.tables[output_table_name].var. Used if index_names_var is None.

  • random_state (int (default: 100)) – A random state for reproducibility of the clustering.

  • instance_key (str (default: 'cell_ID')) – Instance key. The name of the column in AnnData table .obs that will hold the instance ids.

  • region_key (str (default: 'fov_labels')) – Region key. The name of the column in AnnData table .obs that will hold the name of the element(s) that are annotated by the resulting table.

  • cell_index_name (str (default: 'cells')) – The name of the index of the resulting AnnData table.

  • instance_size_key (str (default: 'shapeSize')) – The key in the AnnData table .obs that will hold the size of the instances (obtained from cells_labels_name).

  • raw_counts_key (str (default: 'raw_counts')) – Name of the AnnData layer where the non-preprocessed counts will be stored.

  • overwrite (bool (default: False)) – If True, overwrites the existing data in output_table_name if it already exists.

  • **kwargs – Additional keyword arguments passed to flowsom.FlowSOM.
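The effect of the q parameter described above can be illustrated with a minimal NumPy sketch. It assumes that "normalized by this quantile" means dividing each column by its own q-quantile and then multiplying by 100. The helper name quantile_normalize and the guard against zero-valued quantiles are hypothetical and are not taken from the harpy source.

```python
import numpy as np


def quantile_normalize(counts, q=0.999):
    """Scale each column by its q-quantile, then multiply by 100.

    Hypothetical helper: a sketch of the normalization described for
    the ``q`` parameter, not harpy's actual implementation.
    """
    counts = np.asarray(counts, dtype=float)
    # One quantile per column (each column is a pixel SOM/meta cluster).
    scale = np.quantile(counts, q, axis=0)
    # Assumption: leave columns with a zero quantile unscaled
    # instead of dividing by zero.
    scale = np.where(scale == 0, 1.0, scale)
    return counts / scale * 100.0


# Toy table: 3 cells (rows) x 2 pixel clusters (columns).
counts = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 40.0]])
# q=1.0 uses the column maximum, which keeps the toy numbers easy to read.
normalized = quantile_normalize(counts, q=1.0)
```

With q=1.0 each column is scaled so that its maximum becomes 100. The default q=0.999 uses a slightly lower reference value per column, which reduces the influence of extreme outliers on the scaling.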

Return type:

tuple[SpatialData, FlowSOM]

Returns:

A tuple containing:

  • The updated sdata with the clustering results added.

  • An instance of flowsom.FlowSOM containing the trained FlowSOM model.
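A call that unpacks the returned tuple might look like the sketch below. The parameter names and defaults follow the signature above. The import alias hp, the wrapper function run_cell_clustering, and the element names "cell_masks", "pixel_metaclusters" and "table_cell_clustering" are assumptions made for illustration and must be replaced with the names used in your own sdata.

```python
def run_cell_clustering(sdata):
    """Run FlowSOM cell clustering on a prepared SpatialData object.

    Hypothetical wrapper: ``sdata`` is assumed to already contain cell
    segmentation masks and pixel cluster masks under the names below.
    """
    # Assumed import alias; imported lazily so that defining this
    # function does not require harpy to be installed.
    import harpy as hp

    sdata, fsom = hp.tb.flowsom(
        sdata,
        cells_labels_name="cell_masks",             # hypothetical element name
        cluster_labels_name="pixel_metaclusters",   # hypothetical element name
        output_table_name="table_cell_clustering",  # hypothetical element name
        q=0.999,            # quantile normalization, as documented above
        n_clusters=20,      # number of metaclusters
        random_state=100,   # fixed seed for reproducibility
        overwrite=True,     # replace the output table if it already exists
    )
    # sdata now holds the output table; fsom is the trained FlowSOM model.
    return sdata, fsom
```

The first element of the returned tuple is the updated sdata and the second is the trained flowsom.FlowSOM instance, matching the order listed above.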

See also

flowsom()

FlowSOM pixel clustering.

cell_clustering_preprocess()

Prepare data for cell clustering.