harpy.im.flowsom
- harpy.im.flowsom(sdata, image_name, output_cluster_labels_name, output_metacluster_labels_name, channels=None, fraction=0.1, n_clusters=5, random_state=100, chunks=None, scale_factors=None, client=None, persist_intermediate=True, write_intermediate=True, instance_key='cell_ID', region_key='fov_labels', spatial_key='spatial', overwrite=False, **kwargs)
Applies flowsom clustering on image element(s) of a SpatialData object.
This function executes the FlowSOM clustering algorithm (via fs.FlowSOM) on spatial data encapsulated by a SpatialData object. The predicted clusters and metaclusters are added as labels elements to sdata.labels[output_cluster_labels_name] and sdata.labels[output_metacluster_labels_name], respectively.
- Parameters:
  - sdata (SpatialData) – The input SpatialData object.
  - image_name (str | Iterable[str]) – The image element(s) of sdata on which FlowSOM is run. It is recommended to preprocess the data with harpy.im.pixel_clustering_preprocess().
  - output_cluster_labels_name (str | Iterable[str]) – The output labels element in sdata to which the predicted FlowSOM SOM clusters are saved.
  - output_metacluster_labels_name (str | Iterable[str]) – The output labels element in sdata to which the predicted FlowSOM metaclusters are saved.
  - channels (int | str | Iterable[int] | Iterable[str] | None (default: None)) – Specifies the channels to be included in the pixel clustering.
  - fraction (float | None (default: 0.1)) – Fraction of the data to sample for training FlowSOM. Inference will be done on all pixels in image_name.
  - n_clusters (int (default: 5)) – The number of metaclusters to form.
  - random_state (int (default: 100)) – A random state for reproducibility of the clustering and sampling.
  - chunks (str | int | tuple[int, ...] | None (default: None)) – Chunk sizes used for the FlowSOM inference step on image_name. If provided as a tuple, it should contain chunk sizes for c, (z), y, x.
  - scale_factors (Sequence[dict[str, int] | int] | None (default: None)) – Scale factors to apply for multiscale.
  - client (Client | None (default: None)) – A Dask Client instance. If specified, during inference, the trained fs.models.BaseFlowSOMEstimator model will be scattered (client.scatter(...)). This reduces the size of the task graph and can improve performance by minimizing data transfer overhead during computation. If not specified, Dask will use the default scheduler as configured on your system (e.g., single-threaded, multithreaded, or a global client if one is running).
  - persist_intermediate (bool (default: True)) – If set to True, intermediate computations are persisted in memory. If image_name, or one of the elements in image_name, is large, this could lead to increased RAM usage. Set to False to write to an intermediate Zarr store instead, which will reduce RAM usage but slightly increase computation time. We advise setting persist_intermediate to True, as it will only persist an array of dimension (2, z, y, x) of dtype numpy.uint8. Ignored if sdata is not backed by a Zarr store.
  - write_intermediate (bool (default: True)) – If set to True, an intermediate Zarr store will be used during sampling from image_name for FlowSOM training. Enable this option to reduce RAM usage, especially if image_name or any of its components is large. Ignored if sdata is not backed by a Zarr store.
  - instance_key (str (default: 'cell_ID')) – Instance key. The name of the column in the .obs attribute of the AnnData table at slot "cell_data" of the mudata.MuData object (which is an attribute of the returned flowsom.FlowSOM object) that will hold the instance IDs.
  - region_key (str (default: 'fov_labels')) – Region key. The name of the column in the .obs attribute of the AnnData table at slot "cell_data" of the mudata.MuData object (which is an attribute of the returned flowsom.FlowSOM object) that will hold the name of the image element that is annotated by the table.
  - spatial_key (str (default: 'spatial')) – Spatial key. The name of the slot in the .obsm attribute of the AnnData table at slot "cell_data" of the mudata.MuData object (which is an attribute of the returned flowsom.FlowSOM object) that will hold the (z), y, x coordinates of the pixel.
  - overwrite (bool (default: False)) – If True, overwrites output_cluster_labels_name and/or output_metacluster_labels_name if they already exist in sdata.
  - **kwargs – Additional keyword arguments passed to fs.FlowSOM.
- Return type:
  tuple[SpatialData, FlowSOM, Series]
- Returns:
  tuple:
  - The input sdata with the clustering results added.
  - FlowSOM object containing a MuData object and a trained fs.models.FlowSOMEstimator. The MuData object will only contain the fraction (via the fraction parameter) of the data sampled from image_name on which the FlowSOM model is trained.
  - A pandas Series object containing a mapping between the clusters and the metaclusters.
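The following is a minimal sketch of a call that unpacks the three return values, not a tested recipe. It assumes the package is importable as harpy and that sdata already holds a preprocessed image element. The element name "raw_image_preprocessed" and the two output names are hypothetical placeholders; the keyword arguments are taken from the signature above.

```python
def run_flowsom_sketch(sdata, image_name="raw_image_preprocessed"):
    """Call harpy.im.flowsom and unpack its three return values.

    `image_name` and the output names below are hypothetical placeholders.
    """
    # Imported lazily so this sketch can be defined without harpy installed.
    import harpy

    sdata, fsom, mapping = harpy.im.flowsom(
        sdata,
        image_name=image_name,
        output_cluster_labels_name="flowsom_clusters",  # hypothetical name
        output_metacluster_labels_name="flowsom_metaclusters",  # hypothetical name
        fraction=0.1,      # train on a 10% sample; inference covers all pixels
        n_clusters=5,      # number of metaclusters
        random_state=100,  # reproducible sampling and clustering
        overwrite=True,    # replace existing output labels elements
    )
    # sdata:   input object with both labels elements added
    # fsom:    FlowSOM object (MuData of the sampled fraction + trained estimator)
    # mapping: pandas Series relating clusters to metaclusters
    return sdata, fsom, mapping
```

Passing a Dask Client through the client parameter is optional; when omitted, Dask falls back to its default scheduler as described above.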
See also
harpy.im.pixel_clustering_preprocess – Preprocess image elements before applying FlowSOM clustering.
Warning
- The function is intended for use with spatial proteomics data. Input data should be appropriately preprocessed (e.g. via harpy.im.pixel_clustering_preprocess()) to ensure meaningful clustering results.
- The cluster and metacluster IDs found in output_cluster_labels_name and output_metacluster_labels_name count from 1, while they count from 0 in the FlowSOM object.
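The offset between the two numbering schemes is easy to get wrong when comparing the labels elements with the FlowSOM object. The helpers below are a hypothetical illustration of the conversion and are not part of harpy. Rejecting IDs below the stated starting value is an assumption of this sketch.

```python
def label_id_to_flowsom_id(label_id):
    """Convert a 1-based ID from a labels element to the 0-based FlowSOM ID."""
    if label_id < 1:
        # Assumption of this sketch: IDs below 1 are not cluster IDs.
        raise ValueError("cluster IDs in the labels elements count from 1")
    return label_id - 1


def flowsom_id_to_label_id(flowsom_id):
    """Convert a 0-based FlowSOM ID to the 1-based ID used in the labels elements."""
    if flowsom_id < 0:
        raise ValueError("cluster IDs in the FlowSOM object count from 0")
    return flowsom_id + 1


# Cluster 1 in the labels element corresponds to cluster 0 in the FlowSOM object.
first_flowsom_id = label_id_to_flowsom_id(1)
# Converting back returns the original labels-element ID.
round_trip = flowsom_id_to_label_id(first_flowsom_id)
```

The same offset applies to both cluster and metacluster IDs.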