harpy.im.pixel_clustering_preprocess

harpy.im.pixel_clustering_preprocess(sdata, image_name, output_image_name, channels=None, p=99, p_sum=5, p_post=99.9, sigma=2, norm_sum=True, cap_max=None, chunks=None, scale_factors=None, cast_dtype=<class 'numpy.float32'>, persist_intermediate=True, overwrite=False)

Preprocess the image elements specified in image_name. The images are normalized and blurred according to the percentile and Gaussian blur parameters. The results are added to sdata under the names given in output_image_name.

This function is designed specifically for preparing images before using harpy.im.flowsom.

Parameters:
  • sdata (SpatialData) – The SpatialData object containing the image data.

  • image_name (str | Iterable[str]) – The image element(s) from sdata to process. This can be a single image element or a list of image elements, e.g., when multiple fields of view are available.

  • output_image_name (str | Iterable[str]) – The preprocessed images are saved under these image element names in sdata.

  • channels (int | str | Iterable[int] | Iterable[str] | None (default: None)) – Specifies the channels to be included in the processing.

  • p (float | None (default: 99)) – Percentile used for normalization. If specified, pixel values are normalized by this percentile across the specified channels. Each channel is normalized by its own calculated percentile.

  • p_sum (float | None (default: 5)) – If the sum of the channel values at a pixel is below this percentile, the pixel values across all channels are set to NaN.

  • p_post (float (default: 99.9)) – Percentile used for normalization after other preprocessing steps (p, p_sum, norm_sum normalization and Gaussian blurring) are performed. If specified, pixel values are normalized by this percentile across the specified channels. Each channel is normalized by its own calculated percentile.

  • sigma (float | Iterable[float] | None (default: 2)) – Gaussian blur parameter for each channel. Use 0 to omit blurring for specific channels or None to skip blurring altogether.

  • norm_sum (bool (default: True)) – If True, each channel is normalized by the sum of all channels at each pixel.

  • cap_max (float | None (default: None)) – The maximum allowed value in the resulting preprocessed image elements. If None, no capping is applied. A typical value is 1.0, which limits the influence of outliers.

  • chunks (str | int | tuple[int, ...] | None (default: None)) – Chunk sizes for processing. If provided as a tuple, it should contain chunk sizes for c, (z), y, x.

  • scale_factors (Sequence[dict[str, int] | int] | None (default: None)) – Scale factors to apply for multiscale.

  • persist_intermediate (bool (default: True)) – If True, all preprocessed elements in image_name are persisted in memory. If the elements in image_name are large, this can increase RAM usage. Set to False to write to an intermediate zarr store instead, which reduces RAM usage but slightly increases computation time. Persisting or writing to an intermediate zarr store is needed because otherwise Dask cannot optimize the computation graph when multiple elements are given in image_name. Ignored if sdata is not backed by a zarr store, or if there is only one element in image_name.

  • cast_dtype (type | None (default: <class 'numpy.float32'>)) – Image data in image_name is cast to this dtype before preprocessing starts. If set to None and the input image is of integer type, the percentile normalizations produce data of type numpy.float64, which increases memory usage.

  • overwrite (bool (default: False)) – If True, overwrites existing data in output_image_name.
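The normalization parameters above can be illustrated with a small NumPy sketch. This is not harpy's implementation: the function name preprocess_sketch is hypothetical, the order of the p, p_sum and norm_sum steps is an assumption, Gaussian blurring (sigma) is omitted, and the array is assumed to have shape (c, y, x) and to fit in memory.

```python
import numpy as np


def preprocess_sketch(img, p=99, p_sum=5, p_post=99.9, norm_sum=True, cap_max=None):
    """Hypothetical NumPy illustration of the normalization parameters.

    img is assumed to have shape (c, y, x). Blurring is left out.
    """
    # Cast first, mirroring the role of cast_dtype.
    arr = img.astype(np.float32)

    if p is not None:
        # Each channel is divided by its own p-th percentile.
        q = np.percentile(arr, p, axis=(1, 2), keepdims=True)
        arr = arr / q

    # Sum over channels at every pixel, shape (y, x).
    total = arr.sum(axis=0)

    if p_sum is not None:
        # Pixels whose channel sum is below the p_sum-th percentile
        # become NaN in all channels.
        low = total < np.percentile(total, p_sum)
        arr = np.where(low[None], np.nan, arr)

    if norm_sum:
        # Each channel is divided by the per-pixel sum of all channels.
        arr = arr / total

    if p_post is not None:
        # Final per-channel percentile normalization, ignoring NaN pixels.
        q_post = np.nanpercentile(arr, p_post, axis=(1, 2), keepdims=True)
        arr = arr / q_post

    if cap_max is not None:
        # Cap values; NaN pixels stay NaN.
        arr = np.minimum(arr, cap_max)

    return arr


# Small synthetic example with strictly positive integer intensities.
rng = np.random.default_rng(0)
demo = rng.integers(1, 1000, size=(3, 32, 32)).astype(np.uint16)
result = preprocess_sketch(demo, cap_max=1.0)
print(result.shape, result.dtype)
```

In this sketch the NaN mask is identical for every channel, which matches the description of p_sum, and cap_max=1.0 bounds all remaining values.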

Notes

To avoid data leakage:
  • In the single-FOV case (one image element provided), set p_sum=None and norm_sum=False to prevent data leakage between channels. The only normalization performed is then a division by the p and p_post percentile values of each channel.

  • In the multiple-FOV case (multiple image elements provided), set p_sum, p and p_post to None and norm_sum to False to prevent data leakage both between channels and between images.
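The two cases above can be captured in a small helper that selects keyword arguments by the number of image elements. The helper leakage_safe_kwargs is hypothetical and not part of harpy. Because norm_sum is documented as a boolean, the sketch passes False rather than None.

```python
def leakage_safe_kwargs(n_images: int) -> dict:
    """Hypothetical helper: keyword arguments that follow the notes above."""
    if n_images < 1:
        raise ValueError("at least one image element is required")

    # Single FOV: disable the steps that mix information across channels.
    kwargs = {"p_sum": None, "norm_sum": False}

    if n_images > 1:
        # Multiple FOVs: also disable the percentile normalizations.
        kwargs.update({"p": None, "p_post": None})

    return kwargs


print(leakage_safe_kwargs(1))
print(leakage_safe_kwargs(3))

# The result would then be unpacked into the documented call, for example:
# sdata = harpy.im.pixel_clustering_preprocess(
#     sdata, image_name, output_image_name, **leakage_safe_kwargs(n_images)
# )
```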

Return type:

SpatialData

Returns:

An updated SpatialData object with the preprocessed image data stored in the specified output_image_name.

See also

harpy.im.flowsom

FlowSOM pixel clustering on image elements.