harpy.im.pixel_clustering_preprocess
- harpy.im.pixel_clustering_preprocess(sdata, image_name, output_image_name, channels=None, p=99, p_sum=5, p_post=99.9, sigma=2, norm_sum=True, cap_max=None, chunks=None, scale_factors=None, cast_dtype=<class 'numpy.float32'>, persist_intermediate=True, overwrite=False)
Preprocess image elements specified in image_name. Normalizes and blurs the images based on various percentile and Gaussian blur parameters. The results are added to sdata as specified in output_image_name.

This function is specifically designed to prepare images for harpy.im.flowsom.

- Parameters:
  - sdata (SpatialData) – The SpatialData object containing the image data.
  - image_name (str | Iterable[str]) – The image element(s) from sdata to process. This can be a single image element or a list of image elements, e.g., when multiple fields of view are available.
  - output_image_name (str | Iterable[str]) – The preprocessed images are saved under these image element names in sdata.
  - channels (int | str | Iterable[int] | Iterable[str] | None (default: None)) – Specifies the channels to be included in the processing.
  - p (float | None (default: 99)) – Percentile used for normalization. If specified, pixel values are normalized by this percentile across the specified channels. Each channel is normalized by its own calculated percentile.
  - p_sum (float | None (default: 5)) – If the sum of the channel values at a pixel is below this percentile, the pixel values across all channels are set to NaN.
  - p_post (float (default: 99.9)) – Percentile used for normalization after the other preprocessing steps (p, p_sum, norm_sum normalization and Gaussian blurring) are performed. If specified, pixel values are normalized by this percentile across the specified channels. Each channel is normalized by its own calculated percentile.
  - sigma (float | Iterable[float] | None (default: 2)) – Gaussian blur parameter for each channel. Use 0 to omit blurring for specific channels or None to skip blurring altogether.
  - norm_sum (bool (default: True)) – If True, each channel is normalized by the sum of all channels at each pixel.
  - cap_max (float | None (default: None)) – The maximum allowable value for the elements in the resulting preprocessed image elements. If None, no capping is applied. A typical value would be 1.0 to exclude outliers.
  - chunks (str | int | tuple[int, ...] | None (default: None)) – Chunk sizes for processing. If provided as a tuple, it should contain chunk sizes for c, (z), y, x.
  - scale_factors (Sequence[dict[str, int] | int] | None (default: None)) – Scale factors to apply for multiscale.
  - persist_intermediate (bool (default: True)) – If set to True, all preprocessed elements in image_name are persisted in memory. If the elements in image_name are large, this can increase RAM usage. Set to False to write to an intermediate zarr store instead, which reduces RAM usage but slightly increases computation time. Either persisting or writing to an intermediate zarr store is needed; otherwise Dask cannot optimize the computation graph when multiple elements are given in image_name. Ignored if sdata is not backed by a zarr store, or if there is only one element in image_name.
  - cast_dtype (type | None (default: <class 'numpy.float32'>)) – Image data in image_name is cast to cast_dtype before preprocessing starts. If set to None and the input image is of integer type, the percentile normalizations produce data of type numpy.float64, leading to increased memory usage.
  - overwrite (bool (default: False)) – If True, overwrites existing data in output_image_name.
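
The normalization parameters can be easier to follow as array operations. The following is a minimal NumPy-only sketch for a single (c, y, x) image, not harpy's implementation: the function name sketch_preprocess is hypothetical, the order of the steps is an assumption based on the parameter descriptions, Gaussian blurring (sigma) is left out, and the Dask chunking and multi-image handling are ignored.

```python
import numpy as np


def sketch_preprocess(img, p=99, p_sum=5, p_post=99.9, norm_sum=True, cap_max=None):
    """Illustrative sketch of the documented normalizations; not harpy's code.

    Assumptions: img has shape (c, y, x), blurring is skipped, and the
    steps run in the order p -> p_sum -> norm_sum -> p_post -> cap_max.
    """
    img = img.astype(np.float32)  # mirrors cast_dtype=numpy.float32 (makes a copy)

    if p is not None:
        # Divide each channel by its own p-th percentile.
        scale = np.percentile(img, p, axis=(1, 2), keepdims=True)
        img = img / scale

    if p_sum is not None:
        # Pixels whose channel sum is below the p_sum-th percentile
        # are set to NaN in every channel.
        total = img.sum(axis=0)
        low = total < np.percentile(total, p_sum)
        img[:, low] = np.nan

    if norm_sum:
        # Divide every channel by the per-pixel sum over channels.
        with np.errstate(invalid="ignore", divide="ignore"):
            img = img / np.nansum(img, axis=0, keepdims=True)

    if p_post is not None:
        # Second per-channel percentile normalization, ignoring NaN pixels.
        scale = np.nanpercentile(img, p_post, axis=(1, 2), keepdims=True)
        img = img / scale

    if cap_max is not None:
        # Clip large values; NaN pixels stay NaN.
        img = np.minimum(img, cap_max)

    return img


rng = np.random.default_rng(0)
example = rng.integers(1, 1000, size=(3, 32, 32))
result = sketch_preprocess(example, cap_max=1.0)
```

In this sketch, roughly the lowest 5% of pixels by channel sum end up as NaN in all channels, and cap_max=1.0 bounds the remaining values. Whether harpy computes the p_sum threshold in exactly this way is not stated in the parameter descriptions.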
Notes
- To avoid data leakage:
  - In the single fov case (one image element provided), to prevent data leakage between channels, set p_sum=None and norm_sum=False. The only normalization performed is then a division by the p and p_post percentile values per channel.
  - In the multiple fov case (multiple image elements provided), p_sum, norm_sum, p and p_post should all be set to None to prevent data leakage both between channels and between images.
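
The two cases in the notes can be captured as keyword presets. This is a sketch under stated assumptions: the constant and helper names are hypothetical, the package is assumed to be importable as harpy, and the multiple fov preset follows the note literally by passing None for norm_sum even though that parameter is documented as bool.

```python
# Presets taken from the notes; all names defined here are hypothetical.
SINGLE_FOV_NO_LEAKAGE = {"p_sum": None, "norm_sum": False}
MULTI_FOV_NO_LEAKAGE = {"p_sum": None, "norm_sum": None, "p": None, "p_post": None}


def leakage_safe_kwargs(n_images):
    """Return keyword overrides for one image element or for several."""
    if n_images < 1:
        raise ValueError("at least one image element is required")
    preset = SINGLE_FOV_NO_LEAKAGE if n_images == 1 else MULTI_FOV_NO_LEAKAGE
    return dict(preset)  # copy, so callers can modify the result freely


def preprocess_without_leakage(sdata, image_name, output_image_name, **overrides):
    """Call pixel_clustering_preprocess with the leakage-safe preset applied."""
    # Imported lazily so the helpers above work without harpy installed.
    import harpy

    names = [image_name] if isinstance(image_name, str) else list(image_name)
    kwargs = {**leakage_safe_kwargs(len(names)), **overrides}
    return harpy.im.pixel_clustering_preprocess(
        sdata,
        image_name=image_name if isinstance(image_name, str) else names,
        output_image_name=output_image_name,
        **kwargs,
    )
```

Explicit keyword arguments passed to preprocess_without_leakage take precedence over the preset, so settings such as sigma or cap_max can still be chosen per call.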
- Return type:
  SpatialData
- Returns:
  An updated SpatialData object with the preprocessed image data stored in the specified output_image_name.
See also
harpy.im.flowsom: flowsom pixel clustering on image elements.