harpy.utils.RasterAggregator
- class harpy.utils.RasterAggregator(mask_dask_array, image_dask_array=None, instance_key='cell_ID', instance_size_key='shapeSize', run_on_gpu=True)
Helper class to calculate the aggregated ‘sum’, ‘mean’, ‘var’, ‘kurtosis’, ‘skew’, ‘area’, ‘min’, ‘max’ and ‘center of mass’ of an image and its labels using Dask.
- Parameters:
mask_dask_array (Array) – A 3D Dask array of integer labels representing segmented regions. Expected shape is (‘z’, ‘y’, ‘x’). Each unique integer value represents a separate label.
image_dask_array (Array | None (default: None)) – A 4D Dask array representing the image data with shape (‘c’, ‘z’, ‘y’, ‘x’), where ‘c’ is the number of channels. Can be None if only mask-based computations (e.g., count or center of mass) are required.
instance_key (str (default: 'cell_ID')) – Name of the instance key.
instance_size_key (str (default: 'shapeSize')) – Name of the instance size key.
run_on_gpu (bool (default: True)) – Whether to run on the GPU. If no CuPy installation is detected, the computation falls back to the CPU.
- Raises:
ValueError – If mask_dask_array does not contain an integer dtype.
AssertionError – If mask_dask_array is not 3D.
AssertionError – If image_dask_array is provided but is not 4D.
AssertionError – If spatial dimensions of image_dask_array and mask_dask_array do not match.
AssertionError – If chunk sizes of spatial dimensions do not match between image and mask.
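The documented checks can be restated as a small stand-alone function. This is a minimal sketch, not harpy's implementation: the name `check_inputs` is hypothetical, and the chunk comparison is only performed when both inputs expose a Dask-style `chunks` attribute, so the function can also be tried with plain NumPy arrays.

```python
import numpy as np


def check_inputs(mask, image=None):
    """Hypothetical re-statement of the documented input checks."""
    if not np.issubdtype(mask.dtype, np.integer):
        raise ValueError("mask must have an integer dtype")
    assert mask.ndim == 3, "mask must be 3D (z, y, x)"
    if image is not None:
        assert image.ndim == 4, "image must be 4D (c, z, y, x)"
        assert image.shape[1:] == mask.shape, "spatial dimensions must match"
        # Dask arrays expose `.chunks`; plain NumPy arrays do not, so the
        # chunk comparison is skipped for them.
        mask_chunks = getattr(mask, "chunks", None)
        image_chunks = getattr(image, "chunks", None)
        if mask_chunks is not None and image_chunks is not None:
            assert image_chunks[1:] == mask_chunks, "spatial chunk sizes must match"


# A valid pair: integer (z, y, x) mask and (c, z, y, x) image.
check_inputs(np.zeros((1, 4, 4), dtype=np.int32), np.zeros((2, 1, 4, 4)))
```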
Notes
The aggregate operation computes statistics per chunk. For each chunk in mask_dask_array and image_dask_array, it produces a matrix with shape (i, c, z, y, x), where:
i: total number of labels in the global mask
c: number of channels in the chunk
z = 1, y = 1, x = 1
These matrices (chunks) are then aggregated, to obtain statistics for the global mask and image.
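The per-chunk scheme can be illustrated with plain NumPy. The sketch below is an assumption about the general idea, not harpy's code: the helper `chunk_sum_and_count` is hypothetical, labels are assumed to be contiguous from 0, and the trailing singleton dimensions (z = 1, y = 1, x = 1) are dropped for brevity. Each chunk yields one row per global label, so the partial results can be added directly.

```python
import numpy as np


def chunk_sum_and_count(mask_chunk, image_chunk, n_labels):
    """Partial sums (n_labels, c) and counts (n_labels,) for one chunk."""
    labels = mask_chunk.ravel()
    counts = np.bincount(labels, minlength=n_labels)
    sums = np.stack(
        [
            np.bincount(labels, weights=channel.ravel(), minlength=n_labels)
            for channel in image_chunk
        ],
        axis=1,
    )
    return sums, counts


# Toy global mask with labels 0 (background), 1 and 2.
mask = np.array([[[0, 1, 1, 2],
                  [0, 1, 2, 2]]])                       # (z=1, y=2, x=4)
image = np.arange(16, dtype=float).reshape(2, 1, 2, 4)  # (c=2, z, y, x)
n_labels = int(mask.max()) + 1

# Split into two chunks along x and aggregate each chunk separately.
partials = [
    chunk_sum_and_count(mask[..., s], image[..., s], n_labels)
    for s in (slice(0, 2), slice(2, 4))
]

# Combine the per-chunk results into global statistics.
total_sums = sum(p[0] for p in partials)
total_counts = sum(p[1] for p in partials)
means = total_sums / total_counts[:, None]
```

Because every chunk produces a matrix indexed by the global label set, no bookkeeping is needed to align labels that span several chunks.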
We intentionally avoid the optimization of setting i to the maximum number of labels in any chunk, because this would require an additional pass over the global mask to count labels per chunk (as done in harpy.utils.Featurizer).
Because i typically ranges from thousands to millions, and because only a single feature (e.g., a mean statistic) is computed, each chunk of the aggregated matrix remains under ~50 MB (for a chunk size of c = 1), even when the global mask contains around 10 million labels.
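The size estimate above can be checked with simple arithmetic. The sketch below assumes 4 bytes per value (for example float32); the source does not state the dtype, so `bytes_per_value` and the function name are assumptions for illustration.

```python
def aggregated_chunk_megabytes(n_labels, n_channels=1, bytes_per_value=4):
    """Size in MB of one (i, c, 1, 1, 1) aggregated chunk."""
    return n_labels * n_channels * bytes_per_value / 1e6


# 10 million labels, one channel per chunk, 4-byte values.
print(aggregated_chunk_megabytes(10_000_000))  # 40.0
```

The size grows linearly with the number of channels per chunk, which is why the channel chunk size matters for memory usage.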
By chunking the underlying dask arrays along the (c, z, y, x) dimensions in the on-disk Zarr store, the user can effectively control RAM usage during aggregation. As a practical guideline, choose chunk sizes of roughly z, y, x ≈ 4096 and c ≈ 5, adjusting these values based on the available memory.
Methods table
| aggregate_area | Computes the area (number of pixels) for each labeled region in the mask. |
| aggregate_kurtosis | Computes the kurtosis of pixel values within each labeled region for all image channels. |
| aggregate_max | Computes the maximum pixel value within each labeled region for all image channels. |
| aggregate_mean | Computes the mean of pixel values within each labeled region for all image channels. |
| aggregate_min | Computes the minimum pixel value within each labeled region for all image channels. |
| aggregate_skew | Computes the skewness of pixel values within each labeled region for all image channels. |
| aggregate_stats | Computes multiple statistical metrics for each label in the mask, across all image channels. |
| aggregate_sum | Computes the sum of pixel values within each labeled region for all image channels. |
| aggregate_var | Computes the variance of pixel values within each labeled region for all image channels. |
| center_of_mass | Computes the center of mass for each labeled region in the mask. |
Methods
- RasterAggregator.aggregate_area(index=None)
Computes the area (number of pixels) for each labeled region in the mask.
- RasterAggregator.aggregate_kurtosis(index=None)
Computes the kurtosis of pixel values within each labeled region for all image channels.
- RasterAggregator.aggregate_max(index=None)
Computes the maximum pixel value within each labeled region for all image channels.
- RasterAggregator.aggregate_mean(index=None)
Computes the mean of pixel values within each labeled region for all image channels.
- RasterAggregator.aggregate_min(index=None)
Computes the minimum pixel value within each labeled region for all image channels.
- RasterAggregator.aggregate_skew(index=None)
Computes the skewness of pixel values within each labeled region for all image channels.
- RasterAggregator.aggregate_stats(stats_funcs=('sum', 'mean', 'count', 'var', 'kurtosis', 'skew'), index=None)
Computes multiple statistical metrics for each label in the mask, across all image channels.
- Parameters:
stats_funcs (tuple[str, ...] (default: ('sum', 'mean', 'count', 'var', 'kurtosis', 'skew'))) – A tuple of statistical functions to apply. Supported values include: “sum”, “mean”, “count”, “var”, “kurtosis”, and “skew”. Defaults to all.
index (ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]] | None (default: None)) – Labels to consider. If None, all labels will be considered, including background.
- Return type:
list[DataFrame]
- Returns:
A list of DataFrames, each corresponding to one of the requested statistics. Each DataFrame contains one row per label, with one column per image channel and one column with the label ID, except for the “count” statistic, which contains only a column with the counts and a column with the label ID.
Example
    import harpy as hp

    # Load example dataset
    sdata = hp.datasets.pixie_example()

    image_name = "raw_image_fov0"
    labels_name = "label_whole_fov0"

    image_array = hp.im.get_dataarray(sdata, element_name=image_name).data
    mask_array = hp.im.get_dataarray(sdata, element_name=labels_name).data

    # Add dummy z dimension
    image_array = image_array[:, None, ...]
    mask_array = mask_array[None, ...]

    aggregator = hp.utils.RasterAggregator(
        mask_dask_array=mask_array,
        image_dask_array=image_array,
    )

    df_mean, df_area = aggregator.aggregate_stats(
        stats_funcs=("mean", "count")
    )
See also
harpy.tb.allocate_intensity: create an AnnData table from raster data.
- RasterAggregator.aggregate_sum(index=None)
Computes the sum of pixel values within each labeled region for all image channels.
- RasterAggregator.aggregate_var(index=None)
Computes the variance of pixel values within each labeled region for all image channels.
- RasterAggregator.center_of_mass(index=None)
Computes the center of mass for each labeled region in the mask.
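To make the quantity concrete, the sketch below computes a mask-only center of mass with NumPy, taken here to be the mean (z, y, x) coordinate of the pixels belonging to each label. This interpretation is an assumption based on the description above, not harpy's implementation, and it does not use `RasterAggregator` itself.

```python
import numpy as np

# Toy mask with shape (z=1, y=2, x=3); label 0 is the background.
mask = np.array([[[0, 1, 1],
                  [2, 2, 0]]])

labels = mask.ravel()
n_labels = int(labels.max()) + 1
counts = np.bincount(labels, minlength=n_labels)

# One coordinate grid per axis, each with the same shape as the mask.
coords = np.indices(mask.shape)

# Mean coordinate per label along each axis: rows are labels, columns are (z, y, x).
centroids = np.stack(
    [
        np.bincount(labels, weights=axis.ravel(), minlength=n_labels) / counts
        for axis in coords
    ],
    axis=1,
)
```

In this toy mask, label 1 covers x = 1 and x = 2 on the first row, so its mean coordinate is (0, 0, 1.5).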