harpy.utils.RasterAggregator

class harpy.utils.RasterAggregator(mask_dask_array, image_dask_array=None, instance_key='cell_ID', instance_size_key='shapeSize', run_on_gpu=True)

Helper class to calculate the aggregated ‘sum’, ‘mean’, ‘var’, ‘kurtosis’, ‘skew’, ‘area’, ‘min’, ‘max’ and ‘center of mass’ of an image and its labels using Dask.

Parameters:

mask_dask_array (Array) – A 3D Dask array of integer labels representing segmented regions. Expected shape is (‘z’, ‘y’, ‘x’). Each unique integer value represents a separate label.
image_dask_array (Array | None (default: None)) – A 4D Dask array representing the image data with shape (‘c’, ‘z’, ‘y’, ‘x’), where ‘c’ is the number of channels. Can be None if only mask-based computations (e.g., count or center of mass) are required.
instance_key (str (default: 'cell_ID')) – Name of the instance key.
instance_size_key (str (default: 'shapeSize')) – Name of the instance size key.
run_on_gpu (bool (default: True)) – Whether to run on the GPU. If no CuPy installation is detected, computation falls back to the CPU.

Raises:

ValueError – If mask_dask_array does not have an integer dtype.
AssertionError – If mask_dask_array is not 3D.
AssertionError – If image_dask_array is provided but is not 4D.
AssertionError – If spatial dimensions of image_dask_array and mask_dask_array do not match.
AssertionError – If chunk sizes of spatial dimensions do not match between image and mask.

Notes

The aggregate operation computes statistics per chunk. For each chunk in mask_dask_array and image_dask_array, it produces a matrix with shape (i, c, z, y, x), where:

i: total number of labels in the global mask
c: number of channels in the chunk
z = 1, y = 1, x = 1

These matrices (chunks) are then aggregated to obtain statistics for the global mask and image.
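The per-chunk scheme described above can be illustrated with a minimal NumPy sketch. It is not harpy's implementation: the helper `chunk_sums`, the toy arrays, and the choice of `np.bincount` are assumptions made for illustration, and labels are assumed to be contiguous integers starting at 0. Each chunk yields a vector of length i, and the global statistic is obtained by combining those vectors.

```python
import numpy as np

def chunk_sums(mask_chunk, image_chunk, n_labels):
    """Per-chunk sums and pixel counts, each of length n_labels (hypothetical helper)."""
    labels = mask_chunk.ravel()
    sums = np.bincount(labels, weights=image_chunk.ravel(), minlength=n_labels)
    counts = np.bincount(labels, minlength=n_labels)
    return sums, counts

# Toy global mask (y, x) with labels 0 (background), 1 and 2, and one image channel.
mask = np.array([[0, 1, 1, 2],
                 [0, 1, 2, 2]])
image = np.array([[1.0, 2.0, 4.0, 6.0],
                  [1.0, 3.0, 8.0, 10.0]])
n_labels = int(mask.max()) + 1  # i: sized by the global mask, not by the chunk

# Split into two chunks along x, mimicking Dask chunking.
parts = [
    chunk_sums(mask[:, s], image[:, s], n_labels)
    for s in (slice(0, 2), slice(2, 4))
]

# Combine the per-chunk results into global statistics.
total_sum = sum(p[0] for p in parts)
total_count = sum(p[1] for p in parts)
mean = total_sum / total_count
```

Label 1 spans both chunks here, which is why each chunk reports a slot for every global label: the partial sums and counts can then simply be added.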

We intentionally avoid the optimization of setting i to the maximum number of labels in any chunk, because this would require an additional pass over the global mask to count labels per chunk (as done in harpy.utils.Featurizer).

Because i typically ranges from thousands to millions, and because only a single feature (e.g., a mean statistic) is computed, each chunk of the aggregated matrix remains under ~50 MB (for a chunksize of c = 1), even when the global mask contains around 10 million labels.
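As a rough check of the memory figure above, the size of one (i, c, 1, 1, 1) block can be estimated directly. The 4-byte float32 element size is an assumption (the dtype is not stated here), and `aggregated_chunk_bytes` is a hypothetical helper, not part of harpy.

```python
import numpy as np

def aggregated_chunk_bytes(n_labels, n_channels, dtype=np.float32):
    """Bytes held by one (i, c, 1, 1, 1) block of a single statistic."""
    return n_labels * n_channels * np.dtype(dtype).itemsize

# 10 million labels, one channel per chunk (c = 1), assumed float32 values.
size_mb = aggregated_chunk_bytes(10_000_000, 1) / 1e6
```

Under these assumptions the block occupies 40 MB, consistent with the ~50 MB bound. The estimate scales linearly with both the channel chunk size and the element size.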

By chunking the underlying dask arrays along the (c, z, y, x) dimensions in the on-disk Zarr store, the user can effectively control RAM usage during aggregation. As a practical guideline, choose chunk sizes of roughly z, y, x ≈ 4096 and c ≈ 5, adjusting these values based on the available memory.
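A minimal sketch of such a chunk layout, assuming a hypothetical 10-channel image with a single z-plane. In practice the chunking would be set when writing the Zarr store; `rechunk` is used here only to illustrate the shapes. The arrays are lazy placeholders and their sizes are illustrative.

```python
import dask.array as da

# Hypothetical image (c, z, y, x) and mask (z, y, x); nothing is computed here.
image = da.zeros((10, 1, 8192, 8192), dtype="uint16", chunks=(1, 1, 1024, 1024))
mask = da.zeros((1, 8192, 8192), dtype="int32", chunks=(1, 1024, 1024))

# Follow the guideline: c ≈ 5 and y, x ≈ 4096 (z is 1 for a single plane).
image = image.rechunk((5, 1, 4096, 4096))
mask = mask.rechunk((1, 4096, 4096))

# The spatial chunks of image and mask must match.
spatial_chunks_match = image.chunks[1:] == mask.chunks
```

Keeping the spatial chunks of the image and the mask identical matters because mismatched chunk sizes raise an AssertionError, as listed under Raises.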

Methods table

`aggregate_area`([index])	Computes the area (number of pixels) for each labeled region in the mask.
`aggregate_kurtosis`([index])	Computes the kurtosis of pixel values within each labeled region for all image channels.
`aggregate_max`([index])	Computes the maximum pixel value within each labeled region for all image channels.
`aggregate_mean`([index])	Computes the mean of pixel values within each labeled region for all image channels.
`aggregate_min`([index])	Computes the minimum pixel value within each labeled region for all image channels.
`aggregate_skew`([index])	Computes the skewness of pixel values within each labeled region for all image channels.
`aggregate_stats`([stats_funcs, index])	Computes multiple statistical metrics for each label in the mask, across all image channels.
`aggregate_sum`([index])	Computes the sum of pixel values within each labeled region for all image channels.
`aggregate_var`([index])	Computes the variance of pixel values within each labeled region for all image channels.
`center_of_mass`([index])	Computes the center of mass for each labeled region in the mask.

Methods

RasterAggregator.aggregate_area(index=None)

Computes the area (number of pixels) for each labeled region in the mask.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

A DataFrame with one column for area and one for label ID.

RasterAggregator.aggregate_kurtosis(index=None)

Computes the kurtosis of pixel values within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.

RasterAggregator.aggregate_max(index=None)

Computes the maximum pixel value within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.

RasterAggregator.aggregate_mean(index=None)

Computes the mean of pixel values within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.

RasterAggregator.aggregate_min(index=None)

Computes the minimum pixel value within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.

RasterAggregator.aggregate_skew(index=None)

Computes the skewness of pixel values within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.

RasterAggregator.aggregate_stats(stats_funcs=('sum', 'mean', 'count', 'var', 'kurtosis', 'skew'), index=None)

Computes multiple statistical metrics for each label in the mask, across all image channels.

Parameters:

stats_funcs (tuple[str, ...] (default: ('sum', 'mean', 'count', 'var', 'kurtosis', 'skew'))) – A tuple of statistical functions to apply. Supported values include: “sum”, “mean”, “count”, “var”, “kurtosis”, and “skew”. Defaults to all.
index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

HashableList[DataFrame]

Returns:

A list of DataFrames, each corresponding to one of the requested statistics. Each DataFrame contains one row per label, one column per image channel, and a column with the label ID. The exception is the “count” statistic, whose DataFrame contains only a column with the counts and a column with the label ID.

Example

import harpy as hp

# Load example dataset
sdata = hp.datasets.pixie_example()

image_name = "raw_image_fov0"
labels_name = "label_whole_fov0"

image_array = hp.im.get_dataarray(sdata, element_name=image_name).data
mask_array = hp.im.get_dataarray(sdata, element_name=labels_name).data

# Add dummy z dimension
image_array = image_array[:, None, ...]
mask_array = mask_array[None, ...]

aggregator = hp.utils.RasterAggregator(
    mask_dask_array=mask_array,
    image_dask_array=image_array,
)

df_mean, df_area = aggregator.aggregate_stats(
    stats_funcs=("mean", "count")
)
