harpy.utils.RasterAggregator

class harpy.utils.RasterAggregator(mask_dask_array, image_dask_array=None, instance_key='cell_ID', instance_size_key='shapeSize', run_on_gpu=True)

Helper class to calculate the aggregated ‘sum’, ‘mean’, ‘var’, ‘kurtosis’, ‘skew’, ‘area’, ‘min’, ‘max’ and ‘center of mass’ of an image and its labels using Dask.

Parameters:
  • mask_dask_array (Array) – A 3D Dask array of integer labels representing segmented regions. Expected shape is (‘z’, ‘y’, ‘x’). Each unique integer value represents a separate label.

  • image_dask_array (Array | None (default: None)) – A 4D Dask array representing the image data with shape (‘c’, ‘z’, ‘y’, ‘x’), where ‘c’ is the number of channels. Can be None if only mask-based computations (e.g., count or center of mass) are required.

  • instance_key (str (default: 'cell_ID')) – Name of the instance key.

  • instance_size_key (str (default: 'shapeSize')) – Name of the instance size key.

  • run_on_gpu (bool (default: True)) – Whether to run on the GPU. If no CuPy installation is detected, computation falls back to the CPU.

Raises:
  • ValueError – If mask_dask_array does not contain an integer dtype.

  • AssertionError – If mask_dask_array is not 3D.

  • AssertionError – If image_dask_array is provided but is not 4D.

  • AssertionError – If spatial dimensions of image_dask_array and mask_dask_array do not match.

  • AssertionError – If chunk sizes of spatial dimensions do not match between image and mask.
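
The documented checks can be mirrored in a small pre-flight helper, which is useful for catching shape problems before building the aggregator. This is a sketch only: check_inputs is a hypothetical function, not part of harpy, and it runs on plain NumPy arrays here (the chunk comparison only applies to arrays that expose a chunks attribute, such as Dask arrays).

```python
import numpy as np

def check_inputs(mask, image=None):
    """Hypothetical helper mirroring the documented input requirements."""
    if not np.issubdtype(mask.dtype, np.integer):
        raise ValueError("mask must have an integer dtype")
    assert mask.ndim == 3, "mask must be 3D (z, y, x)"
    if image is not None:
        assert image.ndim == 4, "image must be 4D (c, z, y, x)"
        assert image.shape[1:] == mask.shape, "spatial dimensions must match"
        # Chunk layouts only exist on chunked (e.g. Dask) arrays.
        if hasattr(mask, "chunks") and hasattr(image, "chunks"):
            assert image.chunks[1:] == mask.chunks, "spatial chunks must match"

mask = np.zeros((1, 8, 8), dtype=np.int32)        # (z, y, x)
image = np.zeros((3, 1, 8, 8), dtype=np.float32)  # (c, z, y, x)
check_inputs(mask, image)  # passes silently
```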

Notes

The aggregate operation computes statistics per chunk. For each chunk in mask_dask_array and image_dask_array, it produces a matrix with shape (i, c, z, y, x), where:

  • i: total number of labels in the global mask

  • c: number of channels in the chunk

  • z = 1, y = 1, x = 1

These matrices (chunks) are then aggregated to obtain statistics for the global mask and image.
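
The per-chunk scheme can be illustrated with a small NumPy sketch. It is not harpy's implementation: the helper chunk_sum, the toy arrays, and the two-way split along x are assumptions made for illustration. Each chunk yields an array of shape (i, c, 1, 1, 1) with a row for every label of the global mask, so combining chunks reduces to an elementwise sum.

```python
import numpy as np

rng = np.random.default_rng(0)
n_labels = 5                                       # i: labels 0..4 in the global mask
mask = rng.integers(0, n_labels, size=(1, 4, 6))   # (z, y, x)
image = rng.random((2, 1, 4, 6))                   # (c, z, y, x)

def chunk_sum(mask_chunk, image_chunk):
    """Per-label sums for one chunk, shaped (i, c, 1, 1, 1)."""
    sums = np.stack(
        [np.bincount(mask_chunk.ravel(), weights=ch.ravel(), minlength=n_labels)
         for ch in image_chunk],
        axis=1,
    )                                              # (i, c)
    return sums[:, :, None, None, None]

# Two chunks along x; every chunk reports all n_labels rows.
parts = [chunk_sum(mask[..., s], image[..., s]) for s in (slice(0, 3), slice(3, 6))]
total = np.sum(parts, axis=0)[:, :, 0, 0, 0]       # (i, c)

# The combined result matches a single pass over the whole array.
expected = chunk_sum(mask, image)[:, :, 0, 0, 0]
assert np.allclose(total, expected)
```

Because each chunk already has one row per global label, no label bookkeeping is needed when the chunks are combined.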

We intentionally avoid the optimization of setting i to the maximum number of labels in any chunk, because this would require an additional pass over the global mask to count labels per chunk (as done in harpy.utils.Featurizer).

Because i typically ranges from thousands to millions, and because only a single feature (e.g., a mean statistic) is computed, each chunk of the aggregated matrix stays below roughly 50 MB (for a chunk size of c = 1), even when the global mask contains around 10 million labels.
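
The size estimate can be checked with a short calculation. The 4 bytes per value (float32 results) is an assumption; the source does not state the dtype of the aggregated matrix.

```python
n_labels = 10_000_000      # i: labels in the global mask
n_channels = 1             # c: channels per chunk
bytes_per_value = 4        # assumption: float32 results

# One aggregated chunk has shape (i, c, 1, 1, 1).
chunk_bytes = n_labels * n_channels * 1 * 1 * 1 * bytes_per_value
chunk_mb = chunk_bytes / 1e6
print(f"{chunk_mb:.0f} MB")  # 40 MB
```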

By chunking the underlying dask arrays along the (c, z, y, x) dimensions in the on-disk Zarr store, the user can effectively control RAM usage during aggregation. As a practical guideline, choose chunk sizes of roughly z, y, x ≈ 4096 and c ≈ 5, adjusting these values based on the available memory.
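
To judge whether a chunk layout fits in memory, the size of one input chunk can be estimated as below. The helper chunk_mib is hypothetical, and the dtypes (uint16 image, int32 mask) and the single z-plane are assumptions chosen for illustration.

```python
import numpy as np

def chunk_mib(shape, dtype):
    """Size in MiB of one dense chunk with the given shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 2**20

image_chunk = (5, 1, 4096, 4096)   # (c, z, y, x), z = 1 assumed
mask_chunk = (1, 4096, 4096)       # (z, y, x)

print(chunk_mib(image_chunk, np.uint16))  # 160.0
print(chunk_mib(mask_chunk, np.int32))    # 64.0
```

Several chunks are usually held in memory at once (one per worker thread, plus intermediates), so the actual peak is a multiple of these figures.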

Methods table

aggregate_area([index])

Computes the area (number of pixels) for each labeled region in the mask.

aggregate_kurtosis([index])

Computes the kurtosis of pixel values within each labeled region for all image channels.

aggregate_max([index])

Computes the maximum pixel value within each labeled region for all image channels.

aggregate_mean([index])

Computes the mean of pixel values within each labeled region for all image channels.

aggregate_min([index])

Computes the minimum pixel value within each labeled region for all image channels.

aggregate_skew([index])

Computes the skewness of pixel values within each labeled region for all image channels.

aggregate_stats([stats_funcs, index])

Computes multiple statistical metrics for each label in the mask, across all image channels.

aggregate_sum([index])

Computes the sum of pixel values within each labeled region for all image channels.

aggregate_var([index])

Computes the variance of pixel values within each labeled region for all image channels.

center_of_mass([index])

Computes the center of mass for each labeled region in the mask.

Methods

RasterAggregator.aggregate_area(index=None)

Computes the area (number of pixels) for each labeled region in the mask.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

A DataFrame with one column for area and one for label ID.
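
As a reference for what the area means, the same quantity can be computed on a small in-memory mask with NumPy. This sketch does not call harpy and returns plain arrays instead of a DataFrame; the toy mask and the index array are made up for illustration.

```python
import numpy as np

mask = np.array([[[0, 0, 1, 1],
                  [0, 2, 2, 1],
                  [0, 2, 2, 0]]])          # (z, y, x) = (1, 3, 4)

# Area = number of pixels carrying each label (background 0 included).
labels, area = np.unique(mask, return_counts=True)
print(labels)  # [0 1 2]
print(area)    # [5 3 4]

# Restricting to selected labels, analogous to the index argument.
index = np.array([1, 2])
print(area[np.isin(labels, index)])  # [3 4]
```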

RasterAggregator.aggregate_kurtosis(index=None)

Computes the kurtosis of pixel values within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.

RasterAggregator.aggregate_max(index=None)

Computes the maximum pixel value within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.
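
A per-label maximum can be reproduced on toy data with NumPy's unbuffered np.maximum.at. This is a reference sketch under assumed inputs, not harpy's code path; the minimum follows the same pattern with np.minimum.at and an initial value of +inf.

```python
import numpy as np

mask = np.array([[[0, 1, 1],
                  [2, 2, 1]]])                 # (z, y, x)
image = np.array([[[[5., 1., 7.],
                    [3., 9., 2.]]]])           # (c, z, y, x), c = 1

n_labels = int(mask.max()) + 1
columns = []
for channel in image:
    col = np.full(n_labels, -np.inf)
    # Unbuffered update: col[label] = max(col[label], value) for every pixel.
    np.maximum.at(col, mask.ravel(), channel.ravel())
    columns.append(col)

maxima = np.stack(columns, axis=1)             # rows: labels, columns: channels
print(maxima.ravel())  # [5. 7. 9.]
```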

RasterAggregator.aggregate_mean(index=None)

Computes the mean of pixel values within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.
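
The layout of the result (labels as rows, channels as columns) can be seen in a NumPy reference computation on a two-channel toy image. The arrays are invented for illustration and the sketch does not use harpy.

```python
import numpy as np

mask = np.array([[[0, 1, 1],
                  [2, 2, 1]]])                   # (z, y, x)
image = np.array([
    [[[5., 1., 7.], [3., 9., 2.]]],              # channel 0
    [[[0., 2., 4.], [6., 8., 6.]]],              # channel 1
])                                               # (c, z, y, x)

flat = mask.ravel()
counts = np.bincount(flat)                       # pixels per label
sums = np.stack([np.bincount(flat, weights=ch.ravel()) for ch in image], axis=1)
means = sums / counts[:, None]                   # (labels, channels)
print(means.shape)  # (3, 2)
```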

RasterAggregator.aggregate_min(index=None)

Computes the minimum pixel value within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.

RasterAggregator.aggregate_skew(index=None)

Computes the skewness of pixel values within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.
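
For orientation, the sketch below computes a per-label skewness from central moments with NumPy. It assumes the population (biased) definition m3 / m2^1.5; this page does not state which convention harpy uses, so results may differ by a bias correction. The function skew_per_label and the toy data are hypothetical.

```python
import numpy as np

def skew_per_label(mask, channel):
    """Population skewness m3 / m2**1.5 of one channel, per label."""
    flat = mask.ravel()
    values = channel.ravel().astype(float)
    n = np.bincount(flat)
    mean = np.bincount(flat, weights=values) / n
    dev = values - mean[flat]                    # deviation from the label mean
    m2 = np.bincount(flat, weights=dev**2) / n   # second central moment
    m3 = np.bincount(flat, weights=dev**3) / n   # third central moment
    # Constant regions have zero variance; they yield NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return m3 / m2**1.5

mask = np.array([[[0, 1, 1],
                  [2, 2, 1]]])                   # (z, y, x)
channel = np.array([[[5., 1., 7.],
                     [3., 9., 2.]]])             # one channel, (z, y, x)
skew = skew_per_label(mask, channel)
```

Label 0 covers a single pixel here, so its skewness is undefined (NaN), and the symmetric two-pixel label 2 has a skewness of 0.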

RasterAggregator.aggregate_stats(stats_funcs=('sum', 'mean', 'count', 'var', 'kurtosis', 'skew'), index=None)

Computes multiple statistical metrics for each label in the mask, across all image channels.

Parameters:
  • stats_funcs (tuple[str, ...] (default: ('sum', 'mean', 'count', 'var', 'kurtosis', 'skew'))) – A tuple of statistical functions to apply. Supported values include: “sum”, “mean”, “count”, “var”, “kurtosis”, and “skew”. Defaults to all.

  • index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

list[DataFrame]

Returns:

A list of DataFrames, each corresponding to one of the requested statistics. Each DataFrame contains one row per label, one column per image channel, and a column with the label ID, except for the “count” statistic, whose DataFrame contains only a column with the counts and a column with the label ID.

Example

import harpy as hp

# Load example dataset
sdata = hp.datasets.pixie_example()

image_name = "raw_image_fov0"
labels_name = "label_whole_fov0"

image_array = hp.im.get_dataarray(sdata, element_name=image_name).data
mask_array = hp.im.get_dataarray(sdata, element_name=labels_name).data

# Add dummy z dimension
image_array = image_array[:, None, ...]
mask_array = mask_array[None, ...]

aggregator = hp.utils.RasterAggregator(
    mask_dask_array=mask_array,
    image_dask_array=image_array,
)

df_mean, df_area = aggregator.aggregate_stats(
    stats_funcs=("mean", "count")
)

See also

harpy.tb.allocate_intensity

Create an AnnData table from raster data.

RasterAggregator.aggregate_sum(index=None)

Computes the sum of pixel values within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.

RasterAggregator.aggregate_var(index=None)

Computes the variance of pixel values within each labeled region for all image channels.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

DataFrame where rows represent labels and columns represent channels.
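
A per-label variance can be derived from three accumulators (count, sum, sum of squares), each of which adds up across chunks. The NumPy sketch below assumes the population variance (ddof = 0); whether harpy uses ddof = 0 or 1 is not stated on this page, and the toy arrays are invented.

```python
import numpy as np

mask = np.array([[[0, 1, 1],
                  [2, 2, 1]]])                   # (z, y, x)
channel = np.array([[[5., 1., 7.],
                     [3., 9., 2.]]])             # one channel, (z, y, x)

flat = mask.ravel()
values = channel.ravel()
n = np.bincount(flat)                            # pixel count per label
s1 = np.bincount(flat, weights=values)           # sum of values per label
s2 = np.bincount(flat, weights=values**2)        # sum of squares per label

# E[x^2] - E[x]^2, i.e. the population variance (ddof=0, an assumption).
var = s2 / n - (s1 / n) ** 2
```

This one-pass formula can lose precision when the mean is large relative to the spread; a two-pass or Welford-style update is more robust in that case.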

RasterAggregator.center_of_mass(index=None)

Computes the center of mass for each labeled region in the mask.

Parameters:

index (ndarray | None (default: None)) – Labels to consider. If None, all labels are considered, including background.

Return type:

DataFrame

Returns:

A DataFrame with columns for spatial coordinates (z, y, x) and label ID.
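
The center of mass of a labeled region is the mean pixel coordinate of that region. The NumPy sketch below illustrates this on a toy mask, assuming an unweighted (mask-only) centroid; it is a reference computation, not harpy's implementation.

```python
import numpy as np

mask = np.array([[[0, 1, 1],
                  [2, 2, 1]]])                   # (z, y, x) = (1, 2, 3)

flat = mask.ravel()
counts = np.bincount(flat)

# np.indices yields one coordinate grid per axis, in (z, y, x) order.
coords = np.stack(
    [np.bincount(flat, weights=grid.ravel()) / counts
     for grid in np.indices(mask.shape)],
    axis=1,
)                                                # rows: labels, columns: z, y, x
```

Label 1 covers the pixels (0, 0, 1), (0, 0, 2) and (0, 1, 2), so its row is (0, 1/3, 5/3).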