harpy.tb.allocate_intensity

harpy.tb.allocate_intensity(sdata, image_name=None, labels_name=None, output_table_name='table_intensities', channels=None, mode='mean', obs_stats=None, to_coordinate_system='global', chunks=None, append=False, calculate_center_of_mass=True, region_key='fov_labels', instance_key='cell_ID', spatial_key='spatial', instance_size_key='shapeSize', cell_index_name='cells', run_on_gpu=False, overwrite=True)

Allocates intensity values from a specified image element to the corresponding cells in a SpatialData object. Returns the updated SpatialData object, augmented with a table element (sdata.tables[output_table_name]): an AnnData object holding the intensity values for each cell and each specified channel.

It requires that the image element and the labels element have the same shape and alignment.

Internally this function uses harpy.utils.RasterAggregator().
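To make the per-cell, per-channel aggregation concrete, here is a minimal NumPy sketch. It is not harpy's implementation (which relies on RasterAggregator); the helper name aggregate_intensity is hypothetical, and treating label 0 as background is an assumption.

```python
import numpy as np


def aggregate_intensity(image, labels, mode="mean"):
    """Per-label, per-channel intensity for a (c, y, x) image and (y, x) labels.

    Hypothetical helper for illustration only. Assumes label 0 is background
    and that every label id from 1 to labels.max() occurs at least once.
    """
    if image.shape[1:] != labels.shape:
        raise ValueError("image and labels must have the same spatial shape")

    flat_labels = labels.ravel()
    n_labels = int(flat_labels.max()) + 1

    # Number of pixels per label id.
    counts = np.bincount(flat_labels, minlength=n_labels)
    # Summed intensity per label id, one column per channel.
    sums = np.stack(
        [
            np.bincount(flat_labels, weights=channel.ravel(), minlength=n_labels)
            for channel in image
        ],
        axis=1,
    )

    # Drop the background row (label 0).
    sums, counts = sums[1:], counts[1:]

    if mode == "sum":
        return sums
    if mode == "mean":
        return sums / counts[:, None]
    raise ValueError("mode must be 'sum' or 'mean'")


labels = np.array([[0, 1, 1], [2, 2, 0]])
image = np.array(
    [
        [[5.0, 1.0, 3.0], [2.0, 4.0, 9.0]],      # channel 0
        [[0.0, 10.0, 20.0], [30.0, 30.0, 7.0]],  # channel 1
    ]
)

sums = aggregate_intensity(image, labels, mode="sum")
means = aggregate_intensity(image, labels, mode="mean")
```

The result has one row per cell and one column per channel, which is the layout described for .X of the output table.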

Parameters:
  • sdata (SpatialData) – The SpatialData object containing spatial information about cells.

  • image_name (str | None (default: None)) – The name of the image element in sdata that contains the image data from which to extract intensity information. The image element and the labels element must have the same shape and alignment. If not provided, the last image element in sdata is used.

  • labels_name (str | None (default: None)) – The name of the labels element in sdata containing the labels (segmentation) that define the boundaries of the cells. These labels correspond to regions in image_name. If not provided, the last labels element in sdata is used.

  • output_table_name (str, optional) – The table element in sdata in which to save the AnnData object with the intensity values per cell.

  • channels (int | str | Iterable[int] | Iterable[str] | None (default: None)) – Specifies the channels to be considered when extracting intensity information from the image_name. This parameter can take a single integer or string or an iterable of integers or strings representing specific channels. If set to None (the default), intensity data will be aggregated from all available channels within the image element.

  • mode (Literal['sum', 'mean'] (default: 'mean')) – When mode is set to "sum", the total intensity for each label will be added to .X of the resulting output_table_name; if set to "mean", it calculates the average intensity per label.

  • obs_stats (list[str] | None (default: None)) –

    Statistics to add to .obs of output_table_name. Supported values: ["sum", "mean", "count", "var", "kurtosis", "skew", "max", "min"].

    • If obs_stats contains the statistic passed as mode, that statistic is not added to .obs, because it is already stored in .X.

    • For each stat in ["sum", "mean", "var", "kurtosis", "skew", "max", "min"], the result is stored as: {channel_name}_{stat}.

    • "count" is stored in .obs using the name given by instance_size_key.

  • to_coordinate_system (str (default: 'global')) – The coordinate system that holds image_name and labels_name. This should be the intrinsic coordinate system in pixels.

  • chunks (str | int | tuple[int, ...] | None (default: None)) – The chunk size for processing the image data. If provided as a tuple, it should give the desired chunk size for (z), y, x.

  • append (bool (default: False)) – If True, and labels_name does not yet occur in the region_key column of sdata.tables[output_table_name].obs, the intensity values extracted during the current call are appended (along axis=0) to the existing table. If False, and overwrite is True, any existing data in sdata.tables[output_table_name] is overwritten by the newly extracted intensity values. The AnnData objects are joined using concat() with join="inner".

  • calculate_center_of_mass (bool (default: True)) – If True, the center of mass of the labels in labels_name is calculated and added to sdata.tables[output_table_name].obsm[spatial_key]. The center of mass is computed using scipy.ndimage.center_of_mass. Enabling calculate_center_of_mass causes labels_name to be loaded into memory.

  • instance_key (str (default: 'cell_ID')) – Instance key. The name of the column in AnnData table .obs that will hold the instance ids.

  • region_key (str (default: 'fov_labels')) – Region key. The name of the column in AnnData table .obs that will hold the name of the element(s) that are annotated by the resulting table.

  • spatial_key (str (default: 'spatial')) – The key in the AnnData table .obsm that will hold the x and y center of the instances. This center is the center of mass of each cell in labels_name.

  • instance_size_key (str (default: 'shapeSize')) – The key in the AnnData table .obs that will hold the size of the instances. Ignored if "count" is not in obs_stats.

  • cell_index_name (str (default: 'cells')) – The name of the index of the resulting AnnData table.

  • run_on_gpu (bool (default: False)) – Whether to run on the GPU. If no CuPy installation is detected, the function falls back to the CPU.

  • overwrite (bool (default: True)) – If True, overwrites the output_table_name if it already exists in sdata.
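The calculate_center_of_mass and spatial_key parameters refer to scipy.ndimage.center_of_mass. The sketch below shows that SciPy call on a toy label array; the uniform weighting (an array of ones) and the reordering from (y, x) to (x, y) are assumptions made for illustration, not a statement of how harpy calls it.

```python
import numpy as np
from scipy import ndimage

# Toy 2D segmentation mask; 0 is treated as background (assumption).
labels = np.array(
    [
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [2, 2, 0, 0],
    ]
)

# Label ids of the cells, excluding the background.
ids = np.unique(labels)
ids = ids[ids != 0]

# Weight every pixel equally so the result is the geometric center per label.
weights = np.ones(labels.shape)
centers_yx = np.asarray(ndimage.center_of_mass(weights, labels=labels, index=ids))

# SciPy returns coordinates in array-axis order (y, x); flip to (x, y).
centers_xy = centers_yx[:, ::-1]
```

Because scipy.ndimage.center_of_mass operates on in-memory arrays, this also illustrates why enabling the option loads the labels into memory.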

Return type:

SpatialData

Returns:

An updated version of the input SpatialData object augmented with a table element (sdata.tables[output_table_name]) AnnData object.

Notes

  • The function currently supports scenarios where the image_name and labels_name are aligned and have the same shape. Misalignments or differences in shape must be handled prior to invoking this function.

  • Intensity calculation is performed per channel for each cell. The function aggregates this information and attaches it as a table (AnnData object) within the SpatialData object.

  • Due to the memory-intensive nature of the operation, especially for large datasets, the function implements chunk-based processing, aided by Dask. If sdata is backed by a Zarr store, we recommend using chunks=None and ensuring that the on-disk Dask array chunks are optimized for both storage efficiency and computational performance.
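Chunk-based processing works for these statistics because per-label sums and pixel counts are additive across chunks, so a cell that straddles a chunk boundary is still handled correctly. The following NumPy sketch illustrates the idea with a plain loop over tiles; it is a simplified stand-in, not the Dask-based code harpy uses, and chunked_label_stats is a hypothetical name.

```python
import numpy as np


def chunked_label_stats(image, labels, chunk=2):
    """Accumulate per-label sums and pixel counts tile by tile.

    Hypothetical helper for a single-channel (y, x) image. The maximum label
    id is read from the full array here for simplicity.
    """
    n_labels = int(labels.max()) + 1
    sums = np.zeros(n_labels)
    counts = np.zeros(n_labels, dtype=np.int64)

    for y0 in range(0, labels.shape[0], chunk):
        for x0 in range(0, labels.shape[1], chunk):
            lab = labels[y0:y0 + chunk, x0:x0 + chunk].ravel()
            img = image[y0:y0 + chunk, x0:x0 + chunk].ravel()
            # Partial results of one tile are simply added to the totals.
            sums += np.bincount(lab, weights=img, minlength=n_labels)
            counts += np.bincount(lab, minlength=n_labels)

    return sums, counts


labels = np.arange(35).reshape(5, 7) % 4  # label ids 0..3, 0 as background
image = np.arange(35, dtype=float).reshape(5, 7)

sums, counts = chunked_label_stats(image, labels, chunk=2)

# The mean per cell follows from the accumulated totals (background dropped).
means = sums[1:] / counts[1:]

# Reference result computed on the whole array at once.
full_sums = np.bincount(labels.ravel(), weights=image.ravel(), minlength=4)
full_counts = np.bincount(labels.ravel(), minlength=4)
```

The chunked totals match the whole-array result regardless of the tile size, which is why the chunk size is a performance setting rather than something that changes the output.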

Examples

Allocate intensity statistics into an AnnData table:

import harpy as hp

sdata = hp.datasets.pixie_example()

# Compute intensity statistics in coordinate system "fov0"
sdata = hp.tb.allocate_intensity(
    sdata,
    image_name="raw_image_fov0",
    labels_name="label_whole_fov0",
    to_coordinate_system="fov0",
    output_table_name="my_table",
    mode="sum",
    obs_stats=["count"],  # cell size
    overwrite=True,
)

# Append intensity statistics in coordinate system "fov1"
sdata = hp.tb.allocate_intensity(
    sdata,
    image_name="raw_image_fov1",
    labels_name="label_whole_fov1",
    to_coordinate_system="fov1",
    output_table_name="my_table",
    mode="sum",
    obs_stats=["count"],  # cell size
    append=True,
    overwrite=True,
)
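When appending, the tables are joined with concat() and join="inner". An inner join keeps only the columns shared by all inputs, so channels missing from either table would be dropped. The sketch below uses pandas.concat as an analogy for that behavior; it does not call harpy or anndata, and the channel and cell names are made up.

```python
import pandas as pd

# Hypothetical per-cell intensities for two fields of view.
fov0 = pd.DataFrame(
    {"CD3": [1.0, 2.0], "CD8": [3.0, 4.0]},
    index=["1_fov0", "2_fov0"],
)
fov1 = pd.DataFrame(
    {"CD3": [5.0], "CD20": [6.0]},
    index=["1_fov1"],
)

# Rows (cells) are stacked along axis=0; join="inner" keeps shared columns only.
combined = pd.concat([fov0, fov1], axis=0, join="inner")
```

Here only "CD3" survives, so it is worth passing the same channels for every call that appends to the same table.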

See also

harpy.utils.RasterAggregator

Out-of-core calculation of statistics from raster data.