harpy.tb.extract_instances

harpy.tb.extract_instances(sdata, image_name, labels_name, diameter, depth=None, remove_background=True, extract_mask=False, zarr_output_path=None, name_instances_image='image.zarr', name_instances_mask='mask.zarr', batch_size=None, to_coordinate_system='global', run_on_gpu=False, overwrite=False)

Extract per-instance windows of size diameter in y and x from image_name and labels_name, using dask.array.map_overlap() and dask.array.map_blocks().

For every non-zero label in labels_name, this function builds a Dask graph that slices out a square window in the y, x plane, centered on that instance, from both image_name and labels_name. The z dimension is preserved.
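The windowing step can be sketched in plain NumPy for a single label. This is a minimal illustration, not harpy's implementation: the helper extract_window is hypothetical, and it assumes that the window is centered on the label's rounded centroid and that windows touching the image border are zero-padded to keep a fixed size.

```python
import numpy as np


def extract_window(image, labels, label, diameter):
    """Hypothetical helper: cut a diameter x diameter (y, x) window centered on `label`.

    image:  (c, z, y, x) array
    labels: (z, y, x) integer array
    Returns a (c, z, diameter, diameter) image window and a
    (1, z, diameter, diameter) label window.
    """
    # Assumption: the center is the rounded centroid of the label in y and x.
    _, ys, xs = np.nonzero(labels == label)
    cy, cx = int(round(ys.mean())), int(round(xs.mean()))
    half = diameter // 2
    y0, x0 = cy - half, cx - half

    # Assumption: zero-pad y and x so windows near the border keep a fixed size.
    # The z axis (and the channel axis) is never padded or sliced.
    pad = diameter
    img_p = np.pad(image, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    lab_p = np.pad(labels, ((0, 0), (pad, pad), (pad, pad)))
    ys_sl = slice(y0 + pad, y0 + pad + diameter)
    xs_sl = slice(x0 + pad, x0 + pad + diameter)
    return img_p[:, :, ys_sl, xs_sl], lab_p[None, :, ys_sl, xs_sl]


# Toy data: 2 channels, 1 z-plane, 10 x 10 pixels, one 3 x 3 instance with label 7.
image = np.arange(2 * 1 * 10 * 10).reshape(2, 1, 10, 10)
labels = np.zeros((1, 10, 10), dtype=int)
labels[0, 4:7, 4:7] = 7

img_w, mask_w = extract_window(image, labels, label=7, diameter=4)
print(img_w.shape, mask_w.shape)
```

In harpy the same slicing is expressed as a Dask graph over chunked arrays rather than applied to in-memory NumPy arrays.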

Decreasing the on-disk chunk size of the image_name and labels_name elements reduces RAM consumption. A good first guess for the chunk sizes is (c_chunksize, y_chunksize, x_chunksize) = (10, 2048, 2048).
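To get a feel for the numbers: one (10, 2048, 2048) chunk of a uint16 image holds 10 × 2048 × 2048 elements of 2 bytes each, i.e. 80 MiB. The Dask sketch below computes this for a hypothetical 30-channel, 8192 × 8192 image and shows that halving the y and x chunk sizes with rechunk reduces per-chunk memory fourfold. How chunks are set on disk for a SpatialData element is not covered here.

```python
import numpy as np
import dask.array as da

# Hypothetical 30-channel uint16 image, chunked (c, y, x) = (10, 2048, 2048).
# da.zeros is lazy, so nothing of this size is actually allocated.
image = da.zeros((30, 8192, 8192), dtype=np.uint16, chunks=(10, 2048, 2048))

# Memory of one chunk: elements per chunk times bytes per element.
chunk_bytes = int(np.prod(image.chunksize)) * image.dtype.itemsize
print(image.chunksize, chunk_bytes / 2**20, "MiB")  # (10, 2048, 2048) 80.0 MiB

# Halving the y and x chunk sizes cuts per-chunk memory by a factor of four.
smaller = image.rechunk((10, 1024, 1024))
```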

For optimal performance, configure Dask to use the multiprocessing scheduler, e.g. dask.config.set(scheduler="processes").
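dask.config.set can be applied globally or, as a context manager, to a single block of code. The scoped form keeps the process-based scheduler limited to the extraction call, which would go inside the with block. When running from a script, guard the entry point with if __name__ == "__main__": on platforms that start worker processes with spawn (Windows and macOS).

```python
import dask

# Global alternative: every later .compute() would use processes.
# dask.config.set(scheduler="processes")

# Scoped: only computations inside the `with` block use the process-based scheduler.
with dask.config.set(scheduler="processes"):
    inside = dask.config.get("scheduler")
    # ... call hp.tb.extract_instances(...) and compute/write the result here ...

# After the block, the previous setting is restored.
outside = dask.config.get("scheduler", None)
```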

Parameters:
  • sdata – SpatialData object.

  • image_name (str) – Name of the image element.

  • labels_name (str) – Name of the labels element.

  • diameter (int) – Side length of the resulting y, x window for every instance.

  • depth (int | None (default: None)) – Passed to dask.array.map_overlap(). If not provided, depth is set to diameter // 2 + 1.

  • remove_background (bool (default: True)) – If True, pixels outside the instance label within each window are set to background (e.g., zero) so that only the object remains inside the cutout. If False, the entire window content is kept.

  • extract_mask (bool (default: False)) – If True, the corresponding per-instance masks are extracted as well.

  • zarr_output_path (str | Path | None (default: None)) – If a filesystem path (string or Path) is provided, the extracted instances are computed and written to Zarr store(s) at that location. The returned instances are still Dask arrays, now backed by the written data, and all computation needed to populate the store(s) has already been executed. If None (default), nothing is written and lazy (not yet computed) Dask arrays are returned.

  • name_instances_image (str (default: 'image.zarr')) – Name of the Zarr store created under zarr_output_path for extracted image instances. Ignored if zarr_output_path is None.

  • name_instances_mask (str (default: 'mask.zarr')) – Name of the Zarr store created under zarr_output_path for extracted mask instances. Ignored if zarr_output_path is None or extract_mask is False.

  • batch_size (int | None (default: None)) – Chunk size of the resulting Dask array(s) along the instance dimension i.

  • to_coordinate_system (str (default: 'global')) – The coordinate system that holds image_name and labels_name.

  • run_on_gpu (bool (default: False)) – If True and ‘cupy’ is installed, the extraction step runs on the GPU.

  • overwrite (bool (default: False)) – Whether to overwrite existing Zarr stores at the target locations. If True, any existing Zarr store at zarr_output_path / name_instances_image and/or zarr_output_path / name_instances_mask will be replaced. If False (default), an error is raised if a target store already exists. Ignored if zarr_output_path is None.
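To see what depth does, the sketch below runs dask.array.map_overlap() on a toy array and records the height of each block as the mapped function sees it. With the default depth = diameter // 2 + 1 = 21 for diameter=40, each 32-pixel chunk arrives with 21 extra pixels on each side, so a window centered near a chunk edge can presumably still be cut out in full. The toy array, its chunk size and the function block_height are illustrative assumptions.

```python
import numpy as np
import dask.array as da

diameter = 40
depth = diameter // 2 + 1  # default used when depth=None -> 21

arr = da.zeros((64, 64), dtype=int, chunks=32)


def block_height(block):
    # Each block arrives enlarged by `depth` pixels on every side,
    # taken from neighbouring chunks or from the constant boundary padding.
    return np.full(block.shape, block.shape[0])


# trim=True (the default) removes the overlap again after the function runs.
seen = arr.map_overlap(block_height, depth=depth, boundary=0, dtype=int).compute()
print(depth, seen.shape, int(seen.max()))  # 21 (64, 64) 74
```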
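The effect of remove_background=True on a single window can be sketched with numpy.where. The helper below is hypothetical and assumes zero as the background value. Pixels of a neighbouring instance that fall inside the window (label 9 here) are zeroed too, because they lie outside the instance label.

```python
import numpy as np


def remove_background(image_window, label_window, label):
    """Hypothetical helper: zero every pixel of `image_window` not belonging to `label`.

    image_window: (c, z, y, x); label_window: (z, y, x).
    The (z, y, x) mask is broadcast over the channel axis.
    """
    keep = label_window == label
    return np.where(keep[None], image_window, 0).astype(image_window.dtype)


image_window = np.ones((2, 1, 3, 3), dtype=np.uint16)
label_window = np.array([[[0, 5, 0],
                          [5, 5, 9],
                          [0, 5, 0]]])

cleaned = remove_background(image_window, label_window, label=5)
print(cleaned[0, 0])
```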
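batch_size only controls how the instance axis i is split into chunks. With a stand-in array of 150 instances and batch_size=64, the chunks along i are 64, 64 and 22; the shapes here are made up for illustration.

```python
import dask.array as da

n_instances, c, z, diameter = 150, 3, 1, 40
batch_size = 64

# Stand-in for an (i, c, z, y, x) instances array.
instances = da.zeros((n_instances, c, z, diameter, diameter), dtype="uint16")

# Chunk only the instance axis; -1 keeps every other axis as a single chunk.
instances = instances.rechunk({0: batch_size, 1: -1, 2: -1, 3: -1, 4: -1})
print(instances.chunks[0])  # (64, 64, 22)
```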
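The overwrite rules above can be summarised in a few lines of standard-library Python. prepare_store is a hypothetical helper: it assumes the Zarr stores are directories and uses FileExistsError as a stand-in for whatever error harpy actually raises.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_store(zarr_output_path, name, overwrite=False):
    """Hypothetical helper mirroring the documented overwrite behaviour."""
    target = Path(zarr_output_path) / name
    if target.exists():
        if not overwrite:
            raise FileExistsError(f"{target} already exists; pass overwrite=True to replace it.")
        shutil.rmtree(target)  # assumes a directory-based Zarr store
    return target


out = tempfile.mkdtemp()
store = prepare_store(out, "image.zarr")  # nothing there yet: path is returned
store.mkdir()                             # simulate the result of a previous run

try:
    prepare_store(out, "image.zarr")      # overwrite=False: refuses to replace
    refused = False
except FileExistsError:
    refused = True

replaced = prepare_store(out, "image.zarr", overwrite=True)  # old store removed
```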

Return type:

tuple[numpy.ndarray, dask.array.Array | tuple[dask.array.Array, dask.array.Array]]

Returns:

A 2-tuple (instance_ids, instances), where:

  • instance_ids is a one-dimensional NumPy array of shape (i,) containing the labels of the extracted instances, where i is the number of distinct non-zero labels in labels_name. instance_ids is not guaranteed to be sorted.

  • instances contains the extracted instance windows.

    • If extract_mask is False, this is a single Dask array of shape (i, c, z, y, x).

    • If extract_mask is True, this is a 2-tuple (mask_instances, image_instances) where:

      • mask_instances has shape (i, 1, z, y, x) and contains the extracted instance masks.

      • image_instances has shape (i, c, z, y, x) and contains the extracted instance image windows.

    Here, c is the number of image channels, z is the number of planes in the z-dimension, and y and x are equal to diameter.

    The returned Dask arrays are lazy unless zarr_output_path is specified, in which case the data are written to disk and reloaded as Dask arrays backed by Zarr.
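Because instance_ids is not guaranteed to be sorted, look instances up by label rather than by position. The sketch below uses small stand-in arrays with the documented shapes for extract_mask=True (the ids, shapes and values are made up) to unpack the 2-tuple, fetch one window, and optionally reorder all instances by label.

```python
import numpy as np
import dask.array as da

# Stand-ins for the return value with extract_mask=True:
# 3 instances, 2 channels, 1 z-plane, diameter 4.
instance_ids = np.array([12, 3, 7])  # order is not guaranteed to be sorted
mask_instances = da.zeros((3, 1, 1, 4, 4), dtype=np.int32, chunks=(2, 1, 1, 4, 4))
image_instances = da.ones((3, 2, 1, 4, 4), dtype=np.uint16, chunks=(2, 2, 1, 4, 4))
instances = (mask_instances, image_instances)

masks, images = instances  # masks first, images second

# Find the position of a label instead of assuming sorted ids.
pos = int(np.flatnonzero(instance_ids == 7)[0])
window = images[pos, 0, 0].compute()  # (y, x) window: channel 0, first z-plane

# Sorting by label is just a reordering of the instance axis.
order = np.argsort(instance_ids)
images_sorted = images[order]
```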

Examples

Extract instances directly from a SpatialData object:

import harpy as hp
import matplotlib.pyplot as plt

sdata = hp.datasets.pixie_example()

image_name = "raw_image_fov0"
labels_name = "label_whole_fov0"

# Persist to Zarr on disk (computes instances now)
instance_ids, instances = hp.tb.extract_instances(
    sdata,
    image_name=image_name,
    labels_name=labels_name,
    depth=100,
    diameter=40,
    remove_background=True,
    extract_mask=False,
    zarr_output_path="instances",
    name_instances_image="image_instances.zarr",
    batch_size=64,
    to_coordinate_system="fov0",
)

instance_id = 23
channel_idx = 19

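# Select one instance and one channel (first z-plane); .compute() loads only that (y, x) window
array = instances[instance_ids == instance_id][0, channel_idx, 0].compute()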
plt.imshow(array)
plt.show()

Or construct a lazy Dask graph:

import harpy as hp
import dask.array as da
import matplotlib.pyplot as plt

sdata = hp.datasets.pixie_example()

image_name = "raw_image_fov0"
labels_name = "label_whole_fov0"

instance_ids, instances_lazy = hp.tb.extract_instances(
    sdata,
    image_name=image_name,
    labels_name=labels_name,
    depth=100,
    diameter=40,
    remove_background=True,
    extract_mask=False,
    zarr_output_path=None,
    batch_size=64,
    to_coordinate_system="fov0",
)

# Compute all instances now by writing them to a Zarr store, then reload them lazily
instances_lazy.to_zarr("instances.zarr")
instances = da.from_zarr("instances.zarr")