harpy.tb.extract_instances
- harpy.tb.extract_instances(sdata, image_name, labels_name, diameter, depth=None, remove_background=True, extract_mask=False, zarr_output_path=None, name_instances_image='image.zarr', name_instances_mask='mask.zarr', batch_size=None, to_coordinate_system='global', run_on_gpu=False, overwrite=False)
Extract per-label instance windows from image_name/labels_name of size diameter in y and x using dask.array.map_overlap() and dask.array.map_blocks().

For every non-zero label in labels_name, this method builds a Dask graph that slices out a centered, square window in the y, x plane around that instance (preserving the z dimension), for both image_name and labels_name.

Note that decreasing the on-disk chunk size of the image_name and labels_name elements reduces RAM consumption. A good first guess for chunk sizes is (c_chunksize, y_chunksize, x_chunksize) = (10, 2048, 2048).

For optimal performance, configure dask to use processes, e.g. dask.config.set(scheduler="processes").
- Parameters:
sdata – SpatialData object.
image_name (str) – Name of the image element.
labels_name (str) – Name of the labels element.
diameter (int) – Side length of the resulting y, x window for every instance.
depth (int | None (default: None)) – Passed to dask.array.map_overlap(). If not provided, depth is set to diameter // 2 + 1.
remove_background (bool (default: True)) – If True, pixels outside the instance label within each window are set to background (e.g., zero) so that only the object remains inside the cutout. If False, the entire window content is kept.
extract_mask (bool (default: False)) – If True, the corresponding per-instance mask is also extracted.
zarr_output_path (str | Path | None (default: None)) – If a filesystem path (string or Path) is provided, the extracted instances are computed and materialized to a Zarr store at that location. The returned object will still be a Dask array pointing at the written data, but all computations necessary to populate the store will have been executed. If None (default), no data are written and the method returns a lazy (not yet computed) Dask array.
name_instances_image (str (default: 'image.zarr')) – Name of the Zarr store created under zarr_output_path for extracted image instances. Ignored if zarr_output_path is None.
name_instances_mask (str (default: 'mask.zarr')) – Name of the Zarr store created under zarr_output_path for extracted mask instances. Ignored if zarr_output_path is None or extract_mask is False.
batch_size (int | None (default: None)) – Chunk size of the resulting Dask array in the i dimension.
to_coordinate_system (str (default: 'global')) – The coordinate system that holds image_name and labels_name.
run_on_gpu (bool (default: False)) – If True and 'cupy' is installed, the extraction step runs on the GPU.
overwrite (bool (default: False)) – Whether to overwrite existing Zarr stores at the target locations. If True, any existing Zarr store at zarr_output_path / name_instances_image and/or zarr_output_path / name_instances_mask will be replaced. If False (default), an error is raised if a target store already exists. Ignored if zarr_output_path is None.
- Return type:
tuple[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]], Array | tuple[Array, Array]]
- Returns:
A 2-tuple (instance_ids, instances) where:
- instance_ids is a one-dimensional NumPy array of shape (i,) containing the labels of the extracted instances. The value i is the total number of non-zero labels in the input mask. instance_ids is not guaranteed to be sorted.
- instances contains the extracted instance windows.
  - If extract_mask is False, this is a single Dask array of shape (i, c, z, y, x).
  - If extract_mask is True, this is a 2-tuple (mask_instances, image_instances) where:
    - mask_instances has shape (i, 1, z, y, x) and contains the extracted instance masks.
    - image_instances has shape (i, c, z, y, x) and contains the extracted instance image windows.

Here, c is the number of image channels, z is the number of planes in the z-dimension, and y and x are equal to diameter.

The returned Dask arrays are lazy unless zarr_output_path is specified, in which case the data are written to disk and reloaded as Dask arrays backed by Zarr.
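Because instance_ids is not guaranteed to be sorted and the second return value changes form with extract_mask, a little glue code can make downstream use less error-prone. The following is only a sketch: unpack_instances and index_of_label are hypothetical helpers (not part of harpy), and small NumPy arrays stand in for the Dask arrays that extract_instances returns.

```python
import numpy as np


def unpack_instances(instances, extract_mask):
    """Hypothetical helper: normalise the second return value to (masks, images)."""
    if extract_mask:
        # Documented order when extract_mask=True: (mask_instances, image_instances)
        mask_instances, image_instances = instances
        return mask_instances, image_instances
    return None, instances


def index_of_label(instance_ids, label):
    """Hypothetical helper: position along the i axis that holds `label`."""
    hits = np.flatnonzero(np.asarray(instance_ids) == label)
    if hits.size == 0:
        raise KeyError(f"label {label} not found in instance_ids")
    return int(hits[0])


# Stand-in data with the documented layout (i, c, z, y, x).
# NumPy arrays are used here in place of Dask arrays (assumption for the sketch).
instance_ids = np.array([7, 23, 4])  # deliberately unsorted
image_instances = np.zeros((3, 2, 1, 40, 40), dtype=np.float32)
image_instances[1] = 1.0  # mark the window that belongs to label 23

masks, images = unpack_instances(image_instances, extract_mask=False)
idx = index_of_label(instance_ids, 23)
window = images[idx]  # shape (c, z, y, x)
```

Looking up the position by label, rather than assuming that label n sits at index n, keeps the code correct regardless of the order in which instance_ids is returned.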
Examples
Extract instances directly from a SpatialData object:
    import harpy as hp
    import matplotlib.pyplot as plt

    sdata = hp.datasets.pixie_example()

    image_name = "raw_image_fov0"
    labels_name = "label_whole_fov0"

    # Persist to Zarr on disk (computes instances now)
    instance_ids, instances = hp.tb.extract_instances(
        sdata,
        image_name=image_name,
        labels_name=labels_name,
        depth=100,
        diameter=40,
        remove_background=True,
        extract_mask=False,
        zarr_output_path="instances",
        name_instances_image="image_instances.zarr",
        batch_size=64,
        to_coordinate_system="fov0",
    )

    instance_id = 23
    channel_idx = 19
    # Select the window for this label, then one channel and the first z-plane
    array = instances[instance_ids == instance_id][0][channel_idx][0]
    plt.imshow(array)
    plt.show()
Or construct a lazy Dask graph:
    import harpy as hp
    import dask.array as da
    import matplotlib.pyplot as plt

    sdata = hp.datasets.pixie_example()

    image_name = "raw_image_fov0"
    labels_name = "label_whole_fov0"

    # zarr_output_path=None: nothing is computed or written yet
    instance_ids, instances_lazy = hp.tb.extract_instances(
        sdata,
        image_name=image_name,
        labels_name=labels_name,
        depth=100,
        diameter=40,
        remove_background=True,
        extract_mask=False,
        zarr_output_path=None,
        batch_size=64,
        to_coordinate_system="fov0",
    )

    # compute instances now:
    instances_lazy.to_zarr("instances.zarr")
    instances = da.from_zarr("instances.zarr")
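To make the description of a centered, square y, x window and of remove_background more concrete, here is a minimal NumPy sketch of the idea for a single label. It is not harpy's implementation: extract_window is a hypothetical function, it works on in-memory arrays rather than a Dask graph, and it assumes the window is centered on the midpoint of the label's bounding box and that areas beyond the image border are zero-padded.

```python
import numpy as np


def extract_window(image, labels, label_id, diameter, remove_background=True):
    """Cut a (c, z, diameter, diameter) window around one label (illustrative only).

    image has shape (c, z, y, x); labels has shape (z, y, x).
    """
    _, ys, xs = np.nonzero(labels == label_id)
    if ys.size == 0:
        raise ValueError(f"label {label_id} not present in labels")

    # Assumption: centre the window on the midpoint of the label's bounding box.
    cy = (ys.min() + ys.max()) // 2
    cx = (xs.min() + xs.max()) // 2
    half = diameter // 2

    # Zero-pad y and x so that windows near the border keep their full size.
    pad = diameter
    image_p = np.pad(image, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    labels_p = np.pad(labels, ((0, 0), (pad, pad), (pad, pad)))

    y_slice = slice(cy - half + pad, cy - half + pad + diameter)
    x_slice = slice(cx - half + pad, cx - half + pad + diameter)

    window = image_p[:, :, y_slice, x_slice].copy()
    mask = labels_p[:, y_slice, x_slice] == label_id

    if remove_background:
        # Zero every pixel that does not belong to this label (broadcast over channels).
        window = window * mask[np.newaxis]
    return window, mask


# Toy data: 2 channels, 1 z-plane, 8 x 8 pixels, all intensities non-zero.
image = np.arange(2 * 1 * 8 * 8, dtype=np.float32).reshape(2, 1, 8, 8) + 1
labels = np.zeros((1, 8, 8), dtype=np.int32)
labels[0, 2:5, 3:6] = 1  # a 3 x 3 object
labels[0, 2:5, 6] = 2    # a neighbouring object that falls inside the window

window, mask = extract_window(image, labels, label_id=1, diameter=5)
```

With remove_background=True, pixels of the neighbouring label 2 that fall inside the window are zeroed along with the true background; with remove_background=False they are kept as part of the window content.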