Benchmark results Harpy#

Below, we present figures illustrating the benchmark performance of Harpy.

We evaluate two representative workloads: (i) shallow feature extraction from multiplex images, defined here as the computation of mean intensities per instance, and (ii) the generation of single-cell representations from multiplex images.

The benchmarks were performed on a tonsil proteomics dataset acquired using the MACSima platform, and on a dataset acquired using the Merscope platform.

Both datasets were artificially enlarged: the tonsil dataset was tiled to 40 gigapixels, while the Merscope dataset was expanded to 1 terapixel. This was done to stress-test Harpy on two computationally demanding tasks typically run before downstream analysis.

The figures shown correspond to CPU-based execution. Note that Harpy also supports GPU acceleration via cupy , which can provide additional speedups for large datasets.

Next, we focus on the gigapixel case (the 40‑gigapixel tonsil dataset) before moving to the terapixel benchmark.

Gigapixel case#

Read the data:#

import os
import tempfile

from spatialdata import read_zarr
import harpy as hp

sdata = hp.datasets.macsima_tonsil_benchmark()  # note this dataset is large (~100GB)
sdata.write(os.path.join(tempfile.gettempdir(), "sdata.zarr"), overwrite=True)
sdata = read_zarr(sdata.path)
sdata
WARNING  Module 'bioio' is not installed. Install it with `pip install bioio` to use `harpy.io.macsima`.           
WARNING  Module 'bioio-ome-tiff' is not installed. Install it with `pip install bioio-ome-tiff` to use             
         `harpy.io.macsima`.
SpatialData object
├── Images
│     ├── 'image_tiled': DataArray[cyx] (100, 20000, 20000)
│     └── 'image_tiled_dapi': DataTree[cyx] (1, 20000, 20000), (1, 10000, 10000), (1, 5000, 5000), (1, 2500, 2500), (1, 1250, 1250)
├── Labels
│     └── 'labels_cells_harpy': DataArray[yx] (20000, 20000)
└── Shapes
      └── 'segmentation_mask_boundaries': GeoDataFrame shape: (206030, 1) (2D shapes)
with coordinate systems:
    ▸ 'global', with elements:
        image_tiled (Images), image_tiled_dapi (Images), labels_cells_harpy (Labels), segmentation_mask_boundaries (Shapes)

This dataset contains 206,030 instances:

import dask.array as da

da.unique(sdata["labels_cells_harpy"].data).compute().shape  # the dataset contains 206030 instances
(206031,)

The image element under the key “image_tiled” was obtained by artificially tiling the “ROI1_image” output of harpy.datasets.macsima_tonsil() into a 3D image layout with dimensions (c, y, x) = (100, 20000, 20000) ~40 giga-pixels.

The cell segmentation mask stored under the key “labels_cells_harpy” was generated using Cellpose (version 4) applied to the “image_tiled” image element.

Benchmark shallow feature extraction:#

from dask.distributed import Client, LocalCluster
from harpy.utils._aggregate import RasterAggregator

cluster = LocalCluster(
    n_workers=8,
    threads_per_worker=4,
    memory_limit="48GB",
)

client = Client(cluster)

image_name = "image_tiled"
labels_name = "labels_cells_harpy"

image = sdata[image_name].data[:, None, ...]  # ( "c", "z", "y", "x" )
labels = sdata[labels_name].data[None, ...]  # ( "z", "y", "x" )

aggregator = RasterAggregator(image_dask_array=image, mask_dask_array=labels)
dfs = aggregator.aggregate_stats(stats_funcs=("mean"))

client.close()

We varied the number of workers and obtained the following figures.

from IPython.display import Image, display

display(Image(filename="../figures/output_aggregate_mean.png"))
../../_images/46967a66d67ead7814ab2435fb61af53ae4913e24030a7c7718809c5fe2ab22f.png

Benchmark generation of single-cell representations from highly multiplexed images:#

import os

from dask.distributed import Client, LocalCluster

from tempfile import tempdir

cluster = LocalCluster(
    n_workers=8,
    threads_per_worker=4,
    memory_limit="48GB",
)

client = Client(cluster)

image_name = "image_tiled"
labels_name = "labels_cells_harpy"

instances_ids, instances = hp.tb.extract_instances(
    sdata,
    image_name=image_name,
    labels_name=labels_name,
    depth=100,
    diameter=50,
    batch_size=500,
    zarr_output_path=os.path.join(tempdir, "zarr_folder"),
    overwrite=True,
)

We varied the number of workers and obtained the following figures.

from IPython.display import Image, display

display(Image(filename="../figures/extract_instances.png"))
../../_images/7b90be98b82e33f4d25ed1796cad6418018f1c5df76bed60f0d43e676f54ceb2.png

Terapixel case#

Read the data:#

import os
import tempfile

from spatialdata import read_zarr
import harpy as hp

sdata = hp.datasets.merscope_mouse_liver_segmentation_mask()
sdata.write(os.path.join(tempfile.gettempdir(), "sdata.zarr"), overwrite=True)
sdata = read_zarr(sdata.path)

# artificially tile the data:

sdata = hp.im.add_image(
    sdata,
    arr=da.tile(sdata["clahe"], (50, 1, 1)).rechunk((10, 2048, 2048)),  # Note this array is large ~1.5TB
    output_image_name="clahe_tiled",
    overwrite=True,
)

The image element under the key “clahe_tiled” now has dimensions (c, y, x) = (100, 100000, 100000), totaling 1 terapixel (10¹² pixels).

The cell segmentation mask stored under the key “segmentation_mask_full” was generated using Cellpose (version 3) applied to the “clahe” image element. The cell segmentation mask contains 278,482 instances.

Benchmark shallow feature extraction:#

from dask.distributed import Client, LocalCluster
from harpy.utils._aggregate import RasterAggregator

cluster = LocalCluster(
    n_workers=8,
    threads_per_worker=4,
    memory_limit="243200MB",
    local_directory=os.environ.get("SCRATCHDIR") or os.environ.get("TMPDIR"),
)

client = Client(cluster)

image_name = "segmentation_mask_full"
labels_name = "clahe_tiled"

image = sdata[image_name].data[:, None, ...]  # ( "c", "z", "y", "x" )
labels = sdata[labels_name].data[None, ...]  # ( "z", "y", "x" )

aggregator = RasterAggregator(image_dask_array=image, mask_dask_array=labels)
dfs = aggregator.aggregate_stats(stats_funcs=("mean"))

client.close()

We varied the number of workers and obtained the following figures.

from IPython.display import Image, display

display(Image(filename="../figures/output_aggregate_mean_merscope.png"))
../../_images/07bcc73e3993a178a38fc47d0b56cb1f4f76a042b1ba105603df909aa6ab79b5.png

RSS (and VSZ) appears to spike at 1 worker, likely due to Dask holding onto chunks when parallelism is low.

Benchmark generation of single-cell representations from highly multiplexed images:#

import os

from dask.distributed import Client, LocalCluster

from tempfile import tempdir

cluster = LocalCluster(
    n_workers=8,
    threads_per_worker=4,
    memory_limit="243200MB",
    local_directory=os.environ.get("SCRATCHDIR") or os.environ.get("TMPDIR"),
)

client = Client(cluster)

image_name = "image_tiled"
labels_name = "labels_cells_harpy"

instances_ids, instances = hp.tb.extract_instances(
    sdata,
    image_name=image_name,
    labels_name=labels_name,
    depth=200,
    diameter=100,
    batch_size=500,
    zarr_output_path=os.path.join(tempdir, "zarr_folder"),
    overwrite=True,
)

We varied the number of workers and obtained the following figures.

from IPython.display import Image, display

display(Image(filename="../figures/extract_instances_merscope.png"))
../../_images/1309c311043aab2c704f04a91137e67e7f9e3aad6c4a844234af6879b5a75101.png