harpy.im.match_labels_to_reference

harpy.im.match_labels_to_reference(sdata, source_labels_name, reference_labels_name, chunks=None, threshold=0.0, overlap_metric='source_fraction')

Match source labels to reference labels based on an overlap score.

For each non-zero label in source_labels_name and for each labels element listed in reference_labels_name, this function determines which non-zero reference label best matches that source label according to overlap_metric. The result is returned as a DataFrame indexed by the source labels, with one column per reference labels element.

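With the default parameters threshold=0.0 and overlap_metric="source_fraction", the function effectively assigns each source label to the reference label with the largest non-zero overlap. The denominator of source_fraction is the same for every candidate reference label, so ranking by score is equivalent to ranking by overlap pixel count, and any non-zero overlap is strictly greater than a threshold of 0.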
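A minimal NumPy check that ranking reference labels by source_fraction picks the same winner as ranking them by raw overlap count. The overlap counts are made-up toy values, and index 0 of the array is assumed to stand for background, which is excluded from matching.

```python
import numpy as np

# Toy overlap counts between ONE source label and reference values 0..4
# (index 0 is background; the numbers are invented for illustration).
overlap_row = np.array([12, 30, 0, 45, 5])
area_source = overlap_row.sum()  # every source pixel lands on some reference value, incl. 0

candidates = overlap_row[1:]                # drop background label 0
source_fraction = candidates / area_source  # same denominator for every candidate

best_by_count = int(np.argmax(candidates)) + 1        # +1 maps back to label ids
best_by_fraction = int(np.argmax(source_fraction)) + 1
print(best_by_count, best_by_fraction)  # both select reference label 3
```

Dividing every candidate by the same positive area cannot change which one is largest, which is why the default metric reduces to "largest overlap wins".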

Overlap counts are accumulated chunk by chunk: for each pair of co-located source and reference chunks, a local dense overlap table is built, and its counts are added to a sparse global accumulator. This keeps the implementation suitable for large label images, because it never requires a dense global (n_source_labels, n_reference_labels) overlap matrix.
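The chunked accumulation described above can be sketched in plain NumPy. This is an illustration of the idea, not harpy's implementation: the function names, the Counter-based sparse accumulator, and the explicit Python tiling loop (standing in for Dask chunks) are all assumptions made for the sketch.

```python
from collections import Counter

import numpy as np


def chunk_overlap_table(src_chunk, ref_chunk):
    """Dense overlap table for one pair of co-located chunks, ignoring background 0."""
    mask = (src_chunk != 0) & (ref_chunk != 0)
    src, ref = src_chunk[mask], ref_chunk[mask]
    if src.size == 0:
        empty = np.array([], dtype=np.int64)
        return empty, empty, np.zeros((0, 0), dtype=np.int64)
    # Relabel to compact local indices so the dense table only spans labels in this chunk.
    src_ids, src_idx = np.unique(src, return_inverse=True)
    ref_ids, ref_idx = np.unique(ref, return_inverse=True)
    table = np.zeros((src_ids.size, ref_ids.size), dtype=np.int64)
    np.add.at(table, (src_idx, ref_idx), 1)  # count pixels per (source, reference) pair
    return src_ids, ref_ids, table


def accumulate_overlaps(source, reference, chunk=(2, 2)):
    """Add per-chunk dense tables into a sparse {(src_label, ref_label): count} Counter."""
    assert source.shape == reference.shape
    totals = Counter()
    cy, cx = chunk
    for y0 in range(0, source.shape[0], cy):
        for x0 in range(0, source.shape[1], cx):
            window = (slice(y0, y0 + cy), slice(x0, x0 + cx))
            src_ids, ref_ids, table = chunk_overlap_table(source[window], reference[window])
            # Only non-zero cells are pushed to the global accumulator.
            for r, c in zip(*np.nonzero(table)):
                totals[(int(src_ids[r]), int(ref_ids[c]))] += int(table[r, c])
    return totals


# Toy label images; source label 3 straddles a chunk boundary.
source = np.array([
    [1, 1, 0, 2],
    [1, 1, 2, 2],
    [0, 3, 3, 0],
    [0, 3, 3, 0],
])
reference = np.array([
    [5, 5, 0, 6],
    [5, 0, 6, 6],
    [0, 7, 7, 7],
    [0, 7, 6, 0],
])
totals = accumulate_overlaps(source, reference, chunk=(2, 2))
print(dict(totals))
```

Per-chunk tables stay small because they only span the labels present in that chunk, and summing them gives the same totals as a single pass over the whole image.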

Parameters:
  • sdata (SpatialData) – The input SpatialData object containing the source labels element and the reference labels elements.

  • source_labels_name (str) – Name of the labels element whose non-zero labels are matched to the reference labels elements.

  • reference_labels_name (list[str]) – Names of the reference labels elements against which overlap is computed. One output column is produced for each element name in the order provided.

  • chunks (str | int | tuple[int, int] | None (default: None)) – Chunk specification used when rechunking the label arrays before the overlap computation. If a tuple is provided, it is interpreted as the desired (y, x) chunk size. If set to "auto", Dask determines the chunking. Smaller spatial chunks can improve performance by reducing the size of the per-chunk overlap tables.

  • threshold (float (default: 0.0)) – Minimum overlap score that the best-matching reference label must exceed for a source label to keep its match. The score is computed according to overlap_metric. If the score is not strictly greater than threshold, the match is discarded and the output value is set to 0. Must lie between 0 and 1.

  • overlap_metric (Literal['source_fraction', 'reference_fraction', 'iou'] (default: 'source_fraction')) –

    Metric used both to select the winning reference label and to apply threshold to that winning match. Supported values are:

    • "source_fraction": overlap_pixels / area_source_label

    • "reference_fraction": overlap_pixels / area_reference_label

    • "iou": overlap_pixels / (area_source_label + area_reference_label - overlap_pixels)
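To make the interaction between overlap_metric and threshold concrete, here is a minimal pure-Python sketch that starts from already-accumulated overlap counts and label areas. The function name, the dict-based inputs, and the tie-breaking rule (the first candidate seen wins a tie) are assumptions for illustration; the documentation above does not specify how ties are resolved.

```python
def match_from_overlaps(overlaps, area_source, area_reference,
                        overlap_metric="source_fraction", threshold=0.0):
    """Hypothetical helper: best reference label per source label, then a strict threshold.

    overlaps: {(src_label, ref_label): overlap_pixels}, background 0 already excluded.
    area_source / area_reference: {label: pixel_count}.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    best = {}  # src_label -> (score, ref_label)
    for (s, r), n in overlaps.items():
        if overlap_metric == "source_fraction":
            score = n / area_source[s]
        elif overlap_metric == "reference_fraction":
            score = n / area_reference[r]
        elif overlap_metric == "iou":
            score = n / (area_source[s] + area_reference[r] - n)
        else:
            raise ValueError(f"unknown overlap_metric: {overlap_metric!r}")
        if s not in best or score > best[s][0]:  # ties keep the first candidate seen
            best[s] = (score, r)
    result = {}
    for s in area_source:
        # The same metric that picked the winner is compared against threshold.
        if s in best and best[s][0] > threshold:
            result[s] = best[s][1]
        else:
            result[s] = 0  # no non-zero overlap, or winner not strictly above threshold
    return result


# Toy inputs: source label 3 overlaps a large reference label 7 and a tiny label 8;
# source label 4 overlaps nothing.
overlaps = {(1, 5): 3, (2, 6): 3, (3, 7): 3, (3, 8): 1}
area_source = {1: 4, 2: 3, 3: 4, 4: 2}
area_reference = {5: 3, 6: 4, 7: 5, 8: 1}

print(match_from_overlaps(overlaps, area_source, area_reference))
print(match_from_overlaps(overlaps, area_source, area_reference, "reference_fraction"))
```

With source_fraction, source label 3 goes to reference label 7 (3/4 versus 1/4), whereas reference_fraction prefers the tiny label 8 because it lies entirely inside source label 3 (1/1 versus 3/5). Because the comparison is strict, a threshold of 0.75 with source_fraction also discards matches whose score is exactly 0.75.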

Return type:

DataFrame

Returns:

A pandas DataFrame where each row corresponds to a non-zero label from source_labels_name and each column corresponds to one element name in reference_labels_name. Each value is the non-zero reference label selected for that source label according to overlap_metric, or 0 if the source label has no non-zero overlap with that reference labels element or if the score of its best match is not strictly greater than threshold.

Raises:
  • AssertionError – If the provided labels elements do not all have the same shape.

  • AssertionError – If chunks is provided as a tuple but does not match the (y, x) dimensions.

  • AssertionError – If any rechunked array has more than one chunk along the z dimension.

  • ValueError – If threshold is outside the interval [0, 1].

  • ValueError – If overlap_metric is not one of "source_fraction", "reference_fraction", or "iou".

Notes

Background label 0 is ignored when computing overlaps. As a result, an output value of 0 indicates either that a source label has no non-zero overlap with the corresponding reference labels element, or that its best match was discarded because its score did not exceed threshold.

Example

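import harpy as hp

sdata = hp.datasets.mibi_example()

# Match each whole-cell label to the nuclear label it overlaps most (default metric and threshold).
matched = hp.im.match_labels_to_reference(
    sdata,
    source_labels_name="masks_whole",
    reference_labels_name=["masks_nuclear"],
    chunks=256,
)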
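Because each source label picks its best reference label independently, a returned DataFrame can contain 0 (no accepted match) and can assign the same reference label to several source labels. A minimal pandas sketch of both checks follows; the DataFrame here is a hypothetical stand-in with invented values, shaped like the documented output (index of source labels, one column per reference labels element).

```python
import pandas as pd

# Hypothetical stand-in for the returned DataFrame; label values are invented.
matched = pd.DataFrame({"masks_nuclear": [4, 0, 7, 7]}, index=[1, 2, 3, 5])

col = matched["masks_nuclear"]

# Source labels without an accepted match (value 0).
unmatched = matched.index[col == 0].tolist()

# Reference labels chosen by more than one source label.
counts = col[col != 0].value_counts()
shared = counts[counts > 1].index.tolist()

print(unmatched, shared)
```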