harpy.im.match_labels_to_reference
- harpy.im.match_labels_to_reference(sdata, source_labels_name, reference_labels_name, chunks=None, threshold=0.0, overlap_metric='source_fraction')
Match source labels to reference labels based on an overlap score.
For each non-zero label in source_labels_name, this function determines, for every labels element in reference_labels_name, which non-zero reference label best matches it according to overlap_metric. The result is returned as a DataFrame indexed by the source labels, with one column per reference labels element.
With the default parameters threshold=0 and overlap_metric="source_fraction", the function effectively assigns each source label to the reference label with the largest non-zero overlap.
Overlap counts are accumulated chunk by chunk using a local dense overlap table per chunk pair and a sparse global accumulator across chunks. This keeps the implementation suitable for large label images without requiring a dense global (n_source_labels, n_reference_labels) overlap matrix.
- Parameters:
  sdata (SpatialData) – The input SpatialData object containing the source labels element and the reference labels elements.
  source_labels_name (str) – Name of the labels element whose non-zero labels are matched to the reference labels elements.
  reference_labels_name (list[str]) – Names of the reference labels elements against which overlap is computed. One output column is produced for each element name, in the order provided.
  chunks (str | int | tuple[int, int] | None (default: None)) – Chunk specification used when rechunking the label arrays before the overlap computation. If a tuple is provided, it is interpreted as the desired (y, x) chunk size. If set to "auto", Dask determines the chunking. Smaller spatial chunks can improve performance by reducing the size of the per-chunk overlap tables.
  threshold (float (default: 0.0)) – Minimum required overlap score between a source label and its best-matching reference label. The score is computed according to overlap_metric. If the score is not strictly greater than threshold, the mapping is discarded and the output value is set to 0. Must lie in the interval [0, 1].
  overlap_metric (Literal['source_fraction', 'reference_fraction', 'iou'] (default: 'source_fraction')) – Metric used both to select the winning reference label and to apply threshold to that winning match. Supported values are:
    - "source_fraction": overlap_pixels / area_source_label
    - "reference_fraction": overlap_pixels / area_reference_label
    - "iou": overlap_pixels / (area_source_label + area_reference_label - overlap_pixels)
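To make the three overlap_metric formulas concrete, the following is a minimal NumPy sketch on a toy pair of label images. It is not Harpy's implementation: the helper names overlap_scores and best_match are hypothetical, the arrays are invented, and tie-breaking between equal scores is not specified by the documentation, so the toy data avoids ties.

```python
import numpy as np


def overlap_scores(source, reference, source_label):
    """Score every non-zero reference label against one source label (hypothetical helper)."""
    source_mask = source == source_label
    area_source = int(source_mask.sum())
    scores = {}
    # Only reference labels found under the source label can overlap it.
    for ref_label in np.unique(reference[source_mask]):
        if ref_label == 0:  # background is ignored
            continue
        ref_mask = reference == ref_label
        overlap = int((source_mask & ref_mask).sum())
        area_ref = int(ref_mask.sum())
        scores[int(ref_label)] = {
            "source_fraction": overlap / area_source,
            "reference_fraction": overlap / area_ref,
            "iou": overlap / (area_source + area_ref - overlap),
        }
    return scores


def best_match(scores, overlap_metric="source_fraction", threshold=0.0):
    """Return the top-scoring reference label, or 0 if it is not strictly above threshold."""
    if not scores:
        return 0
    winner = max(scores, key=lambda label: scores[label][overlap_metric])
    return winner if scores[winner][overlap_metric] > threshold else 0


# Source label 1 covers 8 pixels (top two rows).
source = np.zeros((6, 4), dtype=int)
source[:2, :] = 1

# Reference label 5 is large (18 pixels, 6 of them under the source label);
# reference label 7 is small (2 pixels, both under the source label).
reference = np.zeros((6, 4), dtype=int)
reference[:, :3] = 5
reference[:2, 3] = 7

scores = overlap_scores(source, reference, source_label=1)
print(best_match(scores, "source_fraction"))     # 5 (6/8 beats 2/8)
print(best_match(scores, "reference_fraction"))  # 7 (2/2 beats 6/18)
print(best_match(scores, "iou"))                 # 5 (6/20 beats 2/8)
print(best_match(scores, "iou", threshold=0.5))  # 0 (0.3 is not above 0.5)
```

The toy data is chosen so that the metrics disagree: "source_fraction" favours the reference label covering most of the source label, while "reference_fraction" favours a small reference label that lies entirely inside it.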
- Return type:
  DataFrame
- Returns:
  A pandas DataFrame where each row corresponds to a non-zero label from source_labels_name and each column corresponds to one element name in reference_labels_name. Every value contains the non-zero reference label selected for that source label according to overlap_metric. If a source label has no non-zero overlap with a given reference labels element, the corresponding output value is 0.
- Raises:
AssertionError – If the provided labels elements do not all have the same shape.
AssertionError – If chunks is provided as a tuple but does not match the (y, x) dimensions.
AssertionError – If any rechunked array has more than one chunk along the z dimension.
ValueError – If threshold is outside the interval [0, 1].
ValueError – If overlap_metric is not one of "source_fraction", "reference_fraction", or "iou".
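The chunk-by-chunk accumulation described in the summary (a local dense table per chunk pair, merged into a sparse global accumulator) can be illustrated with a rough sketch. This is an assumption-laden stand-in, not Harpy's code: plain NumPy tiles play the role of Dask chunks, a Counter plays the role of the sparse accumulator, and the function name accumulate_overlaps is hypothetical.

```python
from collections import Counter

import numpy as np


def accumulate_overlaps(source, reference, chunk=(2, 2)):
    """Count (source_label, reference_label) pixel overlaps tile by tile (hypothetical helper)."""
    total = Counter()  # sparse accumulator: only observed label pairs are stored
    cy, cx = chunk
    for y in range(0, source.shape[0], cy):
        for x in range(0, source.shape[1], cx):
            src = source[y:y + cy, x:x + cx].ravel()
            ref = reference[y:y + cy, x:x + cx].ravel()
            keep = (src != 0) & (ref != 0)  # drop background on either side
            if not keep.any():
                continue
            # Local dense table, sized by the labels present in this tile only.
            src_ids, src_idx = np.unique(src[keep], return_inverse=True)
            ref_ids, ref_idx = np.unique(ref[keep], return_inverse=True)
            local = np.zeros((src_ids.size, ref_ids.size), dtype=np.int64)
            np.add.at(local, (src_idx, ref_idx), 1)
            # Merge the non-zero entries of the local table into the global counts.
            for i, j in zip(*np.nonzero(local)):
                total[(int(src_ids[i]), int(ref_ids[j]))] += int(local[i, j])
    return total


source = np.array([
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 0],
    [3, 3, 3, 0],
])
reference = np.array([
    [5, 5, 5, 7],
    [5, 5, 5, 7],
    [5, 5, 5, 0],
    [0, 0, 6, 6],
])

counts = accumulate_overlaps(source, reference, chunk=(2, 2))
print(dict(counts))
```

Because each local table only spans the labels that occur in its tile, memory per tile stays small even when the full image contains many labels. The accumulated counts do not depend on the tile size, which is why rechunking changes performance but not the result in this sketch.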
Notes
Background label 0 is ignored when computing overlaps. As a result, an output value of 0 indicates that a source label has no non-zero overlap with the corresponding reference labels element, or that its best match did not score strictly above threshold.
Example
    import harpy as hp

    sdata = hp.datasets.mibi_example()
    matched = hp.im.match_labels_to_reference(
        sdata,
        source_labels_name="masks_whole",
        reference_labels_name=["masks_nuclear"],
        chunks=256,
    )
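As a sketch of how a result shaped like matched might be consumed, the snippet below builds a small stand-in DataFrame by hand, so it runs without Harpy or the example dataset. The label values and the index name "source_label" are invented for illustration and are not output of the function.

```python
import pandas as pd

# Hand-built stand-in for the returned table: one row per source label,
# one column per reference labels element, 0 meaning "no match".
matched = pd.DataFrame(
    {"masks_nuclear": [12, 0, 7]},
    index=pd.Index([1, 2, 3], name="source_label"),
)

has_match = matched["masks_nuclear"] != 0

# Mapping from source label to its matched reference label.
source_to_reference = matched.loc[has_match, "masks_nuclear"].to_dict()

# Source labels left without a match in this reference element.
unmatched_sources = matched.index[~has_match].tolist()

print(source_to_reference)  # {1: 12, 3: 7}
print(unmatched_sources)    # [2]
```

Filtering on non-zero values separates matched from unmatched source labels, since 0 is reserved for the background and never appears as a real match.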