harpy.tb.ZarrIterableInstances
- class harpy.tb.ZarrIterableInstances(zarr_path, instance_ids, labels=None, chunk_i=None, shuffle_chunks=False, chunk_seed=0, shuffle_within_chunk=False, buffer_seed=None, normalize='minmax01', x_dtype=torch.float32, return_instance_id=True, return_row_index=True, drop_unlabeled=True, allowed_chunk_indexes=None)
Chunk-wise iterable dataset that:
- iterates chunk-by-chunk
- optionally shuffles chunks deterministically per epoch
- partitions chunks across DDP ranks and dataloader workers
- in supervised training (labels provided), yields (x, y) with optional instance_id/row_idx
- in inference (no labels), yields x with optional instance_id/row_idx
Inspired by https://gitlab.in2p3.fr/ipsl/espri/espri-ia/projects/zarr-torch-dataset
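The chunk scheduling described above can be pictured with a small standalone sketch. It is not harpy's implementation: the helper name `chunks_for_consumer`, the `chunk_seed + epoch` seed combination, and the round-robin assignment of chunks to (rank, worker) pairs are all assumptions chosen for illustration.

```python
import random


def chunks_for_consumer(n_chunks, epoch, chunk_seed, rank, world_size,
                        worker_id, num_workers, shuffle_chunks=True):
    """Return the chunk indices one (rank, worker) pair would read.

    Hypothetical helper, not part of harpy.
    """
    order = list(range(n_chunks))
    if shuffle_chunks:
        # Assumption: seed and epoch are combined by addition, so the same
        # (chunk_seed, epoch) pair always yields the same chunk order.
        random.Random(chunk_seed + epoch).shuffle(order)
    # Assumption: ranks and workers are flattened into one consumer id and
    # chunks are dealt out round-robin.
    consumer = rank * num_workers + worker_id
    n_consumers = world_size * num_workers
    return order[consumer::n_consumers]


# Collect what every consumer reads in one epoch (2 ranks x 2 workers).
seen = []
for rank in range(2):
    for worker_id in range(2):
        seen += chunks_for_consumer(10, epoch=3, chunk_seed=0,
                                    rank=rank, world_size=2,
                                    worker_id=worker_id, num_workers=2)
```

In this sketch each chunk is read by exactly one consumer per epoch, which is the property that partitioning across ranks and workers is meant to provide.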
- Parameters:
zarr_path (str) – Path to a Zarr array of shape (i, c, z, y, x), where i indexes instances.
instance_ids (ndarray) – One-dimensional array of length i with the instance identifier per row in the Zarr array.
labels (ndarray | None (default: None)) – Optional one-dimensional array of length i with supervised labels. If None, the dataset operates in inference mode.
chunk_i (int | None (default: None)) – Chunk size along the instance axis. If None, uses the Zarr chunk size for axis 0.
shuffle_chunks (bool (default: False)) – If True, shuffle chunk order deterministically by chunk_seed.
chunk_seed (int (default: 0)) – Base seed used for shuffling chunks. Combined with epoch when shuffle_chunks is True.
shuffle_within_chunk (bool (default: False)) – If True, shuffle rows within each chunk.
buffer_seed (int | None (default: None)) – Base seed for within-chunk shuffling. If None, within-chunk shuffling is non-deterministic.
normalize (str (default: 'minmax01')) – Normalization mode for each instance tensor. Supported values are "minmax01", "unit", None, or "none".
x_dtype (dtype (default: torch.float32)) – Output dtype for returned tensors.
return_instance_id (bool (default: True)) – If True, append the instance id to each output tuple.
return_row_index (bool (default: True)) – If True, append the global row index (= instance id index) to each output tuple.
drop_unlabeled (bool (default: True)) – If True and labels is provided, drop rows with label < 0.
allowed_chunk_indexes (ndarray (default: None)) – Optional subset of chunk indices to iterate, e.g. for train/test splits.
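As an illustration of the train/test use of allowed_chunk_indexes, the sketch below builds two disjoint sets of chunk indices with NumPy. The sizes `n_instances` and `chunk_i` are made-up values, and the sketch assumes that chunk indices count chunks of size chunk_i along axis 0 starting at 0, which this page does not state explicitly.

```python
import numpy as np

n_instances = 1050  # hypothetical length of axis 0 (i)
chunk_i = 100       # hypothetical chunk size along the instance axis

# Number of chunks along axis 0, counting a final partial chunk.
n_chunks = -(-n_instances // chunk_i)  # ceiling division

# Permute the chunk indices once with a fixed seed, then hold out
# roughly 20% of the chunks for testing.
rng = np.random.default_rng(0)
chunk_ids = rng.permutation(n_chunks)
n_test = max(1, n_chunks // 5)
test_chunks = np.sort(chunk_ids[:n_test])
train_chunks = np.sort(chunk_ids[n_test:])
```

Each array could then be passed as allowed_chunk_indexes to its own ZarrIterableInstances, provided the chunk size used for the split matches the one the dataset iterates with.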
See also
harpy.tb.extract_instances: Function to create the Zarr instances consumed by this dataset.
Methods table
| set_epoch(epoch) | Call this once per epoch. |
Methods
- ZarrIterableInstances.set_epoch(epoch)
Call this once per epoch. When shuffle_chunks is True, the epoch is combined with chunk_seed to determine the chunk order.
- Return type:
None
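The call pattern for set_epoch can be sketched with a stand-in object, since running the real class needs a Zarr store on disk. `EpochAwareChunks` below is hypothetical and only mimics the set_epoch(epoch) hook; the way it mixes seed and epoch (addition) is an assumption, not harpy's documented behaviour.

```python
import random


class EpochAwareChunks:
    """Hypothetical stand-in exposing the same set_epoch(epoch) hook."""

    def __init__(self, n_chunks, chunk_seed=0):
        self.n_chunks = n_chunks
        self.chunk_seed = chunk_seed
        self.epoch = 0

    def set_epoch(self, epoch):
        # Only records the epoch; the order is derived in __iter__.
        self.epoch = epoch

    def __iter__(self):
        order = list(range(self.n_chunks))
        # Assumption: seed and epoch are combined by addition.
        random.Random(self.chunk_seed + self.epoch).shuffle(order)
        return iter(order)


dataset = EpochAwareChunks(n_chunks=8)
orders = []
for epoch in range(3):
    dataset.set_epoch(epoch)  # call once, before iterating this epoch
    orders.append(list(dataset))
```

With the real dataset, the same loop shape would apply: call set_epoch(epoch) at the top of each epoch, then iterate the dataloader that wraps the dataset.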