harpy.io.merscope

harpy.io.merscope(path, to_coordinate_system='global', z_layers=3, backend=None, transcripts=True, cell_boundaries=True, rasterize_cell_boundaries=True, table=True, mosaic_images=True, do_3D=False, z_projection=False, imread_kwargs=mappingproxy({}), image_models_kwargs=mappingproxy({}), filter_gene_names=None, instance_key='cell_ID', region_key='fov_labels', spatial_key='spatial', cell_index_name='cells', output=None)

Read MERSCOPE data from Vizgen.

A wrapper around spatialdata_io.merscope() that adds some additional capabilities: (i) loading data in full 3D (z, y, x) or applying a z-projection, (ii) loading both the unprocessed and processed AnnData table (with leiden clusters) provided by Vizgen, (iii) rasterizing cell boundaries, (iv) adding a micron-based coordinate system, and (v) loading multiple samples into a single SpatialData object.

The micron coordinate system is added as ‘{to_coordinate_system}_micron’ and is available to all spatial elements within the resulting SpatialData object.
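As an illustration of this naming rule, the sketch below derives the micron coordinate system name for each entry of to_coordinate_system. The helper micron_coordinate_systems is hypothetical and not part of harpy; it only reproduces the ‘{to_coordinate_system}_micron’ pattern described above.

```python
def micron_coordinate_systems(to_coordinate_system):
    """Map each coordinate system name to its '<name>_micron' counterpart.

    Hypothetical helper: it mirrors the documented naming pattern only
    and does not call harpy.
    """
    if isinstance(to_coordinate_system, str):
        names = [to_coordinate_system]
    else:
        names = list(to_coordinate_system)
    return {name: f"{name}_micron" for name in names}


# Single sample (the default) and a two-sample case with illustrative names.
print(micron_coordinate_systems("global"))
print(micron_coordinate_systems(["sample_0", "sample_1"]))
```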

This function reads the following files:

  • 'detected_transcripts.csv': Transcript file.

  • mosaic_**_z*.tif images inside the 'images' directory.
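Before calling the reader, it can help to confirm that these files are present. The following is a minimal sketch using only the standard library. The function check_merscope_inputs is hypothetical, the transcript file name is the one given as an example for the path parameter, and the images directory name 'images' is an assumption that can be overridden.

```python
from pathlib import Path


def check_merscope_inputs(region_dir, images_dir="images"):
    """Report which of the expected input files exist under region_dir.

    Assumptions: the transcript file is named 'detected_transcripts.csv'
    and the mosaic images live in a sub-directory called 'images'.
    Pass images_dir explicitly if your layout differs.
    """
    region_dir = Path(region_dir)
    transcripts = region_dir / "detected_transcripts.csv"
    # pathlib only accepts '**' as a whole path component, so a single '*'
    # is used to match file names of the form mosaic_<stain>_z<index>.tif.
    mosaics = sorted((region_dir / images_dir).glob("mosaic_*_z*.tif"))
    return {
        "transcripts": transcripts.is_file(),
        "mosaic_images": [p.name for p in mosaics],
    }
```

An empty "mosaic_images" list or a False "transcripts" entry suggests that the path does not point at the region/root directory.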

Parameters:
  • path (str | Path | list[str] | list[Path]) – Path to the region/root directory containing the MERSCOPE files (e.g., detected_transcripts.csv). This can be either a single path or a list of paths, where each path corresponds to a different experiment/ROI.

  • to_coordinate_system (str | list[str] (default: 'global')) – The coordinate system to which the elements will be added for each item in path. If provided as a list, its length should be equal to the number of paths specified in path. A micron coordinate system will be added at ‘{to_coordinate_system}_micron’.

  • z_layers (int | list[int] | None (default: 3)) – Indices of the z-layers to consider. Either one int index, or a list of int indices. If None, then no image is loaded. By default, only the middle layer is considered (that is, layer 3).

  • backend (Optional[Literal['dask_image', 'rioxarray']] (default: None)) – Either "dask_image" or "rioxarray" (the latter uses less RAM, but requires rioxarray to be installed). By default, uses "rioxarray" if and only if the rioxarray library is installed.

  • transcripts (bool (default: True)) – Whether to read transcripts.

  • cell_boundaries (bool (default: True)) – Whether to read cell boundaries (polygons).

  • rasterize_cell_boundaries (bool (default: True)) – Whether to rasterize the cell boundaries (i.e. create a labels element from polygons). We use harpy.im.rasterize() to rasterize the cell boundaries. Ignored if cell_boundaries is False, or if mosaic_images is False.

  • table (bool (default: True)) – Whether to read in the AnnData table. The table will annotate a labels element. If table is set to True, then cell_boundaries, rasterize_cell_boundaries and mosaic_images must also be set to True.

  • mosaic_images (bool (default: True)) – Whether to read the mosaic images.

  • do_3D (bool (default: False)) – Read the mosaic images and the transcripts as 3D.

  • z_projection (bool (default: False)) – Perform a z-projection (maximum intensity along the z-axis) on the z-stacks of mosaic images. Ignored if mosaic_images is False.

  • imread_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Keyword arguments to pass to the image reader. Ignored if mosaic_images is False.

  • image_models_kwargs (Mapping[str, Any] (default: mappingproxy({}))) – Keyword arguments to pass to the image models. Ignored if mosaic_images is False.

  • filter_gene_names (str | list[str] | None (default: None)) – Gene names to filter out (via str.contains) of the resulting points element (transcripts). These are typically control genes that were added and that you do not want to use. Filtering is case-insensitive. Also see harpy.io.read_transcripts(). Ignored if transcripts is False.

  • instance_key (str (default: 'cell_ID')) – Instance key. The name of the column in AnnData table .obs that will hold the instance ids. Ignored if table is False.

  • region_key (str (default: 'fov_labels')) – Region key. The name of the column in AnnData table .obs that will hold the name of the elements that are annotated by the table. Ignored if table is False.

  • spatial_key (str (default: 'spatial')) – The key in the AnnData table .obsm that will hold the x and y center of the instances. Ignored if table is False.

  • cell_index_name (str (default: 'cells')) – The name of the index of the resulting AnnData table. Ignored if table is False.

  • output (str | Path | None (default: None)) – The path where the resulting SpatialData object will be backed. If None, it will not be backed to a zarr store.
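A typical call might look as follows. This is a minimal sketch, not a reference usage: the directory names, the sample names and the "blank" filter pattern are placeholders, and build_merscope_kwargs and load_merscope are hypothetical helpers. Only the keyword names are taken from the signature above. harpy is imported inside load_merscope so that the argument-building part can be inspected without harpy installed.

```python
from pathlib import Path


def build_merscope_kwargs(region_dirs, output=None):
    """Assemble keyword arguments for harpy.io.merscope for one or more regions."""
    region_dirs = [Path(p) for p in region_dirs]
    return {
        "path": region_dirs,
        # One coordinate system per path; the names must be unique.
        "to_coordinate_system": [f"sample_{i}" for i in range(len(region_dirs))],
        "z_layers": [2, 3, 4],
        # Collapse the selected z-layers with a maximum-intensity projection.
        "z_projection": True,
        "do_3D": False,  # do_3D and z_projection cannot both be True
        "filter_gene_names": ["blank"],  # placeholder pattern for control genes
        "output": output,  # None: the result is not backed to a zarr store
    }


def load_merscope(region_dirs, output=None):
    """Read the regions into a single SpatialData object (requires harpy)."""
    import harpy

    return harpy.io.merscope(**build_merscope_kwargs(region_dirs, output))


kwargs = build_merscope_kwargs(["/data/region_0", "/data/region_1"])
print(kwargs["to_coordinate_system"])
```

All parameters left out of the dictionary keep their defaults, so transcripts, cell boundaries, rasterized labels, the table and the mosaic images are all read.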

Raises:
  • AssertionError – If path and to_coordinate_system do not contain the same number of elements.

  • AssertionError – If elements in to_coordinate_system are not unique.

  • ValueError – If both do_3D and z_projection are set to True.

  • ValueError – If table is True, and rasterize_cell_boundaries or mosaic_images is not True.

  • ValueError – If table is True and cell_boundaries is False.
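The conditions above can be checked up front, before any file is read. The sketch below restates them as a standalone function. check_merscope_args is a hypothetical helper that mirrors the documented rules; it is not harpy's own validation code, and the error messages are illustrative.

```python
def check_merscope_args(
    path,
    to_coordinate_system="global",
    table=True,
    cell_boundaries=True,
    rasterize_cell_boundaries=True,
    mosaic_images=True,
    do_3D=False,
    z_projection=False,
):
    """Raise the documented error for an invalid argument combination."""
    paths = list(path) if isinstance(path, (list, tuple)) else [path]
    if isinstance(to_coordinate_system, str):
        systems = [to_coordinate_system]
    else:
        systems = list(to_coordinate_system)

    if len(paths) != len(systems):
        raise AssertionError("path and to_coordinate_system differ in length")
    if len(set(systems)) != len(systems):
        raise AssertionError("to_coordinate_system entries must be unique")
    if do_3D and z_projection:
        raise ValueError("do_3D and z_projection cannot both be True")
    if table and not (rasterize_cell_boundaries and mosaic_images):
        raise ValueError("table=True needs rasterized boundaries and mosaic images")
    if table and not cell_boundaries:
        raise ValueError("table=True needs cell_boundaries=True")


# The defaults pass; reading only transcripts requires switching the table off.
check_merscope_args("region_0")
check_merscope_args("region_0", table=False, cell_boundaries=False, mosaic_images=False)
```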

Return type:

SpatialData

Returns:

A SpatialData object.

See also

harpy.io.read_transcripts – Read transcripts.

harpy.im.rasterize – Rasterize cell boundaries.