harpy.tb.bin_counts

harpy.tb.bin_counts(sdata, table_name, labels_name, output_table_name, to_coordinate_system='global', chunks=10000, append=True, region_key='fov_labels', instance_key='cell_ID', spatial_key='spatial', cell_index_name='cells', overwrite=False)

Bins gene counts from barcodes to cells or regions defined in labels_name and returns an updated SpatialData object with a table element (sdata.tables[output_table_name]) holding an AnnData object with the binned counts per cell or region.

Parameters:
  • sdata (SpatialData) – The SpatialData object.

  • table_name (str) – The table element holding the counts, e.g. as obtained using harpy.io.visium_hd(). We assume that sdata[table_name].obsm[spatial_key] contains a numpy array holding the barcode coordinates (‘x’, ‘y’), and that the transformation from these coordinates to to_coordinate_system is the identity.

  • labels_name (str) – The labels element (e.g., segmentation mask, or a grid generated by harpy.im.add_grid_labels()) in sdata used to bin barcodes (as specified via table_name) into cells or regions.

  • output_table_name (str) – The table element in sdata in which to save the AnnData object with the binned counts per cell or region defined by labels_name.

  • to_coordinate_system (str (default: 'global')) – The coordinate system that holds labels_name.

  • chunks (str | tuple[int, ...] | int | None (default: 10000)) – Chunk sizes for processing. Can be a string, integer, or tuple of integers. Consider setting the chunks to a relatively high value to speed up processing, taking into account the available memory of your system.

  • append (bool (default: True)) – If True, and labels_name does not yet appear as a value in the region_key column of sdata.tables[output_table_name].obs, the binned counts obtained during the current function call are appended (along axis=0) to output_table_name. If False, and overwrite is set to True, any existing data in sdata.tables[output_table_name] is overwritten by the newly binned counts.

  • instance_key (str (default: 'cell_ID')) – Instance key. The name of the column in AnnData table .obs that will hold the instance ids.

  • region_key (str (default: 'fov_labels')) – Region key. The name of the column in AnnData table .obs that will hold the name of the element that is annotated by the resulting table.

  • spatial_key (str (default: 'spatial')) – The key in the AnnData table .obsm that will hold the x and y center of each instance. The center is calculated as the average (x, y) coordinate of the spots assigned to that bin/cell.

  • cell_index_name (str (default: 'cells')) – The name of the index of the resulting AnnData table.

  • overwrite (bool (default: False)) – If True, overwrites the output_table_name if it already exists in sdata.

Return type:

SpatialData

Returns:

An updated SpatialData object with an AnnData table added to sdata.tables at slot output_table_name.

Example

import harpy as hp

sdata_bin = hp.datasets.visium_hd_example_custom_binning()

table_name_bins = "square_002um"
labels_name = (
    "square_labels_32"  # custom grid to bin the counts of table_name_bins; can be any segmentation mask
)
table_name = "table_custom_bin_32"
output_table_name = f"{table_name}_reproduce"

# Check that barcodes are unique in table_name_bins of sdata_bin
assert sdata_bin.tables[table_name_bins].obs.index.is_unique

sdata_bin = hp.tb.bin_counts(
    sdata_bin,
    table_name=table_name_bins,
    labels_name=labels_name,
    output_table_name=output_table_name,
    overwrite=True,
    append=False,
)
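
Conceptual sketch

To make the binning described above concrete, the following is a minimal NumPy sketch of the idea, not harpy's implementation. It assumes dense in-memory arrays, barcode coordinates given in the pixel units of the label mask, and label 0 meaning background; the function name bin_counts_sketch and all variable names are hypothetical. Each barcode is assigned the label found under its (x, y) coordinate, counts are summed per label, and the center of each label is the mean coordinate of its assigned barcodes (as described for spatial_key).

```python
import numpy as np


def bin_counts_sketch(counts, coords, labels):
    """Sum barcode counts per label and average barcode coordinates per label.

    counts: (n_barcodes, n_genes) array of counts.
    coords: (n_barcodes, 2) array of (x, y) coordinates, in mask pixel units (assumption).
    labels: (height, width) integer mask, 0 is treated as background (assumption).
    """
    # Look up the label under each barcode; the mask is indexed as [y, x].
    x = np.floor(coords[:, 0]).astype(int)
    y = np.floor(coords[:, 1]).astype(int)
    inside = (x >= 0) & (x < labels.shape[1]) & (y >= 0) & (y < labels.shape[0])
    barcode_labels = np.zeros(len(coords), dtype=labels.dtype)
    barcode_labels[inside] = labels[y[inside], x[inside]]

    # Keep only barcodes that fall on a labeled cell or region.
    keep = barcode_labels > 0
    ids, inverse = np.unique(barcode_labels[keep], return_inverse=True)

    # Sum the counts of all barcodes assigned to the same label.
    binned = np.zeros((len(ids), counts.shape[1]), dtype=counts.dtype)
    np.add.at(binned, inverse, counts[keep])

    # Center of each label: mean (x, y) of its assigned barcodes.
    centers = np.zeros((len(ids), 2))
    np.add.at(centers, inverse, coords[keep])
    centers /= np.bincount(inverse, minlength=len(ids))[:, None]
    return ids, binned, centers


# Toy input: a 2 x 4 mask with two regions, four barcodes, two genes.
labels = np.array([[1, 1, 2, 2],
                   [1, 1, 2, 2]])
coords = np.array([[0.5, 0.5], [1.5, 1.5], [2.5, 0.5], [3.5, 1.5]])
counts = np.array([[1, 0], [2, 1], [0, 3], [4, 0]])

ids, binned, centers = bin_counts_sketch(counts, coords, labels)
print(ids)      # label ids that received at least one barcode
print(binned)   # summed counts, one row per label
print(centers)  # mean barcode coordinate, one row per label
```

In this toy input the first two barcodes fall in region 1 and the last two in region 2, so each output row aggregates two barcodes. In the function documented here, ids would correspond to the instance_key column and centers to .obsm[spatial_key] of the resulting table.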