harpy.io.read_transcripts
- harpy.io.read_transcripts(sdata, path_count_matrix, transform_matrix=None, pixel_size=None, output_points_name='transcripts', overwrite=False, column_x=0, column_y=1, column_z=None, column_gene=3, column_midcount=None, delimiter=',', header=None, comment=None, crd=None, to_coordinate_system='global', to_micron_coordinate_system=None, filter_gene_names=None, blocksize='64MB')
Reads transcript information from a file in which each row lists the x and y coordinates of a transcript, along with its gene name.

If a transform matrix is provided, an affine transformation from micron to pixels is applied to the coordinates of the transcripts. The transformation is applied to the dask dataframe before it is added to sdata. The SpatialData object is augmented with a points element named output_points_name that contains the transcripts.

- Parameters:
  - sdata (SpatialData) – The SpatialData object to which the transcripts will be added.
  - path_count_matrix (str | Path) – Path to a .parquet file or .csv file containing the transcript information. Each row should contain an x (column_x) and y (column_y) coordinate and a gene name (column_gene). Optionally, a count column (see column_midcount) can be provided.
  - transform_matrix (str | Path | ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound=generic)]] | None (default: None)) – This numpy array should contain a 3x3 transformation matrix for the affine transformation from micron to pixels. The matrix defines the linear transformation to be applied to the x and y coordinates of the transcripts before they are added as a points element to sdata in the coordinate system to_coordinate_system. That is, to_coordinate_system will be the intrinsic pixel coordinate system.

    Example transform matrix:

        Sx  0   Tx
        0   Sy  Ty
        0   0   1

    If no transform matrix is specified, transform_matrix will default to the identity matrix, and the transcript coordinates are assumed to be in pixels already. If transform_matrix is specified as a path to a file, it will be read via numpy.genfromtxt. If to_micron_coordinate_system is specified and transform_matrix is not the identity matrix, a micron coordinate system will be added at to_micron_coordinate_system, with the inverse of transform_matrix defined on it (inverse == transformation from pixels to micron).
  - pixel_size (int | None (default: None)) – Size of the pixels in micron. This can only be specified if transform_matrix is equal to the identity matrix or None (i.e. transcripts are already in pixels). If to_micron_coordinate_system is specified, a micron coordinate system will be added; otherwise this parameter will be ignored.
  - output_points_name (str (default: 'transcripts')) – Name of the points element of the SpatialData object to which the transcripts will be added.
  - overwrite (bool (default: False)) – If True, overwrites output_points_name (a points element) if it already exists.
  - column_x (int (default: 0)) – Column index of the X coordinate in the count matrix.
  - column_y (int (default: 1)) – Column index of the Y coordinate in the count matrix.
  - column_z (int | None (default: None)) – Column index of the Z coordinate in the count matrix.
  - column_gene (int (default: 3)) – Column index of the gene information in the count matrix.
  - column_midcount (int | None (default: None)) – Column index of the count of how many times the gene is detected at that particular location. Ignored when set to None.
  - delimiter (str (default: ',')) – Delimiter used to separate values in the .csv file. Ignored if path_count_matrix is a .parquet file.
  - header (int | None (default: None)) – Row number to use as the header in the .csv file. If None, no header is used. Ignored if path_count_matrix is a .parquet file.
  - comment (str | None (default: None)) – Character indicating that the remainder of the line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character. Ignored if path_count_matrix is a .parquet file.
  - crd (tuple[int, int, int, int] | None (default: None)) – The coordinates (in pixels) of the region of interest, in the format (xmin, xmax, ymin, ymax). If None, all transcripts are considered.
  - to_coordinate_system (str (default: 'global')) – Intrinsic pixel coordinate system to which output_points_name will be added. This is the pixel coordinate system.
  - to_micron_coordinate_system (str | None (default: None)) – Micron coordinate system to which output_points_name will be added if transform_matrix is not the identity matrix or pixel_size is not None. This is the micron coordinate system.
  - filter_gene_names (str | list[str] | None (default: None)) – Gene names that need to be filtered out (via str.contains). These are mostly control genes that were added and that you do not want to use. Filtering is case insensitive.
  - blocksize (str (default: '64MB')) – Block size of the partitions of the dask dataframe stored as output_points_name in sdata.
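
The 3x3 layout described for transform_matrix can be illustrated with plain NumPy. The following is a minimal sketch of how such a matrix maps micron coordinates to pixels and how its inverse maps them back; it is not harpy's internal implementation. The scale and offset values, and all variable names, are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical values: 2 pixels per micron in x and y,
# plus an offset of (100, 50) pixels.
sx, sy = 2.0, 2.0
tx, ty = 100.0, 50.0

# Same layout as the example matrix in the parameter description.
transform_matrix = np.array(
    [
        [sx, 0.0, tx],
        [0.0, sy, ty],
        [0.0, 0.0, 1.0],
    ]
)

# Transcript positions in micron, one (x, y) pair per row.
xy_micron = np.array([[0.0, 0.0], [10.0, 20.0], [12.5, 4.0]])

# Homogeneous coordinates: append a column of ones so that the
# translation can be expressed as part of the matrix product.
ones = np.ones((xy_micron.shape[0], 1))
xy_pixel = (transform_matrix @ np.hstack([xy_micron, ones]).T).T[:, :2]

# The inverse matrix is the pixel-to-micron transformation.
pixel_to_micron = np.linalg.inv(transform_matrix)
xy_back = (pixel_to_micron @ np.hstack([xy_pixel, ones]).T).T[:, :2]
```

With these assumed values, a transcript at (10, 20) micron lands at (120, 90) pixels, and applying the inverse matrix recovers the original micron coordinates.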
- Raises:
  ValueError – If pixel_size is not None and transform_matrix is not equal to the identity matrix.
- Return type:
  SpatialData
- Returns:
  The updated SpatialData object containing the transcripts.
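
A possible call, based only on the signature documented above, might look as follows. This is a hedged sketch: the helper names write_example_csv and add_transcripts, the file contents, the gene names, and the coordinate system name 'micron' are all hypothetical. It also assumes that an existing SpatialData object is available and that read_transcripts can be imported from harpy.io.

```python
import csv
from pathlib import Path

import numpy as np

# Hypothetical micron-to-pixel matrix: 2 pixels per micron, offset (100, 50).
micron_to_pixel = np.array(
    [[2.0, 0.0, 100.0], [0.0, 2.0, 50.0], [0.0, 0.0, 1.0]]
)


def write_example_csv(path):
    """Write a tiny headerless transcript table (hypothetical data)."""
    path = Path(path)
    # Column indices: 0 = x (micron), 1 = y (micron), 2 = z, 3 = gene name.
    rows = [
        (10.0, 20.0, 0, "GeneA"),
        (12.5, 4.0, 0, "GeneB"),
        (30.0, 8.0, 1, "Blank-1"),
    ]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def add_transcripts(sdata, path_count_matrix):
    """Add the transcripts to an existing SpatialData object."""
    # Imported here so the rest of this sketch runs without harpy installed.
    from harpy.io import read_transcripts

    return read_transcripts(
        sdata,
        path_count_matrix=path_count_matrix,
        transform_matrix=micron_to_pixel,  # passed as an ndarray
        column_x=0,
        column_y=1,
        column_gene=3,
        delimiter=",",
        header=None,
        to_coordinate_system="global",
        to_micron_coordinate_system="micron",  # hypothetical name
        filter_gene_names=["blank"],  # should drop 'Blank-1' (case insensitive)
        overwrite=True,
    )
```

Because transform_matrix is not the identity matrix here, pixel_size is left at its default of None; per the Raises section, combining the two would raise a ValueError.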