harpy.qc.analyse_genes_left_out

harpy.qc.analyse_genes_left_out(sdata, labels_name, table_name, points_name='transcripts', to_coordinate_system='global', name_x='x', name_y='y', name_gene_column='gene', output=None)

Analyse and visualize, for each gene, the proportion of transcripts that could not be assigned to an instance during the allocation step.

Parameters:
  • sdata (SpatialData) – Data containing spatial information for plotting.

  • labels_name (str) – The labels element in sdata that contains the segmentation masks. It is used to calculate the crd (region of interest) that was used in the segmentation step; without restricting to this region, the transcript counts in points_name of sdata (which contains all transcripts) and the counts obtained via sdata.tables[table_name] would not be comparable. It is also used to select the cells in sdata.tables[table_name] that are linked to this labels_name via the region key.

  • table_name (str) – The table element in sdata on which to perform analysis.

  • points_name (str (default: 'transcripts')) – The points element in sdata containing transcript information.

  • to_coordinate_system (str (default: 'global')) – The coordinate system that holds labels_name and points_name. This should be the intrinsic coordinate system in pixels.

  • name_x (str (default: 'x')) – The column name representing the x-coordinate in points_name.

  • name_y (str (default: 'y')) – The column name representing the y-coordinate in points_name.

  • name_gene_column (str (default: 'gene')) – The column name representing the gene name in points_name.

  • output (str | Path | None (default: None)) – The path to save the generated plots. If None, plots will be shown directly using plt.show().

Return type:

DataFrame

Returns:

pandas.DataFrame containing information about the proportion of transcripts kept for each gene, raw counts (i.e. obtained from points_name of sdata), and the log of raw counts.

Raises:

AttributeError – If the provided sdata does not contain the necessary attributes (i.e., ‘labels’ or ‘points’).

Notes

This function produces two plots:
  • A scatter plot of the log of raw gene counts vs. the proportion of transcripts kept.

  • A regression plot for the same data with Pearson correlation coefficients.

The function also prints the ten genes with the highest proportion of transcripts filtered out.
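The quantities described above can be illustrated with a small, self-contained sketch. It is not harpy's implementation: the toy transcript lists, the dictionary keys (proportion_kept, raw_counts, log_raw_counts), the pearson helper and the use of the natural logarithm are all assumptions made for illustration. It only shows how a per-gene proportion kept, a log raw count, a Pearson correlation between the two, and a "most filtered out" ranking relate to each other.

```python
import math
from collections import Counter

# Toy data (made up): gene label of every transcript inside the region of
# interest, and of the subset that ended up assigned to a cell.
all_transcripts = ["A"] * 100 + ["B"] * 40 + ["C"] * 10
assigned_transcripts = ["A"] * 90 + ["B"] * 20 + ["C"] * 2

raw_counts = Counter(all_transcripts)
kept_counts = Counter(assigned_transcripts)

# One record per gene. The key names are assumptions, chosen to mirror the
# three quantities listed under "Returns".
summary = {
    gene: {
        "proportion_kept": kept_counts[gene] / raw,
        "raw_counts": raw,
        "log_raw_counts": math.log(raw),  # natural log is an assumption
    }
    for gene, raw in raw_counts.items()
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


genes = sorted(summary)
r = pearson(
    [summary[g]["log_raw_counts"] for g in genes],
    [summary[g]["proportion_kept"] for g in genes],
)

# Highest proportion filtered out is the same as lowest proportion kept.
most_filtered = sorted(genes, key=lambda g: summary[g]["proportion_kept"])[:10]

print(most_filtered)
print(f"Pearson r (log raw counts vs. proportion kept): {r:.3f}")
```

In this toy data the rarest gene loses the largest share of its transcripts, so the correlation is positive; on real data the sign and strength depend on the segmentation and allocation results.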

Examples

import harpy as hp

sdata = hp.datasets.xenium_human_ovarian_cancer(subset=True)
hp.qc.analyse_genes_left_out(
    sdata,
    labels_name="cell_labels_global",
    points_name="transcripts_global",
    table_name="table_global",
)