harpy.qc.analyse_genes_left_out
- harpy.qc.analyse_genes_left_out(sdata, labels_name, table_name, points_name='transcripts', to_coordinate_system='global', name_x='x', name_y='y', name_gene_column='gene', output=None)
Analyse and visualize, per gene, the proportion of transcripts that could not be assigned to an instance during the allocation step.
- Parameters:
sdata (SpatialData) – Data containing spatial information for plotting.
labels_name (str) – The labels element in sdata that contains the segmentation masks. This labels element is used to calculate the crd (region of interest) that was used in the segmentation step; otherwise, the transcript counts in points_name of sdata (containing all transcripts) and the counts obtained via sdata.tables[table_name] would not be comparable. It is also used to select the cells in sdata.tables[table_name] that are linked to this labels_name via the region key.
table_name (str) – The table element in sdata on which to perform the analysis.
points_name (str (default: 'transcripts')) – The points element in sdata containing transcript information.
to_coordinate_system (str (default: 'global')) – The coordinate system that holds labels_name and points_name. This should be the intrinsic coordinate system in pixels.
name_x (str (default: 'x')) – The column name representing the x-coordinate in points_name.
name_y (str (default: 'y')) – The column name representing the y-coordinate in points_name.
name_gene_column (str (default: 'gene')) – The column name representing the gene name in points_name.
output (str | Path | None (default: None)) – The path to save the generated plots. If None, plots will be shown directly using plt.show().
- Return type:
DataFrame
- Returns:
pandas.DataFrame containing information about the proportion of transcripts kept for each gene, raw counts (i.e. obtained from points_name of sdata), and the log of raw counts.
- Raises:
AttributeError – If the provided sdata does not contain the necessary attributes (i.e., ‘labels’ or ‘points’).
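The quantities described under Returns can be illustrated with plain pandas. The sketch below is not harpy's implementation: the toy data, the crd layout (x_min, x_max, y_min, y_max), and the column names raw_counts, assigned_counts, proportion_kept and log_raw_counts are assumptions made purely for illustration. It shows why the raw transcripts are first restricted to the segmented region before being compared with the per-gene counts from the table.

```python
import numpy as np
import pandas as pd

# Toy transcripts standing in for the points element (hypothetical data).
points = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 8.0, 9.0, 2.5, 4.0, 12.0],
        "y": [1.0, 2.0, 3.0, 8.0, 9.0, 2.5, 4.0, 12.0],
        "gene": ["A", "A", "A", "A", "B", "B", "C", "C"],
    }
)

# Region covered by the segmentation mask; the tuple layout
# (x_min, x_max, y_min, y_max) is an assumption of this sketch.
crd = (0.0, 10.0, 0.0, 10.0)

# Per-gene counts after allocation, standing in for the counts
# summed over cells in the table (hypothetical values).
assigned_counts = pd.Series({"A": 3, "B": 1, "C": 1}, name="assigned_counts")

# Keep only transcripts inside the segmented region so that raw and
# assigned counts refer to the same area.
in_roi = points["x"].between(crd[0], crd[1]) & points["y"].between(crd[2], crd[3])
raw_counts = points.loc[in_roi, "gene"].value_counts().rename("raw_counts")

# Combine both counts and derive the proportion kept and the log of raw counts.
summary = pd.concat([raw_counts, assigned_counts], axis=1).fillna(0)
summary["proportion_kept"] = summary["assigned_counts"] / summary["raw_counts"]
summary["log_raw_counts"] = np.log(summary["raw_counts"])

# Genes that lost the largest share of transcripts come first.
summary = summary.sort_values("proportion_kept")
print(summary)
```

Without the region filter, the transcript of gene C that lies outside the mask would be counted as raw but could never be assigned, which would understate the proportion kept for that gene.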
Notes
- This function produces two plots:
A scatter plot of the log of raw gene counts vs. the proportion of transcripts kept.
A regression plot for the same data with Pearson correlation coefficients.
The function also prints the ten genes with the highest proportion of transcripts filtered out.
Examples
import harpy as hp

sdata = hp.datasets.xenium_human_ovarian_cancer(subset=True)

hp.qc.analyse_genes_left_out(
    sdata,
    labels_name="cell_labels_global",
    points_name="transcripts_global",
    table_name="table_global",
)