harpy.tb.preprocess_proteomics
- harpy.tb.preprocess_proteomics(sdata, labels_name, table_name, output_table_name, calculate_cell_size=True, size_norm=True, library_norm=False, log1p=True, scale=False, max_value_scale=10, q=None, max_value_q=1, calculate_pca=False, n_comps=50, instance_size_key='shapeSize', raw_counts_key='raw_counts', overwrite=False)
Preprocess a table (AnnData) attribute of a SpatialData object for proteomics data.
Performs optional normalization (on size or via normalize_total()), log transformation (log1p()), scaling (scale()) or quantile normalization, and PCA calculation (pca()) for proteomics data contained in sdata.
- Parameters:
  - sdata (SpatialData) – The input SpatialData object.
  - labels_name (str | Iterable[str]) – The labels element(s) of sdata used to select the cells via the region key in sdata.tables[table_name].obs. Note that if output_table_name is equal to table_name and overwrite is True, cells in sdata.tables[table_name] linked to other labels_name (via the region key) will be removed from sdata.tables[table_name]. If a list of labels elements is provided, they will therefore be preprocessed together (e.g. multiple samples).
  - table_name (str) – The table element in sdata to apply preprocessing to. It is an AnnData object containing total intensities per cell in .obs (rows) and per channel in .var (columns).
  - output_table_name (str) – The output table element in sdata to which the preprocessed table will be written.
  - calculate_cell_size (bool (default: True)) – If True, calculates cell sizes from labels_name and stores them in .obs[instance_size_key]. Set this to False when cell sizes are not needed or are already present and should be preserved.
  - size_norm (bool (default: True)) – If True, normalization is based on the size of the nucleus/cell. Resulting values are multiplied by 100 after normalization.
  - library_norm (bool (default: False)) – If True, normalize_total() is used for normalization.
  - log1p (bool (default: True)) – If True, applies log1p transformation to the data (via log1p()), after optional normalization.
  - scale (bool (default: False)) – If True, scales the data to have zero mean and a variance of one. The scaling is capped at max_value_scale.
  - max_value_scale (float | None (default: 10)) – The maximum value to which data will be scaled. Ignored if scale is False.
  - q (float | None (default: None)) – Quantile used for normalization. If specified, values are normalized by this quantile calculated for each adata.var. Typical value used is 0.999. Resulting values are multiplied by 100 after normalization.
  - max_value_q (float | None (default: 1)) – The maximum value to which data will be scaled when performing quantile normalization. Ignored if q is None. Typical value is 1. Resulting values are multiplied by 100 after normalization.
  - calculate_pca (bool (default: False)) – If True, calculates principal component analysis (PCA) on the data.
  - n_comps (int (default: 50)) – Number of principal components to calculate. Ignored if calculate_pca is False.
  - instance_size_key (str (default: 'shapeSize')) – The key in the AnnData table .obs that holds the size of the instances. When size_norm is True, this column must either already exist in .obs or calculate_cell_size must be True.
  - raw_counts_key (str (default: 'raw_counts')) – Name of the AnnData layer where the non-preprocessed intensities will be stored. This parameter is ignored if no preprocessing is applied (i.e. size_norm, library_norm, log1p, scale are all False and q is None).
  - overwrite (bool (default: False)) – If True, overwrites the output_table_name if it already exists in sdata.
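The arithmetic behind size_norm, log1p and the quantile option can be illustrated on a small matrix. The following is a minimal NumPy sketch based only on the parameter descriptions above, not harpy's actual implementation. The toy values and the helper names size_normalize and quantile_normalize are made up for illustration, and the order of clipping and multiplying by 100 in the quantile branch is an assumption.

```python
import numpy as np

# Toy intensities: 3 cells (rows) x 2 channels (columns). Values are made up.
X = np.array([[200.0, 50.0],
              [400.0, 100.0],
              [100.0, 400.0]])
cell_size = np.array([100.0, 200.0, 50.0])  # stand-in for .obs[instance_size_key]


def size_normalize(X, size):
    """Divide each cell's intensities by its size, then multiply by 100."""
    return X / size[:, None] * 100


def quantile_normalize(X, q=0.999, max_value_q=1.0):
    """Divide each channel by its q-th quantile, cap at max_value_q, multiply by 100.

    Assumption: the cap is applied before the multiplication by 100.
    """
    quantiles = np.quantile(X, q, axis=0)  # one quantile per channel
    return np.clip(X / quantiles, None, max_value_q) * 100


X_size = size_normalize(X, cell_size)
X_log = np.log1p(X_size)            # log(1 + x), applied after normalization
X_q = quantile_normalize(X, q=0.5)  # q=0.5 only to make the cap visible on toy data
```

In this sketch, cells of different sizes with the same intensity per unit area end up with identical values in X_size, which is the point of size normalization.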
- Return type:
  SpatialData
- Returns:
  The sdata containing the preprocessed AnnData object as an attribute (sdata.tables[output_table_name]).
- Raises:
  ValueError –
  - If sdata does not contain any labels elements.
  - If sdata does not contain any table elements.
  - If labels_name, or one of the elements of labels_name, is not a labels element in sdata.
  - If table_name is not a table element in sdata.
  - If both scale is set to True and q is not None.
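The last condition, that scale and q are mutually exclusive, can be checked before calling the function. The helper below is a hypothetical pre-flight check and is not part of harpy. It mirrors only that one documented rule, and its error message is invented.

```python
from typing import Optional


def check_scale_and_q(scale: bool, q: Optional[float]) -> None:
    """Raise ValueError if both z-score scaling and quantile normalization are requested."""
    if scale and q is not None:
        raise ValueError("Choose either scale=True or a quantile q, not both.")


check_scale_and_q(scale=False, q=0.999)  # passes: quantile normalization only
check_scale_and_q(scale=True, q=None)    # passes: scaling only

try:
    check_scale_and_q(scale=True, q=0.999)  # conflicting options
except ValueError as err:
    conflict_message = str(err)
```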
Warning
- If scale is True and max_value_scale is set too low, it may overly constrain the variability of the data, potentially impacting downstream analyses.
- If the dimensionality of sdata.tables[table_name] is smaller than the desired number of principal components when calculate_pca is True, n_comps is set to the minimum dimensionality, and a message is printed.
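Both warnings can be made concrete with a small NumPy sketch. It is an illustration under stated assumptions rather than harpy's code: scaling is approximated as a per-channel z-score followed by capping values at max_value_scale, and the n_comps adjustment is approximated as the smallest dimension of the table. The exact rules harpy applies may differ, and the helper name zscore_and_clip is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 5))  # toy data: 200 cells x 5 channels


def zscore_and_clip(X, max_value):
    """Scale each channel to zero mean and unit variance, then cap values at max_value."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.minimum(Z, max_value)


Z_loose = zscore_and_clip(X, max_value=10)   # the documented default cap
Z_tight = zscore_and_clip(X, max_value=0.5)  # deliberately too low

# A low cap flattens the upper tail, so the per-channel variance shrinks.
var_loose = Z_loose.var(axis=0)
var_tight = Z_tight.var(axis=0)

# Capping n_comps at the smallest table dimension (assumed rule, see above).
n_comps_requested = 50
n_comps_used = min(n_comps_requested, min(X.shape))
```

With only 5 channels, a request for 50 components cannot be honored, which is the situation the second warning describes.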
See also
harpy.tb.allocate_intensity: create an AnnData table in sdata using an image_name and a labels_name.
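A call that uses the signature documented above might look like the sketch below. The element names ("cell_labels", "table_intensities", "table_preprocessed") and the wrapper function preprocess_with_size_norm are hypothetical placeholders. The sketch assumes sdata already holds a labels element and an intensity table, for example one created with harpy.tb.allocate_intensity.

```python
def preprocess_with_size_norm(sdata):
    """Size-normalize, log-transform and run PCA on a proteomics table.

    All element names below are hypothetical placeholders.
    """
    import harpy  # imported here so the sketch can be defined without harpy installed

    sdata = harpy.tb.preprocess_proteomics(
        sdata,
        labels_name="cell_labels",               # hypothetical labels element
        table_name="table_intensities",          # hypothetical input table
        output_table_name="table_preprocessed",  # hypothetical output table
        size_norm=True,      # normalize by cell size (values multiplied by 100)
        log1p=True,          # log1p after normalization
        scale=False,         # scale=True together with a quantile q raises ValueError
        q=None,              # no quantile normalization in this run
        calculate_pca=True,
        n_comps=50,          # reduced automatically for tables with fewer dimensions
        overwrite=True,      # replace table_preprocessed if it already exists
    )
    return sdata
```

According to the parameter descriptions, the preprocessed table is then available as sdata.tables["table_preprocessed"], with the unprocessed intensities kept in the layer named by raw_counts_key ('raw_counts' by default).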