harpy.tb.leiden
- harpy.tb.leiden(sdata, labels_name, table_name, output_table_name, calculate_umap=True, calculate_neighbors=True, rank_genes=True, n_neighbors=35, n_pcs=17, resolution=0.8, key_added='leiden', index_names_var=None, index_positions_var=None, random_state=100, overwrite=False, **kwargs)
Applies Leiden clustering to the table_name table of the SpatialData object, with optional UMAP calculation and gene ranking.

This function executes the Leiden clustering algorithm (via sc.tl.leiden) on spatial data encapsulated by a SpatialData object. It optionally computes a UMAP (Uniform Manifold Approximation and Projection) for dimensionality reduction and ranks genes based on their contributions to the clustering. The clustering results, along with the optional UMAP and gene ranking, are added to sdata.tables[output_table_name] for downstream analysis.
- Parameters:
  - sdata (SpatialData) – The input SpatialData object.
  - labels_name (str | list[str] | None) – The labels element(s) of sdata used to select the cells via the region key in sdata.tables[table_name].obs. Note that if output_table_name is equal to table_name and overwrite is True, cells in sdata.tables[table_name] linked to other labels_name (via the region key) will be removed from sdata.tables[table_name]. If a list of labels elements is provided, they will be clustered together (e.g. multiple samples).
  - table_name (str) – The table element in sdata on which to perform clustering.
  - output_table_name (str) – The table element in sdata to which the clustering results will be written.
  - calculate_umap (bool (default: True)) – If True, calculates a UMAP via umap() for visualization of the computed clusters.
  - calculate_neighbors (bool (default: True)) – If True, calculates neighbors via neighbors(), as required for Leiden clustering. Set to False if neighbors are already calculated for sdata.tables[table_name].
  - rank_genes (bool (default: True)) – If True, ranks genes based on their contributions to the clusters via rank_genes_groups() with default parameters. Note that rank_genes_groups() will be run on the .raw attribute of the AnnData table if .raw is not None.
  - n_neighbors (int (default: 35)) – The number of neighbors to consider when calculating neighbors via neighbors(). Ignored if calculate_neighbors is False.
  - n_pcs (int (default: 17)) – The number of principal components to use when calculating neighbors via neighbors(). Ignored if calculate_neighbors is False.
  - resolution (float (default: 0.8)) – Cluster resolution passed to leiden().
  - key_added (str (default: 'leiden')) – The key under which the clustering results are added to the SpatialData object (in sdata.tables[table_name].obs).
  - index_names_var (Iterable[str] | None (default: None)) – List of index names to subset in sdata.tables[table_name].var. If None, index_positions_var is used instead.
  - index_positions_var (Iterable[int] | None (default: None)) – List of integer positions to subset in sdata.tables[table_name].var. Used only if index_names_var is None.
  - random_state (int (default: 100)) – A random state for reproducibility of the clustering.
  - overwrite (bool (default: False)) – If True, overwrites output_table_name if it already exists in sdata.
  - **kwargs – Additional keyword arguments passed to the Leiden clustering algorithm (sc.tl.leiden).
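The precedence between index_names_var and index_positions_var described in the parameter list can be illustrated in plain Python. This is only a sketch of the documented rule, not harpy's implementation: the helper name select_var_names is hypothetical, and two behaviors are assumptions not stated above, namely that both arguments being None keeps all variables and that name-based selection keeps the original variable order.

```python
from typing import Iterable, List, Optional, Sequence


def select_var_names(
    var_names: Sequence[str],
    index_names_var: Optional[Iterable[str]] = None,
    index_positions_var: Optional[Iterable[int]] = None,
) -> List[str]:
    """Hypothetical helper mirroring the documented precedence rule."""
    if index_names_var is not None:
        # Names take precedence; positions are ignored in this case.
        wanted = set(index_names_var)
        # Assumption: keep the order in which variables appear in the table.
        return [name for name in var_names if name in wanted]
    if index_positions_var is not None:
        # Positions are only consulted when no names were given.
        return [var_names[i] for i in index_positions_var]
    # Assumption: with neither argument set, all variables are used.
    return list(var_names)


genes = ["geneA", "geneB", "geneC"]
by_name = select_var_names(genes, index_names_var=["geneC", "geneA"], index_positions_var=[1])
by_position = select_var_names(genes, index_positions_var=[1])
```

Here by_name ignores index_positions_var entirely because names were supplied, while by_position falls back to the integer positions.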
- Returns:
  The input sdata with the clustering results added.
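A call might look roughly like the sketch below. The keyword names and defaults come from the signature above. The element names ("segmentation_mask", "table_transcriptomics", "table_transcriptomics_clustered") and the helpers build_leiden_kwargs and run_leiden are hypothetical, and the sketch assumes the package is importable as harpy. The import is deferred so that the block can be read and loaded without harpy installed.

```python
def build_leiden_kwargs(labels_name, table_name, output_table_name):
    """Collect keyword arguments for harpy.tb.leiden (hypothetical helper)."""
    return {
        "labels_name": labels_name,
        "table_name": table_name,
        "output_table_name": output_table_name,
        "calculate_umap": True,
        "calculate_neighbors": True,
        "rank_genes": True,
        "n_neighbors": 35,
        "n_pcs": 17,
        "resolution": 0.8,
        "key_added": "leiden",
        "random_state": 100,
        # Needed if the output table already exists in sdata.
        "overwrite": True,
    }


def run_leiden(sdata, **kwargs):
    """Run the clustering; harpy is imported lazily (assumed import name)."""
    import harpy

    return harpy.tb.leiden(sdata, **kwargs)


# Element names below are placeholders for whatever your SpatialData object contains.
kwargs = build_leiden_kwargs(
    labels_name="segmentation_mask",
    table_name="table_transcriptomics",
    output_table_name="table_transcriptomics_clustered",
)

# With a preprocessed SpatialData object `sdata` at hand:
# sdata = run_leiden(sdata, **kwargs)
# clusters = sdata.tables["table_transcriptomics_clustered"].obs["leiden"]
```

Writing to a separate output_table_name, as here, leaves the input table untouched, which avoids the cell removal described for labels_name when the input and output tables coincide.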
Notes
The function updates the SpatialData object in-place, adding clustering labels, and optionally UMAP coordinates and gene rankings, facilitating downstream analyses and visualization.
Gene ranking based on cluster contributions is intended for identifying marker genes that characterize each cluster.
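For orientation, the steps named on this page (neighbors, optional UMAP, Leiden clustering, optional gene ranking) correspond roughly to the scanpy calls below, applied to the AnnData table. This is a sketch under assumptions rather than harpy's actual code: the function name cluster_table is hypothetical, and forwarding random_state to neighbors() and umap() is an assumption. The subsetting by labels_name and by variable index, and writing the result back to the SpatialData object, are left out.

```python
def cluster_table(
    adata,
    calculate_umap=True,
    calculate_neighbors=True,
    rank_genes=True,
    n_neighbors=35,
    n_pcs=17,
    resolution=0.8,
    key_added="leiden",
    random_state=100,
    **kwargs,
):
    """Hypothetical sketch of the clustering steps on an AnnData table."""
    import scanpy as sc  # deferred so the sketch loads without scanpy

    if calculate_neighbors:
        # Build the neighborhood graph that Leiden clustering operates on.
        sc.pp.neighbors(adata, n_neighbors=n_neighbors, n_pcs=n_pcs, random_state=random_state)
    if calculate_umap:
        # UMAP reuses the neighborhood graph and is for visualization only.
        sc.tl.umap(adata, random_state=random_state)
    # Cluster labels end up in adata.obs[key_added].
    sc.tl.leiden(adata, resolution=resolution, key_added=key_added, random_state=random_state, **kwargs)
    if rank_genes:
        # Default parameters; scanpy uses adata.raw here when it is set.
        sc.tl.rank_genes_groups(adata, groupby=key_added)
    return adata
```

The sketch also shows why calculate_neighbors=False only makes sense when a neighborhood graph is already stored in the table: both umap() and leiden() read it.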
Warning
The function is intended for use with spatial omics data. Input data should be appropriately preprocessed (e.g. via preprocess_transcriptomics() or preprocess_proteomics()) to ensure meaningful clustering results.
The rank_genes functionality is marked for relocation to enhance modularity and clarity of the codebase.
See also
harpy.tb.preprocess_transcriptomics : preprocess transcriptomics data.
harpy.tb.preprocess_proteomics : preprocess proteomics data.