harpy.tb.leiden


harpy.tb.leiden(sdata, labels_name, table_name, output_table_name, calculate_umap=True, calculate_neighbors=True, rank_genes=True, n_neighbors=35, n_pcs=17, resolution=0.8, key_added='leiden', index_names_var=None, index_positions_var=None, random_state=100, overwrite=False, **kwargs)

Applies Leiden clustering to the table element table_name of the SpatialData object, with optional UMAP calculation and gene ranking.

This function runs the Leiden clustering algorithm (via sc.tl.leiden) on a table of a SpatialData object. It optionally computes a UMAP (Uniform Manifold Approximation and Projection) embedding for visualization and ranks genes based on their contributions to the clustering. The clustering results, along with the optional UMAP and gene ranking, are added to sdata.tables[output_table_name] for downstream analysis.
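The description above can be read as a short sequence of scanpy calls. The sketch below is an illustration under assumptions, not harpy's actual implementation: the helper names scanpy_steps and run_scanpy_steps are hypothetical, the order of the UMAP and Leiden steps is assumed, and passing random_state to every step is a guess. scanpy is imported lazily so the planning helper can be used without it.

```python
def scanpy_steps(calculate_neighbors=True, calculate_umap=True, rank_genes=True):
    """Return the scanpy calls implied by the three flags, in an assumed order."""
    steps = []
    if calculate_neighbors:
        steps.append("pp.neighbors")
    if calculate_umap:
        steps.append("tl.umap")
    steps.append("tl.leiden")  # the clustering itself always runs
    if rank_genes:
        steps.append("tl.rank_genes_groups")
    return steps


def run_scanpy_steps(adata, n_neighbors=35, n_pcs=17, resolution=0.8,
                     key_added="leiden", random_state=100, **flags):
    """Apply the planned steps to an AnnData table (requires scanpy)."""
    import scanpy as sc  # imported lazily; only needed when actually running

    for step in scanpy_steps(**flags):
        if step == "pp.neighbors":
            sc.pp.neighbors(adata, n_neighbors=n_neighbors, n_pcs=n_pcs,
                            random_state=random_state)
        elif step == "tl.umap":
            sc.tl.umap(adata, random_state=random_state)
        elif step == "tl.leiden":
            sc.tl.leiden(adata, resolution=resolution, key_added=key_added,
                         random_state=random_state)
        else:
            # Rank genes per cluster, grouping cells by the new cluster labels.
            sc.tl.rank_genes_groups(adata, groupby=key_added)
    return adata
```

With all flags disabled, only the clustering step remains, which matches the case where neighbors were computed beforehand.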

Parameters:
  • sdata (SpatialData) – The input SpatialData object.

  • labels_name (str | list[str] | None) – The labels element(s) of sdata used to select the cells via the region key in sdata.tables[table_name].obs. Note that if output_table_name is equal to table_name and overwrite is True, cells in sdata.tables[table_name] linked to other labels elements (via the region key) will be removed from sdata.tables[table_name]. If a list of labels elements is provided, the cells linked to them are clustered together (e.g. multiple samples).

  • table_name (str) – The table element in sdata on which to perform clustering.

  • output_table_name (str) – The table element in sdata to which the table with the clustering results will be written.

  • calculate_umap (bool (default: True)) – If True, calculates a UMAP via umap() for visualization of computed clusters.

  • calculate_neighbors (bool (default: True)) – If True, calculates neighbors via neighbors() required for leiden clustering. Set to False if neighbors are already calculated for sdata.tables[table_name].

  • rank_genes (bool (default: True)) – If True, ranks genes based on their contributions to the clusters via rank_genes_groups() with default parameters. Note that rank_genes_groups() will be run on the .raw attribute of the AnnData table, if .raw is not None.

  • n_neighbors (int (default: 35)) – The number of neighbors to consider when calculating neighbors via neighbors(). Ignored if calculate_neighbors is False.

  • n_pcs (int (default: 17)) – The number of principal components to use when calculating neighbors via neighbors(). Ignored if calculate_neighbors is False.

  • resolution (float (default: 0.8)) – Cluster resolution passed to leiden().

  • key_added (str (default: 'leiden')) – The key under which the clustering results are added to the SpatialData object (in sdata.tables[output_table_name].obs).

  • index_names_var (Iterable[str] | None (default: None)) – List of index names to subset in sdata.tables[table_name].var. If None, index_positions_var is used instead.

  • index_positions_var (Iterable[int] | None (default: None)) – List of integer positions to subset in sdata.tables[table_name].var. Used if index_names_var is None.

  • random_state (int (default: 100)) – A random state for reproducibility of the clustering.

  • overwrite (bool (default: False)) – If True, overwrites the output_table_name if it already exists in sdata.

  • **kwargs – Additional keyword arguments passed to the leiden clustering algorithm (sc.tl.leiden).
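As a usage illustration of the parameters above, the following sketch collects the arguments in a dictionary and wraps the call in a small helper. The element names ("segmentation_mask", "table_preprocessed", "table_clustered"), the LEIDEN_ARGS name and the cluster helper are hypothetical, and it is assumed that the package is importable as harpy. The import happens inside the helper so that the argument set can be inspected without harpy installed.

```python
# Hypothetical element names: replace them with the labels and table
# elements that actually exist in your SpatialData object.
LEIDEN_ARGS = dict(
    labels_name="segmentation_mask",
    table_name="table_preprocessed",
    output_table_name="table_clustered",  # differs from table_name, so the input table is kept
    calculate_umap=True,
    calculate_neighbors=True,
    rank_genes=True,
    n_neighbors=35,
    n_pcs=17,
    resolution=0.8,
    key_added="leiden",
    random_state=100,
    overwrite=True,  # allow re-running with the same output_table_name
)


def cluster(sdata, **overrides):
    """Call harpy.tb.leiden with LEIDEN_ARGS, optionally overriding some of them."""
    import harpy  # assumed import name; imported lazily

    return harpy.tb.leiden(sdata, **{**LEIDEN_ARGS, **overrides})
```

For example, cluster(sdata, resolution=1.2) would rerun the clustering at a higher resolution while keeping the other arguments unchanged.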

Returns:

The input sdata with the clustering results added.
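One way to inspect the returned object is to count the cells per cluster. The sketch below assumes the labels are stored in the obs column named by key_added of the output table; the helper cluster_sizes is hypothetical, and the demo uses a plain stand-in object rather than a real SpatialData instance.

```python
from collections import Counter
from types import SimpleNamespace


def cluster_sizes(sdata, output_table_name, key_added="leiden"):
    """Count cells per cluster label in sdata.tables[output_table_name].obs[key_added]."""
    labels = sdata.tables[output_table_name].obs[key_added]
    # Labels are converted to str so categorical and plain values compare alike.
    return dict(Counter(str(label) for label in labels))


# Stand-in for a SpatialData object, used only to demonstrate the helper.
demo_table = SimpleNamespace(obs={"leiden": ["0", "1", "0"]})
demo_sdata = SimpleNamespace(tables={"table_clustered": demo_table})
demo_sizes = cluster_sizes(demo_sdata, "table_clustered")
```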

Notes

  • The function updates the SpatialData object in place, adding clustering labels and, optionally, UMAP coordinates and gene rankings for downstream analysis and visualization.

  • Gene ranking based on cluster contributions is intended for identifying marker genes that characterize each cluster.
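To read marker genes from the ranking, the following sketch assumes that rank_genes_groups() was run with its default key, so the results sit under .uns["rank_genes_groups"] of the table, with a "names" entry indexed by cluster label. The helper top_marker_genes and the gene names in the demo are hypothetical, and the demo uses a stand-in object instead of a real AnnData table.

```python
from types import SimpleNamespace


def top_marker_genes(adata, cluster, n=5, key="rank_genes_groups"):
    """Return the first n ranked gene names for one cluster."""
    names = adata.uns[key]["names"]
    # Cluster labels are strings in the ranking, so the label is cast to str.
    return [str(gene) for gene in names[str(cluster)][:n]]


# Stand-in for an AnnData table; gene names are made up for illustration.
demo_adata = SimpleNamespace(
    uns={"rank_genes_groups": {"names": {"0": ["GENE_A", "GENE_B", "GENE_C"],
                                         "1": ["GENE_D", "GENE_E", "GENE_F"]}}}
)
demo_markers = top_marker_genes(demo_adata, 0, n=2)
```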

Warning

  • The function is intended for use with spatial omics data. Input data should be appropriately preprocessed (e.g. via preprocess_transcriptomics() or preprocess_proteomics()) to ensure meaningful clustering results.

  • The rank_genes functionality is planned to be moved out of this function to improve the modularity and clarity of the codebase.

See also

harpy.tb.preprocess_transcriptomics – Preprocess transcriptomics data.

harpy.tb.preprocess_proteomics – Preprocess proteomics data.