harpy.tb.kmeans
- harpy.tb.kmeans(sdata, labels_name, table_name, output_table_name, calculate_umap=True, rank_genes=True, n_neighbors=35, n_pcs=17, n_clusters=5, key_added='kmeans', index_names_var=None, index_positions_var=None, random_state=100, overwrite=False, **kwargs)
Applies KMeans clustering on the table_name of the SpatialData object, with optional UMAP calculation and gene ranking.
This function executes the KMeans clustering algorithm (via KMeans) on spatial data encapsulated by a SpatialData object. It optionally computes a UMAP (Uniform Manifold Approximation and Projection) for dimensionality reduction and ranks genes based on their contributions to the clustering. The clustering results, along with the optional UMAP and gene ranking, are added to sdata.tables[output_table_name] for downstream analysis.
- Parameters:
sdata (SpatialData) – The input SpatialData object.
labels_name (str | list[str] | None) – The labels element(s) of sdata used to select the cells via the region key in sdata.tables[table_name].obs. Note that if output_table_name is equal to table_name and overwrite is True, cells in sdata.tables[table_name] linked to other labels elements (via the region key) will be removed from sdata.tables[table_name]. If a list of labels elements is provided, they will be clustered together (e.g. multiple samples).
table_name (str) – The table element in sdata on which to perform clustering.
output_table_name (str) – The table element in sdata to which the clustering results will be written.
calculate_umap (bool (default: True)) – If True, calculates a UMAP via umap() for visualization of the computed clusters.
rank_genes (bool (default: True)) – If True, ranks genes based on their contributions to the clusters via rank_genes_groups(), with default parameters. Note that rank_genes_groups() will be run on the .raw attribute of the AnnData table if .raw is not None.
n_neighbors (int (default: 35)) – The number of neighbors to consider when calculating neighbors via neighbors(). Ignored if calculate_umap is False.
n_pcs (int (default: 17)) – The number of principal components to use when calculating neighbors via neighbors(). Ignored if calculate_umap is False.
n_clusters (int (default: 5)) – The number of clusters to form.
key_added (default: 'kmeans') – The key under which the clustering results are added to the SpatialData object (in sdata.tables[table_name].obs).
index_names_var (Iterable[str] | None (default: None)) – List of index names to subset in sdata.tables[table_name].var. If None, index_positions_var will be used.
index_positions_var (Iterable[int] | None (default: None)) – List of integer positions to subset in sdata.tables[table_name].var. Used if index_names_var is None.
random_state (int (default: 100)) – A random state for reproducibility of the clustering.
overwrite (bool (default: False)) – If True, overwrites output_table_name if it already exists in sdata.
**kwargs – Additional keyword arguments passed to the KMeans algorithm (KMeans).
- Returns:
The input sdata with the clustering results added.
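The precedence between index_names_var and index_positions_var described in the parameter list can be sketched in plain Python. This is only an illustration of the documented rule, not harpy's code: select_var_names is a hypothetical helper, the marker names are made up, and both the ordering of the result and the "use every variable when neither argument is given" fallback are assumptions.

```python
def select_var_names(var_names, index_names_var=None, index_positions_var=None):
    """Return the variable names to cluster on (hypothetical helper, not part of harpy)."""
    var_names = list(var_names)
    if index_names_var is not None:
        # Names take precedence over positions, as stated in the parameter list.
        wanted = set(index_names_var)
        # Assumption: the table's own ordering of variables is preserved.
        return [name for name in var_names if name in wanted]
    if index_positions_var is not None:
        # Positions are only consulted when no names were given.
        return [var_names[i] for i in index_positions_var]
    # Assumption: with neither argument set, all variables are used.
    return var_names


markers = ["CD3", "CD4", "CD8", "CD20"]  # made-up variable names
by_name = select_var_names(markers, index_names_var=["CD8", "CD3"], index_positions_var=[3])
by_position = select_var_names(markers, index_positions_var=[0, 3])
```

In this sketch, passing both arguments means the positions are silently ignored, which is the behaviour the parameter descriptions imply.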
Notes
The function adds a table element containing the clustering labels and, optionally, UMAP coordinates and gene rankings, to facilitate downstream analysis and visualization.
Gene ranking based on cluster contributions is intended for identifying marker genes that characterize each cluster.
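To make the roles of n_clusters and random_state more concrete, the following is a toy, pure-Python sketch of Lloyd's algorithm, the standard iterative procedure behind k-means. It is not harpy's implementation, which delegates to KMeans; toy_kmeans, its n_iter argument, and the sample points are all hypothetical and exist only for illustration.

```python
import random


def toy_kmeans(points, n_clusters, random_state=100, n_iter=50):
    """Toy Lloyd's algorithm on a list of equal-length numeric tuples (illustration only)."""
    rng = random.Random(random_state)
    # Seeded initialisation: the same random_state picks the same starting centroids.
    centroids = rng.sample(points, n_clusters)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(
                range(n_clusters),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = [p for p, lab in zip(points, new_labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
first_run = toy_kmeans(points, n_clusters=2, random_state=100)
second_run = toy_kmeans(points, n_clusters=2, random_state=100)
```

Because the only source of randomness is the seeded initialisation, repeating the call with the same random_state yields the same labels, which is the reproducibility the random_state parameter is meant to provide.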
Warning
The function is intended for use with spatial omics data. Input data should be appropriately preprocessed (e.g. via preprocess_transcriptomics() or preprocess_proteomics()) to ensure meaningful clustering results.
The rank_genes functionality is marked for relocation to enhance modularity and clarity of the codebase.
See also
harpy.tb.preprocess_transcriptomics: preprocess transcriptomics data.
harpy.tb.preprocess_proteomics: preprocess proteomics data.
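A call to this function might look like the sketch below. It assumes the package is importable as harpy and that sdata has already been preprocessed. The element names "segmentation_mask", "table_transcriptomics" and "table_kmeans" are hypothetical placeholders, and cluster_cells is an illustrative wrapper, not part of the library.

```python
# Hypothetical element names; replace them with the ones present in your SpatialData object.
KMEANS_ARGS = {
    "labels_name": "segmentation_mask",     # hypothetical labels element
    "table_name": "table_transcriptomics",  # hypothetical input table
    "output_table_name": "table_kmeans",    # hypothetical output table
    "n_clusters": 5,
    "key_added": "kmeans",
    "random_state": 100,
}


def cluster_cells(sdata):
    """Run KMeans on an already preprocessed SpatialData object (illustrative wrapper)."""
    import harpy  # assumption: the package is installed and importable under this name

    # calculate_umap and rank_genes are left at their defaults (both True).
    return harpy.tb.kmeans(sdata, **KMEANS_ARGS)
```

Writing to a separate output_table_name, as here, leaves the input table untouched, so overwrite can stay at its default of False as long as that output table does not already exist.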