harpy.tb.nhood_kmeans

harpy.tb.nhood_kmeans(sdata, table_name, output_table_name, cluster_key='annotation', labels_name=None, connectivity_key='spatial_connectivities', composition_key='nhood_composition', key_added='nhood_kmeans', n_clusters=5, random_state=100, nan_label=-1, overwrite=False, **kwargs)

Cluster cells (instances) based on neighborhood cell-type composition using KMeans.

This function expects a precomputed spatial connectivity matrix in sdata.tables[table_name].obsp[connectivity_key] and does not compute neighbors itself. The graph can, for example, be built beforehand with squidpy.gr.spatial_neighbors(). From that graph, the function computes for each cell the fraction of its neighbors that belong to each category in cluster_key, stores this matrix in adata.obsm[composition_key], and uses it as the feature matrix for KMeans. The resulting niche assignments are written to adata.obs[key_added]. Here and below, adata denotes the table element being clustered, which is written to sdata.tables[output_table_name].
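The steps above can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not harpy's implementation: the connectivity matrix is a small dense array rather than the sparse matrix usually stored in adata.obsp; columns follow the sorted order returned by numpy.unique (the real function may use the categorical order of cluster_key); isolated cells are left out of the fit and receive nan_label; and a short Lloyd's iteration stands in for the KMeans estimator the function actually uses. The helper names nhood_composition, lloyd_kmeans and nhood_kmeans_sketch are hypothetical.

```python
import numpy as np


def nhood_composition(adjacency, labels):
    """Hypothetical helper: per-cell fraction of neighbors in each category."""
    # Assumption: columns follow numpy.unique's sorted order.
    categories, codes = np.unique(labels, return_inverse=True)
    one_hot = np.zeros((labels.shape[0], categories.shape[0]))
    one_hot[np.arange(labels.shape[0]), codes] = 1.0
    counts = adjacency @ one_hot       # (weighted) neighbor count per category
    degree = adjacency.sum(axis=1)     # total neighbor weight per cell
    fractions = np.zeros_like(counts)
    connected = degree > 0
    fractions[connected] = counts[connected] / degree[connected, None]
    return fractions, categories, degree


def lloyd_kmeans(x, n_clusters, random_state, n_iter=100):
    """Hypothetical stand-in for a KMeans estimator (plain Lloyd's algorithm)."""
    rng = np.random.default_rng(random_state)
    # Initialise centers from distinct rows; requires n_clusters <= len(x).
    centers = x[rng.choice(x.shape[0], size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assignment = dists.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(n_clusters):
            members = x[assignment == k]
            if len(members):  # empty clusters keep their previous center
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assignment


def nhood_kmeans_sketch(adjacency, labels, n_clusters=5, random_state=100, nan_label=-1):
    """Hypothetical end-to-end sketch: composition features -> KMeans niches."""
    fractions, categories, degree = nhood_composition(adjacency, labels)
    # object dtype so nan_label may be an int, a str or None.
    niches = np.full(labels.shape[0], nan_label, dtype=object)
    connected = degree > 0
    # Assumption: isolated cells (degree 0) are excluded from the fit.
    niches[connected] = lloyd_kmeans(fractions[connected], n_clusters, random_state)
    return fractions, categories, niches


# Toy graph: cells 0-1-2 form a chain, cell 3 has no neighbors.
adjacency = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]],
    dtype=float,
)
labels = np.array(["B", "T", "T", "B"])
fractions, categories, niches = nhood_kmeans_sketch(
    adjacency, labels, n_clusters=2, random_state=0
)
print(categories)  # ['B' 'T']
print(fractions)
print(niches)      # cells 0 and 2 share a niche; cell 3 gets -1
```

On this toy graph, cells 0 and 2 both have only T neighbors and land in the same niche, cell 1 with a 50/50 neighborhood is placed in the other niche, and the isolated cell 3 receives nan_label. The integer niche ids themselves depend on the random initialisation.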

Parameters:
  • sdata (SpatialData) – The input SpatialData object.

  • table_name (str) – The table element in sdata on which to perform niche clustering.

  • output_table_name (str) – The output table element in sdata to which the updated table element will be written.

  • cluster_key (str (default: 'annotation')) – Key in adata.obs containing the cluster annotations used to build the neighborhood composition.

  • labels_name (str | Iterable[str] | None (default: None)) – Optional labels element or elements used to subset the table before clustering. If provided, only observations linked to these labels elements are considered.

  • connectivity_key (str (default: 'spatial_connectivities')) – Key pointing to the cell-cell connectivity matrix in adata.obsp, with shape (n_cells, n_cells). If the exact key is not found, {connectivity_key}_connectivities is tried instead, as a convenience for graphs created with squidpy.gr.spatial_neighbors(), which stores its graph under adata.obsp['{key_added}_connectivities'].

  • composition_key (str (default: 'nhood_composition')) – Key used to store the computed neighborhood composition. The dense neighborhood-fraction matrix is written to adata.obsm[composition_key] with shape (n_cells, n_categories); each row contains, for one cell, the fraction of its neighbors that belong to each category in cluster_key. Related metadata is stored under adata.uns[composition_key], including the cluster key that was used, the connectivity key resolved in adata.obsp, and the ordered cluster labels corresponding to the columns of adata.obsm[composition_key]. Sharing one key between obsm and uns keeps the feature matrix linked to its column definitions and makes the neighborhood features easy to reuse in downstream analyses. For example, if adata.obsm[composition_key] is [[0.75, 0.25, 0.00], [0.00, 0.50, 0.50]], then 75% of the first cell's neighbors belong to the first category and 25% to the second, while adata.uns[composition_key]["cluster_categories"] stores the ordered labels for those columns.

  • key_added (str (default: 'nhood_kmeans')) – Key in adata.obs where the resulting niche labels are written.

  • n_clusters (int (default: 5)) – Number of KMeans clusters to compute.

  • random_state (int (default: 100)) – Random state used for reproducible clustering.

  • nan_label (int | str | None (default: -1)) – Label assigned to isolated cells, i.e. cells with zero degree in the connectivity graph.

  • overwrite (bool (default: False)) – If True, overwrite output_table_name if it already exists in sdata.

  • **kwargs (Any) – Additional keyword arguments passed to KMeans.
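The connectivity_key fallback can be illustrated with a plain dictionary standing in for adata.obsp. This is a sketch only: the helper name resolve_connectivity_key is hypothetical, and raising KeyError when neither key exists is an assumption about how a missing graph is reported.

```python
from collections.abc import Mapping


def resolve_connectivity_key(obsp: Mapping, connectivity_key: str) -> str:
    """Hypothetical helper: pick the obsp key, trying the exact key first."""
    if connectivity_key in obsp:
        return connectivity_key
    # Convenience fallback for graphs stored as '{key_added}_connectivities'.
    fallback = f"{connectivity_key}_connectivities"
    if fallback in obsp:
        return fallback
    # Assumption: the real function's error type and message may differ.
    raise KeyError(
        f"Neither '{connectivity_key}' nor '{fallback}' found in adata.obsp."
    )


# squidpy.gr.spatial_neighbors(adata, key_added="knn") stores
# adata.obsp["knn_connectivities"] and adata.obsp["knn_distances"].
obsp = {"knn_connectivities": "graph", "knn_distances": "distances"}
print(resolve_connectivity_key(obsp, "knn"))  # knn_connectivities
```

Trying the exact key first means an explicitly named matrix always wins, and the suffix rule only applies when the short key_added name was passed.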

Return type:

SpatialData

Returns:

The updated SpatialData object, with the clustered table written to sdata.tables[output_table_name].