harpy.tb.nhood_kmeans
- harpy.tb.nhood_kmeans(sdata, table_name, output_table_name, cluster_key='annotation', labels_name=None, connectivity_key='spatial_connectivities', composition_key='nhood_composition', key_added='nhood_kmeans', n_clusters=5, random_state=100, nan_label=-1, overwrite=False, **kwargs)
Cluster cells (instances) based on neighborhood cell-type composition using KMeans.
This function expects a precomputed spatial connectivity matrix in sdata.tables[table_name].obsp[connectivity_key] and does not calculate neighbors itself. For example, the graph can be computed beforehand with squidpy.gr.spatial_neighbors() and stored in sdata.tables[table_name].obsp[connectivity_key]. Neighborhood cell-type fractions are then computed from that graph, stored in adata.obsm[composition_key], and used as the feature matrix for KMeans. The resulting niche assignments are written to adata.obs[key_added].
- Parameters:
  - sdata (SpatialData) – The input SpatialData object.
  - table_name (str) – The table element in sdata on which to perform niche clustering.
  - output_table_name (str) – The output table element in sdata to which the updated table element will be written.
  - cluster_key (str (default: 'annotation')) – Key in adata.obs containing the cluster annotations used to build the neighborhood composition.
  - labels_name (str | Iterable[str] | None (default: None)) – Optional labels element or elements used to subset the table before clustering. If provided, only observations linked to these labels elements are considered.
  - connectivity_key (str (default: 'spatial_connectivities')) – Key pointing to the cell-cell connectivity matrix in adata.obsp, with shape (n_cells, n_cells). If the exact key is not found, {connectivity_key}_connectivities is tried as a convenience for graphs created with squidpy.gr.spatial_neighbors() using key_added=....
  - composition_key (str (default: 'nhood_composition')) – Key used to store the computed neighborhood composition. The dense neighborhood-fraction matrix is written to adata.obsm[composition_key] with shape (n_cells, n_categories), where each row contains, for one cell, the fraction of neighbors that belong to each category in cluster_key. Related metadata is stored under adata.uns[composition_key], including the cluster key that was used, the resolved connectivity key from adata.obsp, and the ordered cluster labels corresponding to the columns of adata.obsm[composition_key]. Using the same composition_key in both places keeps the feature matrix and its column definitions linked and makes it easier to reuse the computed neighborhood features in downstream analyses. For example, adata.obsm[composition_key] could look like [[0.75, 0.25, 0.00], [0.00, 0.50, 0.50]], meaning that the first cell has neighbors composed of 75% of the first cell type and 25% of the second, while adata.uns[composition_key]["cluster_categories"] stores the ordered labels for those columns.
  - key_added (str (default: 'nhood_kmeans')) – Key in adata.obs where the resulting niche labels are written.
  - n_clusters (int (default: 5)) – Number of KMeans clusters to compute.
  - random_state (int (default: 100)) – Random state used for reproducible clustering.
  - nan_label (int | str | None (default: -1)) – Label assigned to isolated cells with zero graph degree.
  - overwrite (bool (default: False)) – If True, overwrite output_table_name if it already exists in sdata.
  - **kwargs (Any) – Additional keyword arguments passed to KMeans.
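To make the neighborhood-fraction matrix concrete, the following is a minimal, dependency-free sketch of how such fractions can be derived from a connectivity matrix and per-cell labels. It is not harpy's implementation: the helper name neighborhood_fractions, the toy labels, and the choices to treat the graph as unweighted and to mark isolated cells with None are all assumptions made for illustration.

```python
def neighborhood_fractions(connectivity, labels):
    """Return (categories, fractions) for a dense connectivity matrix.

    Hypothetical helper: any nonzero entry counts as one neighbor
    (unweighted graph), which is an assumption of this sketch.
    """
    categories = sorted(set(labels))
    column = {cat: j for j, cat in enumerate(categories)}
    fractions = []
    for row in connectivity:
        counts = [0] * len(categories)
        for neighbor, connected in enumerate(row):
            if connected:
                counts[column[labels[neighbor]]] += 1
        degree = sum(counts)
        # Isolated cells (degree 0) have no defined composition.
        fractions.append([c / degree for c in counts] if degree else None)
    return categories, fractions


# Toy data: 5 cells with edges 0-1, 0-2, 1-2, 2-3; cell 4 is isolated.
labels = ["tumor", "tumor", "immune", "immune", "stroma"]
connectivity = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

categories, fractions = neighborhood_fractions(connectivity, labels)
for cell, row in enumerate(fractions):
    print(cell, dict(zip(categories, row)) if row is not None else "isolated")
```

Rows like these play the role of adata.obsm[composition_key], and categories plays the role of the ordered column labels. In this sketch, the isolated cell is the kind of observation that would receive nan_label instead of a KMeans cluster.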
- Return type: SpatialData
- Returns: The updated SpatialData object.
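A typical call sequence might look like the sketch below, which first builds the spatial graph with squidpy and then runs the niche clustering. Several details are assumptions rather than facts from this page: the wrapper name run_niche_clustering, the table name "table_annotated", the "_niches" output suffix, the import alias hp, and the presence of cell coordinates in .obsm["spatial"] of the table.

```python
def run_niche_clustering(sdata, table_name="table_annotated"):
    """Build a spatial graph with squidpy, then cluster niches with harpy.

    Hypothetical wrapper; table and output names are placeholders.
    """
    # Imported lazily so the sketch can be defined without the optional
    # dependencies installed.
    import harpy as hp
    import squidpy as sq

    # Assumes the table stores cell coordinates in .obsm["spatial"].
    # With squidpy's default key_added="spatial", the graph is written to
    # .obsp["spatial_connectivities"], matching the default connectivity_key.
    sq.gr.spatial_neighbors(
        sdata.tables[table_name], coord_type="generic", n_neighs=6
    )

    return hp.tb.nhood_kmeans(
        sdata,
        table_name=table_name,
        output_table_name=f"{table_name}_niches",
        cluster_key="annotation",
        connectivity_key="spatial_connectivities",
        n_clusters=5,
        random_state=100,
        overwrite=True,
    )
```

After such a call, the niche labels would be expected in the obs column named by key_added ('nhood_kmeans' by default) of the output table, with isolated cells carrying nan_label.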