Build a shared nearest-neighbor graph with cells as nodes.
In a shared nearest neighbor graph, pairs of cells are connected to each other by an edge with weight determined from their shared nearest neighbors. If two cells are close together but have distinct sets of neighbors, the corresponding edge is downweighted as the two cells are unlikely to be part of the same neighborhood. In this manner, highly weighted edges will form within highly interconnected neighborhoods where many cells share the same neighbors. This provides a more sophisticated definition of similarity between cells compared to a simpler (unweighted) nearest neighbor graph that just focuses on immediate proximity.
A key parameter in the construction of the graph is the number of nearest neighbors $k$ to consider. Larger values increase the connectivity of the graph and reduce the granularity of any subsequent community detection steps (see scran::ClusterSnnGraph
) at the cost of speed. The nearest neighbor search can be performed using either vantage point trees (exact) or with the Annoy algorithm (approximate) - see the knncolle library for details.
For the edges, a variety of weighting schemes are possible:
RANKED
defines the weight between two nodes as $k - r/2$ where $r$ is the smallest sum of ranks for any shared neighboring node (Xu and Su, 2015). For the purposes of this ranking, each node has a rank of zero in its own nearest-neighbor set. More shared neighbors, or shared neighbors that are close to both observations, will generally yield larger weights.
NUMBER
defines the weight between two nodes as the number of shared nearest neighbors between them. The weight can range from zero to $k + 1$, as the node itself is included in its own nearest-neighbor set. This is a simpler scheme that is also slightly faster but does not account for the ranking of neighbors within each set.
JACCARD
defines the weight between two nodes as the Jaccard index of their neighbor sets. This weight can range from zero to 1, and is a monotonic transformation of the weight used by NUMBER
.
See the ClusterSNNGraph
class to perform community detection on the graph returned by run()
.
- See also
- Xu C and Su Z (2015). Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics 31, 1974-80