Volume 3, Issue 1
Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold

Junda Sheng & Thomas Strohmer

J. Mach. Learn. , 3 (2024), pp. 64-106.

Published online: 2024-03

Export citation
  • Abstract

The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data. Decades of extensive study on the problem have established many profound results, among which the phase transition at the Kesten-Stigum threshold is particularly interesting both from a mathematical and an applied standpoint. It states that no estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold. Nevertheless, if we slightly extend the horizon to the ubiquitous semi-supervised setting, such a fundamental limitation will disappear completely. We prove that with an arbitrary fraction of the labels revealed, the detection problem is feasible throughout the parameter domain. Moreover, we introduce two efficient algorithms, one combinatorial and one based on optimization, to integrate label information with graph structures. Our work brings a new perspective to the stochastic model of networks and semidefinite program research.

  • AMS Subject Headings

  • Copyright

COPYRIGHT: © Global Science Press

  • Email address
  • BibTex
  • RIS
  • TXT
@Article{JML-3-64, author = {Sheng , Junda and Strohmer , Thomas}, title = {Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold}, journal = {Journal of Machine Learning}, year = {2024}, volume = {3}, number = {1}, pages = {64--106}, abstract = {

The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data. Decades of extensive study on the problem have established many profound results, among which the phase transition at the Kesten-Stigum threshold is particularly interesting both from a mathematical and an applied standpoint. It states that no estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold. Nevertheless, if we slightly extend the horizon to the ubiquitous semi-supervised setting, such a fundamental limitation will disappear completely. We prove that with an arbitrary fraction of the labels revealed, the detection problem is feasible throughout the parameter domain. Moreover, we introduce two efficient algorithms, one combinatorial and one based on optimization, to integrate label information with graph structures. Our work brings a new perspective to the stochastic model of networks and semidefinite program research.

}, issn = {2790-2048}, doi = {https://doi.org/10.4208/jml.230624}, url = {http://global-sci.org/intro/article_detail/jml/22984.html} }
TY - JOUR T1 - Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold AU - Sheng , Junda AU - Strohmer , Thomas JO - Journal of Machine Learning VL - 1 SP - 64 EP - 106 PY - 2024 DA - 2024/03 SN - 3 DO - http://doi.org/10.4208/jml.230624 UR - https://global-sci.org/intro/article_detail/jml/22984.html KW - Clustering, Semi-supervised learning, Stochastic block model, Kesten-Stigum threshold, Semidefinite programming. AB -

The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data. Decades of extensive study on the problem have established many profound results, among which the phase transition at the Kesten-Stigum threshold is particularly interesting both from a mathematical and an applied standpoint. It states that no estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below a certain threshold. Nevertheless, if we slightly extend the horizon to the ubiquitous semi-supervised setting, such a fundamental limitation will disappear completely. We prove that with an arbitrary fraction of the labels revealed, the detection problem is feasible throughout the parameter domain. Moreover, we introduce two efficient algorithms, one combinatorial and one based on optimization, to integrate label information with graph structures. Our work brings a new perspective to the stochastic model of networks and semidefinite program research.

Junda Sheng & Thomas Strohmer. (2024). Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold. Journal of Machine Learning. 3 (1). 64-106. doi:10.4208/jml.230624
Copy to clipboard
The citation has been copied to your clipboard