Combinatorial optimization algorithms for clustering and machine learning.
The dominant algorithms for machine learning tasks fall most often in the realm of AI or continuous optimization of intractable problems. We present combinatorial algorithms for machine learning, data mining, and image segmentation that, unlike the majority of existing machine learning methods, utilize pairwise similarities. These algorithms are efficient and reduce the classification problem to a network flow problem on a graph. One of these algorithms addresses the problem of finding a cluster that is as dissimilar as possible from the complement, while having high similarity within the cluster, is called HNC (Hochbaum’s Normalized Cut). The two objectives are combined either as a ratio or with linear weights. HNC is a variant of normalized cut, which is intractable, and is solved, heuristically, with the spectral technique. An extensive empirical study demonstrates that incorporating the use of pairwise similarities improves accuracy of classification and clustering. Another experimental study that compared the performance of HNC to that of the spectral technique, for imaging instances, demonstrates that in practice HNC’s performance dominates the performance of the spectral technique. To address the quadratic rate of growth in the size of the data with the use of similarities we employ a method called “sparse computation”. It is shown that using “sparse computation” enables the scalability of similarity-based algorithms to very large-scale data sets while maintaining high levels of accuracy. We demonstrate several applications of variants of HNC for data mining, medical imaging, and image segmentation tasks, including the recent use of HNC for cell identification in neuroscience datasets.
Dorit S. Hochbaum is a full professor at UC Berkeley’s department of Industrial Engineering and Operations Research (IEOR). Her research interests are in the areas of design and analysis of computer algorithms, approximation algorithms and discrete and continuous optimization. Her recent work focuses on efficient techniques related to network flows with novel applications in ranking, pattern recognition, data mining and image segmentation problems.
Professor Hochbaum is the author of over 160 papers that appeared in the Operations Research, Management Science and Theoretical Computer Science literature. Prof. Dorit S. Hochbaum was awarded an honorary doctorate of sciences by the University of Copenhagen recognizing Hochbaum’s ground-breaking achievements and leadership in optimization in general and in the field of approximation algorithms for intractable problems in particular. Hochbaum was awarded the title of INFORMS fellow for her extensive contributions to Operations Research, Management Science and design of algorithms. She is the winner of the 2011 INFORMS Computing Society prize for best paper dealing with the Operations Research/Computer Science interface. Professor Hochbaum is a fellow of the Society of Industrial and Applied Mathematics (SIAM) since 2014.