Abstract: In the world of big data, it is often easy to obtain a large amount of unlabeled data. However, learning algorithms typically require labeled data to train on. Active learning is an approach to bridging this gap, in which the learning algorithm may query the labels of a few data points. The hope is that by judiciously choosing which data points to query, the number of queries can be made much smaller than in the classical setting, where all the data is labeled.
Unfortunately, this hope turns out to be false, even for very simple concept classes. This has led research in active learning to focus on making various assumptions about the underlying data, the distribution, and so on.
I will present an alternative approach: allowing the learning algorithm to ask more informative questions about the data. As I will show, even very simple enriched queries, such as comparing two data points, yield in many cases an exponential improvement in query complexity. In addition, the mathematical tools underlying these results turn out to have surprising applications to long-standing open problems in complexity theory and discrete geometry.
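To give a flavor of why comparison queries help, here is a toy sketch (invented for illustration, not the construction from the talk): for a hidden halfspace, comparison queries let the learner sort points by how deep they lie on the positive side, after which a single binary search with label queries recovers every label. All names, the specific halfspace, and the point counts below are assumptions made for the sketch.

```python
import functools
import random

# Toy setup: a hidden halfspace sign(<w, x> - b) in R^2.
# With comparison queries ("which point is deeper in the positive side?"),
# we can infer all n labels using only ~log(n) label queries.

random.seed(0)
w, b = (0.6, 0.8), 0.3           # hidden hyperplane, unknown to the learner

def margin(x):                   # hidden quantity; accessed only via oracles
    return w[0] * x[0] + w[1] * x[1] - b

def label_query(x):              # standard oracle: the label of a point
    return 1 if margin(x) >= 0 else -1

def compare_query(x, y):         # enriched oracle: compare two points
    return -1 if margin(x) < margin(y) else 1

points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(1000)]

# Sort points by their (hidden) margin using only comparison queries, then
# binary-search for the sign change using only label queries.
ordered = sorted(points, key=functools.cmp_to_key(compare_query))

lo, hi, label_queries = 0, len(ordered), 0
while lo < hi:                   # find the first positively labeled point
    mid = (lo + hi) // 2
    label_queries += 1
    if label_query(ordered[mid]) == 1:
        hi = mid
    else:
        lo = mid + 1

inferred = {p: (-1 if i < lo else 1) for i, p in enumerate(ordered)}
assert all(inferred[p] == label_query(p) for p in points)
print(label_queries)             # about log2(1000), i.e. roughly 10,
                                 # versus 1000 label queries without comparisons
```

The sorting step uses comparisons only, so the number of expensive label queries drops from linear in the number of points to logarithmic, a simple instance of the exponential improvement mentioned above.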
Based on joint works with Max Hopkins, Daniel Kane, Gaurav Mahajan, Shay Moran and Jiapeng Zhang.