Mean Estimation with User-level Privacy under Data Heterogeneity

Date: March 3, 2022

Time: 4:00 pm

Room: DBH 4011

Speaker: Rachel Cummings

(Columbia)

Additional Notes:

(joint work with Vitaly Feldman, Audra McMillan, and Kunal Talwar)

Abstract:

A key challenge in many modern data analysis tasks is that user data is heterogeneous. Different users may hold vastly different numbers of data points. More importantly, we cannot assume that all users sample from the same underlying distribution. This is true, for example, of language data, where different speaking styles produce heterogeneous data. In this work we propose a simple model of heterogeneous user data that differs in both the distribution and the quantity of data, and we provide a method for estimating the population-level mean while preserving user-level differential privacy. We show that our estimator is asymptotically optimal and also prove general lower bounds on the error achievable in this problem. In particular, while the optimal non-private estimator can be shown to be linear, we show that privacy forces us to use a non-linear estimator.
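To make "user-level" privacy concrete, here is a minimal sketch of a standard baseline. It is not the estimator from the talk, which is not reproduced here. Neighboring datasets differ in one user's entire collection of points. The sketch assumes the number of users n is public, that every user holds at least one point, and that the analyst picks a clipping range [lo, hi]. Each user's local mean is clipped to that range, and users are weighted equally. Replacing one user then moves the average by at most (hi - lo) / n, so adding Laplace noise with scale (hi - lo) / (n * epsilon) gives epsilon-differential privacy. The function names `user_level_dp_mean` and `laplace_noise` are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    # random.expovariate takes a rate, so rate = 1 / scale gives mean = scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def user_level_dp_mean(user_data, lo, hi, epsilon, rng=None):
    """Hypothetical baseline: epsilon-DP mean under user-level (replace-one-user) neighbors.

    user_data: list of per-user lists of floats (each user must have >= 1 point).
    lo, hi:    clipping range for each user's local mean (an assumed public bound).
    epsilon:   privacy parameter, must be > 0.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not user_data or any(len(xs) == 0 for xs in user_data):
        raise ValueError("need at least one user, each with at least one point")
    rng = rng or random.Random()
    n = len(user_data)
    # Collapse each user to one clipped local mean, so a single user's influence is bounded
    # no matter how many points that user holds.
    local_means = [min(max(sum(xs) / len(xs), lo), hi) for xs in user_data]
    # Equal weighting: swapping one user changes the average by at most (hi - lo) / n.
    sensitivity = (hi - lo) / n
    return sum(local_means) / n + laplace_noise(sensitivity / epsilon, rng)
```

This baseline treats every user identically, regardless of how many points each one holds or how far that user's distribution sits from the population. Handling that heterogeneity well is the problem the talk addresses.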

Bio:

Dr. Rachel Cummings is an Assistant Professor in the Departments of Industrial Engineering and Operations Research and (by courtesy) Computer Science at Columbia University. Before joining Columbia, she was an Assistant Professor of Industrial and Systems Engineering and (by courtesy) Computer Science at the Georgia Institute of Technology. Her research interests lie primarily in data privacy, with connections to machine learning, algorithmic economics, optimization, statistics, and public policy. Her work has focused on problems such as strategic aspects of data generation, incentivizing truthful reporting of data, privacy-preserving algorithm design, impacts of privacy policy, and human decision-making. Dr. Cummings received her Ph.D. in Computing and Mathematical Sciences from the California Institute of Technology, her M.S. in Computer Science from Northwestern University, and her B.A. in Mathematics and Economics from the University of Southern California. She is the recipient of an NSF CAREER award, a DARPA Young Faculty Award, an Apple Privacy-Preserving Machine Learning Award, a JP Morgan Chase Faculty Award, a Google Research Fellowship for the Simons Institute program on Data Privacy, a Mozilla Research Grant, the ACM SIGecom Doctoral Dissertation Honorable Mention, the Amori Doctoral Prize in Computing and Mathematical Sciences, a Caltech Leadership Award, a Simons Award for Graduate Students in Theoretical Computer Science, and the Best Paper Award at the 2014 International Symposium on Distributed Computing. Dr. Cummings also serves on the ACM U.S. Public Policy Council's Privacy Committee and the Future of Privacy Forum's Advisory Board.

ACO Center @ UCI
