Leakage and Protection of Dataset Properties
Data privacy in computer science has mostly been concerned with protecting individuals' data when releasing the result of a computation on a larger dataset (e.g., differential privacy). In this talk, I will depart from individual privacy and consider confidentiality of dataset properties (e.g., the race or gender distribution in a dataset). First, I will show that global properties of dataset attributes can be leaked when one releases machine learning models computed on the data or contributes the data towards collaborative learning. Then, I will discuss definitions for protecting dataset properties and describe mechanisms that can meet these definitions.
This talk is based on joint work with Michelle Chen (The University of Melbourne), Rachel Cummings (Columbia University), Shruti Tople (Microsoft Research) and Wanrong Zhang (Harvard University), which appeared at USENIX Security 2021 and the ACM Conference on Fairness, Accountability, and Transparency 2022.
Olya Ohrimenko is an Associate Professor at The University of Melbourne, which she joined in 2020. Prior to that, she was a Principal Researcher at Microsoft Research in Cambridge, UK, where she started as a Postdoctoral Researcher in 2014. Her research interests include the privacy and integrity of machine learning algorithms, data analysis tools and cloud computing, spanning topics such as differential privacy, verifiable and data-oblivious computation, trusted execution environments, and side-channel attacks and mitigations. Recently, Olya has worked with the Australian Bureau of Statistics and National Australia Bank. She has received solo and joint research grants from Facebook and Oracle and is currently a PI on an AUSMURI grant.