Abstract: Statistical estimation requires data. In many settings, data are held by people and must be acquired first. We consider the problem of a data analyst who purchases data from strategic agents in order to compute some statistic of interest. Agents incur private costs to collect and reveal their data, and these costs can be arbitrarily correlated with the data themselves. Once revealed, data are verifiable. In this talk, I focus on unbiased point estimators (e.g., the mean) and on confidence intervals. The goal is to design a joint acquisition-estimation mechanism that optimizes the performance of the resulting estimator. We design individually rational and incentive-compatible mechanisms that optimize the worst-case performance of the estimators in two scenarios: when the data analyst knows the marginal cost distribution of the agents, and when the data analyst has no prior information about the underlying distribution of costs and data.
This talk is based on two papers: one joint with Nicole Immorlica, Brendan Lucier, Vasilis Syrgkanis, and Juba Ziani, and the other joint with Shuran (Sherry) Zheng.