Privacy Protection in Data Collection via Randomized Response Procedures
Randomized response (RR) methods have long been suggested for protecting respondents' privacy in statistical surveys. However, how to set and achieve privacy protection goals has received little attention. We give a full development and analysis of the view that a privacy mechanism should ensure that no intruder would gain much new information about any respondent from his response. Formally, we say that a privacy breach occurs when an intruder's prior and posterior probabilities about a property of a respondent, denoted $p$ and $p_*$, respectively, satisfy $p_* < h_l(p)$ or $p_* > h_u(p)$, where $h_l$ and $h_u$ are two given functions. An RR procedure protects privacy if it does not permit any privacy breach. We explore the effects of $(h_l, h_u)$ on the resultant privacy demand, and prove that it is precisely attainable only for certain $(h_l, h_u)$. This result is used to define a canonical strict privacy protection criterion and to give practical guidance on the choice of $(h_l, h_u)$. We then characterize all privacy-satisfying RR procedures, compare their effects on data utility using sufficiency of experiments, and identify the class of all admissible procedures. For linear unbiased estimation, we derive privacy-preserving minimax procedures, addressing optimal choices for both the RR mechanism (or design) and the estimator. A minimax design is a $t$-subset design (with a special structure) and can be implemented fairly easily. We also study mixtures of $t$-subset designs, mainly to examine the RAPPOR method, which is used notably by Google and Apple. We note the inadmissibility of the RAPPOR design and offer suggestions for improving both the design and the customary estimator.
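To illustrate the breach criterion concretely, the following is a minimal sketch (not the paper's method) using Warner's classical RR design, where a respondent answers truthfully with a design probability $q$. The functions `posterior` and `breach`, and the multiplicative bound functions used in the usage example, are hypothetical choices for illustration only.

```python
def posterior(prior, q, answer):
    """Intruder's posterior P(A | answer) under Warner's RR design.

    prior  -- intruder's prior probability p that the respondent has property A
    q      -- design probability of answering truthfully
    answer -- the observed response, "yes" or "no"
    """
    like_a = q if answer == "yes" else 1 - q          # P(answer | A)
    like_not_a = (1 - q) if answer == "yes" else q    # P(answer | not A)
    num = like_a * prior
    return num / (num + like_not_a * (1 - prior))

def breach(prior, post, h_l, h_u):
    """A privacy breach occurs when p_* < h_l(p) or p_* > h_u(p)."""
    return post < h_l(prior) or post > h_u(prior)
```

For example, with $q = 1/2$ the response carries no information and the posterior equals the prior, so no breach can occur for any reasonable $(h_l, h_u)$; with $q = 3/4$ and prior $p = 0.3$, the posterior after a "yes" is $0.5625$, which breaches hypothetical bounds $h_l(p) = p/1.5$, $h_u(p) = \min(1, 1.5p)$ but not the looser $h_l(p) = p/2$, $h_u(p) = \min(1, 2p)$.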