We consider the task of protecting respondents' privacy when collecting data on categorical variables. Any mechanism for masking a respondent's true value can be viewed as a randomized response (RR) procedure, and its prudent planning depends crucially on the given privacy criterion. We examine some existing privacy criteria and describe their drawbacks. We show that a previous notion of average security is inappropriate. Several other criteria, which simply impose upper bounds on the parity of the RR design, inflict a severe loss of data utility unless the number of categories is fairly small. This applies to local differential privacy (LDP), a leading privacy criterion, and reveals substantial statistical inefficiency of the RAPPOR procedure, which has been in use by Google, Apple and others. We propose a new privacy procedure that is similar to l-diversity but works locally for each respondent. The procedure is simple to implement, and its privacy protection is easy to understand and communicate to survey participants. We give an unbiased estimator of the probability vector of all categories and prove its minimaxity within a class of estimators under squared error loss. We argue that the new procedure offers a better privacy-utility trade-off than LDP.
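
To make the setting concrete, the following is a minimal sketch of a standard k-ary randomized response design that satisfies LDP at level epsilon, together with the usual unbiased estimator of the category probabilities. It is an illustration only and is not the procedure proposed here. The function names (`krr_probs`, `randomize`, `estimate`) and the choice of design are assumptions made for the example: each respondent reports the true category with probability p = e^epsilon / (e^epsilon + k - 1) and each other category with probability q = 1 / (e^epsilon + k - 1), so the probability of reporting category j is q + (p - q) times its true probability.

```python
import math
import random
from collections import Counter


def krr_probs(k, epsilon):
    """Return (p, q) for k-ary randomized response (illustrative design).

    p is the probability of reporting the true category, q the probability
    of reporting any one specific other category; p / q = exp(epsilon).
    """
    e = math.exp(epsilon)
    return e / (e + k - 1), 1.0 / (e + k - 1)


def randomize(true_value, k, epsilon, rng):
    """Mask one respondent's category (an integer in 0..k-1)."""
    p, _ = krr_probs(k, epsilon)
    if rng.random() < p:
        return true_value
    # Otherwise pick uniformly among the k - 1 other categories.
    other = rng.randrange(k - 1)
    return other if other < true_value else other + 1


def estimate(reports, k, epsilon):
    """Unbiased estimate of the category probability vector.

    Inverts E[observed share of j] = q + (p - q) * pi_j for each j.
    Individual entries can fall slightly outside [0, 1] in small samples.
    """
    p, q = krr_probs(k, epsilon)
    n = len(reports)
    counts = Counter(reports)
    return [(counts.get(j, 0) / n - q) / (p - q) for j in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    k, epsilon = 4, 1.0
    truth = [0.5, 0.3, 0.15, 0.05]  # hypothetical population shares
    values = rng.choices(range(k), weights=truth, k=20000)
    reports = [randomize(v, k, epsilon, rng) for v in values]
    print(estimate(reports, k, epsilon))
```

Because p - q shrinks as k grows for fixed epsilon, the estimator divides by an ever smaller quantity, which gives a rough sense of why utility degrades under such criteria when the number of categories is not small.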


Page Last Revised - October 28, 2021
