登录    注册    忘记密码

详细信息

基于本地差分隐私的键-值数据精确收集方法  ( EI收录)  

Key-Value Data Accurate Collection under Local Differential Privacy

文献类型:期刊文献

中文题名:基于本地差分隐私的键-值数据精确收集方法

英文题名:Key-Value Data Accurate Collection under Local Differential Privacy

作者:张啸剑[1];付楠[1];孟小峰[2]

第一作者:张啸剑

机构:[1]河南财经政法大学计算机与信息工程学院,郑州450046;[2]中国人民大学信息学院,北京100872

第一机构:河南财经政法大学计算机与信息工程学院

年份:2020

卷号:43

期号:8

起止页码:1479-1492

中文期刊名:计算机学报

外文期刊名:Chinese Journal of Computers

收录:CSTPCD;;EI(收录号:20204209348170);Scopus;北大核心:【北大核心2017】;CSCD:【CSCD2019_2020】;

基金:国家自然科学基金项目(61502146,91746115,91646203,61772131);河南省自然科学基金资助项目(162300410006);河南省高等学校青年骨干教师培养计划项目(2017GGJS084);河南省教育厅高等学校重点科研项目(16A520002);河南财经政法大学青年拔尖人才资助计划项目资助.

语种:中文

中文关键词:键-值数据;隐私保护;本地差分隐私;频率估计;均值估计

外文关键词:key-value data;privacy protection;local differential privacy;frequency estimation;mean estimation

摘要:基于本地差分隐私的键-值数据的收集与分析得到了研究者的广泛关注.键与值的值域大小、二者之间的关联性、报告给收集者的通信方式以及本地扰动机制直接制约着频率与均值估计的精度.针对现有键-值数据本地扰动方法存在的不足,该文提出了一种精确且有效的本地扰动方法LDPKV(Locally Differentially Private Key-Value data collection),该方法结合输入值域与输出值域之间的整体映射关系,在较少通信代价与不分割隐私预算的情况下,对键-值对进行统一处理.其主要思想是:首先对每个用户所拥有的键-值对进行统一离散化处理;结合每个用户的离散化结果,利用伯努利采样技术随机地抽样一条键-值对进行本地随机扰动;然后将扰动后的键-值对报告给收集者.收集者利用每个用户的报告值估计每个键的频率以及所对应值的均值.理论分析了LDPKV方法产生的方差与最大偏差,以及与现有键-值数据收集方法在真实与合成数据集上进行综合比较.实验结果表明LDPKV方法均优于同类方法.
Privacy-preserving data collection is an important problem that has attracted much research attention.The state-of-the-art solution for this problem is local differential privacy,which provides a strong protection by adding random noise from users.Existing methods with local differential privacy,however,rarely focus on key-value data collection.In this paper,given a set U of users,we study locally differentially private algorithms for collecting key-value data over U to estimate the frequency of each key and mean of the value.Existing methods for key-value data collection mostly adopt direct encoding mechanisms and local perturbation mechanisms,which locally encode and perturb the value of each user.The two mechanisms,however,require that we have to transform the whole domain of key-value data and split the privacy budget.The transformation of the whole domain makes the communication cost too high.Furthermore,the partition of the budget leads to excessive noise and lower utility.To remedy the deficiency of existing methods,this paper proposes an accurate and efficient method with local differential privacy,called LDPKV(Locally Differentially Private Key-Value data collection)to collect and analyze key-value data.LDPKV firstly discretes the key-value data of each user in terms of the mapping between input and output domain.Based on the discretization,each user in LDPKV randomly samples a key-value pair by using Bernoulli sampling technology,and runs the local randomized response on the sample and reports the index of sample along with the perturbed value to the collector.After that,the collector in LDPKV accumulates all of the reports from users to estimate the frequency of each key and mean of values.Finally,we conduct theoretical analysis on variance and error bound of LDPKV.Based on two metrics:MSE(Mean Square Error)and RE(Relative Error),three groups of experiments were conducted on real and synthetic datasets(TalkingData,JData,GAUSS,PLAW,and LNR)to evaluate the accuracy of the frequency and the mean generated by from LDPKV,Rappor,KRR,PrivKV,PrivKVM,KVOH,and Harmony.For example,based on JData dataset,we fixε=1.6 to compare the accuracy of the above seven methods.It can be seen from Figure 2(a)that the MSE of LDPKV is 0.002,and 1/6 times that of Rappor and KRR,and 1/2 times that of PrivKV,PrivKVM,and KVOH,respectively.From Figure 2(b),whenεvaries from 0.1 to 3.2,LDPKV achieves a small value of MSE on mean evaluation against Rappor,Harmony,PrivKV,PrivKVM,and KVOH.Besides,based on three synthetic datasets,LDPKV still achieves better accuracy than Rappor,KRR,PrivKV,PrivKVM,KVOH,and Harmony.For instance,from Figure 3 to Figure 5,when varyingεfrom 0.1 to 0.8 and varying top k from top10 to top50,we can see that the RE of six methods decreases gradually.Especially,on fixingε=0.2 and top k=top10,the RE of LDPKV is 1/4 times,1/6 times that of Rappor and KRR on LNR and PLAW,respectively.From the results of the above experimental analysis,we can see that LDPKV is better than its competitors.

参考文献:

正在载入数据...

版权所有©河南财经政法大学 重庆维普资讯有限公司 渝B2-20050021-8 
渝公网安备 50019002500408号 违法和不良信息举报中心