详细信息
文献类型:期刊文献
中文题名:基于直方图的隐私键-值数据收集算法
英文题名:Towards Private Key-Value Data Collection with Histogram
作者:张啸剑[1];徐雅鑫[1];付楠[1];孟小峰[2]
第一作者:张啸剑
机构:[1]河南财经政法大学计算机与信息工程学院,郑州450002;[2]中国人民大学信息学院,北京100872
第一机构:河南财经政法大学计算机与信息工程学院
年份:2021
卷号:58
期号:3
起止页码:624-637
中文期刊名:计算机研究与发展
外文期刊名:Journal of Computer Research and Development
收录:CSTPCD;;EI(收录号:20211310130480);Scopus;北大核心:【北大核心2020】;CSCD:【CSCD2021_2022】;
基金:国家自然科学基金项目(61502146,91646203,91746115,62072156);河南省自然科学基金项目(162300410006);河南省科技攻关项目(202102310563);河南财经政法大学青年拔尖人才资助计划项目。
语种:中文
中文关键词:本地差分隐私;随机应答机制;键值数据;频率估计;均值估计
外文关键词:local differential privacy;randomized response mechanism;key-value data;frequency estimation;mean estimation
摘要:基于本地差分隐私的用户数据收集与分析算法已延伸到了键值数据类型.然而,该类数据值域大小与稀疏性以及本地扰动机制直接制约着收集与分析精度.针对现有机制难以有效应对该类数据收集的不足,提出了一种基于直方图技术的有效收集与分析算法HISKV(histogram-based key-value data collection),该算法首先结合用户分组策略寻找最优截断长度,利用最优截断抽样技术处理值域过大与稀疏性问题,然后结合截断结果随机抽取单个键值对进行离散化处理.针对离散化结果,设计一种高效的本地扰动机制LRR_KV(local random response for key-value data),该机制结合具体的键分配不同的本地扰动概率.每个用户利用LRR_KV机制扰动离散化的键值对之后发送给收集者,收集者结合用户的报告值对每个键的频率及其值所对应的均值进行估计.理论分析了HISKV算法的无偏性、所产生的方差以及最大偏差,并与现有的键值收集算法在真实与合成的数据集上进行比较,实验结果表明HISKV算法优于同类算法.
Recently,user data collection and analysis with local differential privacy has extended into key-value data.The trade-off between the size and sparsity of domain and perturbation method directly constrains the accuracy of the collection and analysis of such data.To remedy the deficiency caused by the domain size and perturbating method,this paper employs histogram technology to propose an efficient solution,called HISKV,to collect key-value data.HISKV firstly uses a user-grouping strategy and partial privacy budget to find the optimal length of truncation and enables each user to truncate his/her key-value data set.And then,based on the truncated set,each user samples one key-value pair and uses the discretization and perturbation method to process this pair.To perturb key-value data efficiently,a novel mechanism in HISKV,named LRR_KV is proposed,which allocates different perturbing probability for different keys.In LRR_KV,each user adopts this mechanism to add noise to his/her sampled pair,and sents the report to a collector.Based on the reports from all of the users,the collector estimates the frequency of each key and the mean of the values.To evaluate the utility of HISKV,we firstly conduct theoretical analysis on unbias,variance,and error bound of LRR_KV,and then perform experiments on real and synthetic datasets to compare different methods.The experimental results show that HISKV outperforms its competitors.
参考文献:
正在载入数据...