Details

混洗差分隐私下的多维类别数据的收集与分析 (EI indexed)

Collecting and Analyzing Multidimensional Categorical Data Under Shuffled Differential Privacy

Document type: Journal article

Chinese title: 混洗差分隐私下的多维类别数据的收集与分析

English title: Collecting and Analyzing Multidimensional Categorical Data Under Shuffled Differential Privacy

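Authors: Liu Yifei (刘艺菲) [1]; Wang Ning (王宁) [1]; Wang Zhigang (王志刚) [1]; Gu Yu (谷峪) [3]; Wei Zhiqiang (魏志强) [1]; Zhang Xiaojian (张啸剑) [2]; Yu Ge (于戈) [3]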

First author: Liu Yifei (刘艺菲)

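Institutions: [1] Faculty of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100; [2] School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou, Henan 450046; [3] School of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning 110819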

First institution: Faculty of Information Science and Engineering, Ocean University of China, Qingdao, Shandong 266100

Year: 2022

Volume: 33

Issue: 3

Pages: 1093-1110

Chinese journal name: 软件学报

Foreign-language journal name: Journal of Software

Indexed in: CSTPCD; EI (accession number: 20221311852482); Scopus; PKU Core Journals (2020 edition); CSCD (2021-2022)

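Funding: National Natural Science Foundation of China (61902365, 61902366, 62072156); Fundamental Research Funds for the Central Universities (202042008); China Postdoctoral Science Foundation (2019M652473, 2019M652474, 2020T130623); Qingdao Independent Innovation Key R&D Program (20-3-2-12-xx); Qingdao Postdoctoral Applied Research Project.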

Language: Chinese

Chinese keywords: 混洗差分隐私; 隐私保护; 多维类别数据; 频率估计

Foreign-language keywords: shuffled differential privacy; privacy protection; multidimensional categorical data; frequency estimation

Abstract: With the arrival of the big data era, estimating frequency distributions over multidimensional categorical data while protecting user privacy has become an active research topic. Existing work designs privacy-preserving algorithms mainly under the centralized differential privacy model or the local differential privacy model. These two models fall short in either the strength of privacy protection or the utility of the published results. This work therefore builds on the emerging shuffled differential privacy model to design user data collection strategies that provide frequency distribution estimation with both strong privacy and high utility. Considering the multidimensional nature of the data and the heterogeneity that arises because different attributes have domains of different sizes, two data publication schemes are designed from the perspectives of the perturbation algorithm and the shuffling method: ARR-SS, based on a single shuffler, and SRR-MS, based on multiple shufflers. To combine the advantages of both, a third scheme, PSRR-SS, is proposed; it uses a single shuffler and removes the heterogeneity among attributes by padding the attribute domains with dummy values. The degree of privacy protection and the error level of the three schemes are analyzed theoretically, and their effectiveness for frequency estimation is validated on four real datasets. In addition, the proposed schemes are used as the perturbation component of a technique for generating noisy (synthetic) datasets, and the utility of the results of training with stochastic gradient descent on the generated data is evaluated. The experimental results show that the proposed schemes outperform existing algorithms of the same kind.
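As a rough illustration of the pipeline the abstract refers to (local perturbation, shuffling, then frequency estimation), the following is a minimal sketch for a single categorical attribute. It uses textbook generalized randomized response and is not the paper's ARR-SS, SRR-MS, or PSRR-SS; the function names and parameters are hypothetical, and `epsilon` is the budget of the local randomizer only (the amplified guarantee after shuffling is not computed here).

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """Generalized randomized response (assumed stand-in for the paper's perturbation):
    keep the true value with probability p, otherwise report another value uniformly."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def shuffle_reports(reports, rng):
    """Shuffler: return the reports in a uniformly random order,
    so the collector cannot link a report to the user who sent it."""
    shuffled = list(reports)
    rng.shuffle(shuffled)
    return shuffled


def estimate_frequencies(reports, domain, epsilon):
    """Collector: invert the known perturbation probabilities to obtain
    an unbiased estimate of each value's true frequency."""
    k = len(domain)
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # P[report = true value]
    q = 1.0 / (math.exp(epsilon) + k - 1)                # P[report = a given other value]
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)
    domain = ["a", "b", "c"]
    true_values = ["a"] * 6000 + ["b"] * 3000 + ["c"] * 1000
    reports = [grr_perturb(v, domain, 2.0, rng) for v in true_values]
    print(estimate_frequencies(shuffle_reports(reports, rng), domain, 2.0))
```

Handling several attributes with domains of different sizes, which is the heterogeneity the abstract describes, would need additional design decisions that this sketch does not attempt.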
