登录    注册    忘记密码

详细信息

差分隐私保护下一种精确挖掘top-k频繁模式方法  ( EI收录)  

An Accurate Method for Mining top-k Frequent Pattern Under Differential Privacy

文献类型:期刊文献

中文题名:差分隐私保护下一种精确挖掘top-k频繁模式方法

英文题名:An Accurate Method for Mining top-k Frequent Pattern Under Differential Privacy

作者:张啸剑[1,2];王淼[1];孟小峰[1]

第一作者:张啸剑

通讯作者:Zhang, X.

机构:[1]中国人民大学信息学院;[2]河南财经政法大学计算机与信息工程学院

第一机构:中国人民大学信息学院,北京100872

通讯机构:[1]School of Information, Renmin University of China, Beijing 100872, China

年份:2014

卷号:51

期号:1

起止页码:104-114

中文期刊名:计算机研究与发展

外文期刊名:Journal of Computer Research and Development

收录:CSTPCD;;EI(收录号:20141217481938);Scopus(收录号:2-s2.0-84896048369);北大核心:【北大核心2011】;CSCD:【CSCD2013_2014】;

基金:国家自然科学基金项目(61379050;91024032;91224008;91124001;91324015);国家"八六三"高技术研究发展计划基金项目(2012AA011001;2013AA013204);高等学校博士学科点专项科研基金项目(20130004130001);中国人民大学科学研究基金项目(11XNL010)

语种:中文

中文关键词:频繁模式挖掘;top-k模式;差分隐私;拉普拉斯机制;指数机制

外文关键词:frequent pattern mining; top-k pattern; differential privacy; Laplace mechanism; exponentialmechanism

摘要:频繁模式挖掘是分析事务数据集常用技术.然而,当事务数据集含有敏感数据时(如用户行为记录、电子病例等),直接发布频繁模式及其支持度计数会给个人隐私带来相当大的风险.对此提出了一种满足ε-差分隐私的top-k频繁模式挖掘算法DP-topkP(differentially private top-kpattern mining).该算法利用指数机制从候选频繁模式集合中挑选出top-k个携带真实支持度计数的模式;采用拉普拉斯机制产生的噪音扰动所选模式的真实支持度计数;为了增强输出模式的可用性,采用后置处理技术对top-k个模式的噪音支持度计数进行求精处理.从理论角度证明了该算法满足ε-差分隐私,并符合(λ,δ)-useful要求.实验结果证明了DP-topkP算法具有较好的准确性、可用性和可扩展性.
Frequent pattern mining is a popular technique for analyzing transaction datasets. However, because these datasets contain sensitive information (e. g. , user behavior records, electronic health records, etc), directly releasing discovered frequent patterns with true support counts will carry significant risk to privacy of individuals. In this paper, we propose an efficient algorithm, called DP topkP, based on differential privacy model, to accurately find top-k frequent patterns. To avoid the individuals' privacy leakage, in this algorithm, exponential mechanism is used to sample top-k frequent patterns in a candidate set, and Laplace mechanism is utilized to generate the noisy data for perturbing the true support counts of the sampled patterns. However, the noisy support counts returned may be inconsistent with query semantic constraints (e. g. , descending order, integer, etc), which will make the utility of the discovered top-k patterns poor. To boost the accuracy of the returned noisy support counts, we take consistency constraints into account to conduct the post processing step. We theoretically prove that the proposed method is (γ,δ)-useful and differentially private. The experimental results demonstrate that this method can maintain better accuracy, utility and scalability.

参考文献:

正在载入数据...

版权所有©河南财经政法大学 重庆维普资讯有限公司 渝B2-20050021-8 
渝公网安备 50019002500408号 违法和不良信息举报中心