登录    注册    忘记密码

详细信息

差分隐私下的一种频繁序列模式挖掘方法  ( EI收录)  

Frequent Sequential Pattern Mining under Differential Privacy

文献类型:期刊文献

中文题名:差分隐私下的一种频繁序列模式挖掘方法

英文题名:Frequent Sequential Pattern Mining under Differential Privacy

作者:卢国庆[1,2];张啸剑[3];丁丽萍[1];李彦峰[1];廖鑫[1,4]

第一作者:卢国庆

通讯作者:Ding, Liping

机构:[1]中国科学院软件研究所;[2]中国科学院大学;[3]河南财经政法大学计算机与信息工程学院;[4]湖南大学信息科学与工程学院

第一机构:中国科学院软件研究所,北京100190

年份:2015

卷号:52

期号:12

起止页码:2789-2801

中文期刊名:计算机研究与发展

外文期刊名:Journal of Computer Research and Development

收录:CSTPCD;;EI(收录号:20160501878295);Scopus(收录号:2-s2.0-84961333853);北大核心:【北大核心2014】;CSCD:【CSCD2015_2016】;

基金:国家科技重大专项基金项目(2012ZX01039-004);中国科学院战略性先导科技专项基金项目(XDA06010600);中国博士后科学基金一等资助项目(2014M560123);国家自然科学基金项目(61202285)

语种:中文

中文关键词:频繁序列模式;数据挖掘;差分隐私;隐私保护;前缀树

外文关键词:frequent sequential pattern; data mining; differential privacy(DP); privacy protection; prefix tree

摘要:频繁序列模式挖掘是数据挖掘领域的1个基本问题,然而模式本身及其支持度计数都有可能泄露用户隐私信息.差分隐私(differential privacy,DP)作为一种新出现的隐私保护技术,定义了一个相当严格的攻击模型,通过添加噪音使数据失真达到隐私保护的目的.由于序列数据内在序列性和高维度的特点,给差分隐私应用于频繁序列模式挖掘带来了挑战.对此提出了一种基于交互式差分隐私保护框架的频繁序列模式挖掘算法Diff-FSPM(differential-privacy frequent sequential pattern mining).该算法利用指数机制获取最优序列长度,并采用一种维规约策略获得原始序列数据集的规约表示,有效降低序列维度的影响;应用前缀树压缩频繁序列模式,利用拉普拉斯机制产生的噪音扰动频繁模式的真实支持度计数,同时采用闭频繁序列模式和Markov假设,有效分配隐私预算,并利用一致性约束后置处理,增强输出模式的可用性.理论角度证明算法满足ε-差分隐私,实验结果验证算法具有较好的可用性.
Frequent sequential pattern mining is an exploratory problem in the field of data mining.However,directly releasing the discovered frequent patterns and the corresponding true supports may reveal the individuals' privacy.The state-of-the-art solution for this problem is differential privacy,which offers a strong degree of privacy protection by adding noise.Due to the inherent sequentiality and high-dimensionality in sequences,it is challenging to apply differential privacy to frequent sequential pattern mining.To address those problems,this paper presents a differentially private method called Diff-FSPM that uses an interactive way to find frequent patterns.To reduce the impact of dimensionality,Diff-FSPM first employs the exponential mechanism to obtain the optimal sequential length for truncating each sequence in the original dataset.After that,Diff-FSPM relies on aprefix-tree to compress all the frequent patterns,and then utilizes the Laplace mechanism to perturb the true support of the frequent patterns.To efficiently allocate the privacy budget,Diff-FSPM uses the closet frequent pattern and Markov assumption to guide the mining process.Finally,Diff-FSPM adopts a post-processing technique with consistency constraint to boost the accuracy of the returned noisy support counts.We theoretically prove that Diff-FSPM satisfiesε-differential privacy,and the experimental results show that it outperforms the existing methods in terms of utility.

参考文献:

正在载入数据...

版权所有©河南财经政法大学 重庆维普资讯有限公司 渝B2-20050021-8 
渝公网安备 50019002500408号 违法和不良信息举报中心