Details

A Yarn and NMF Based Big Data Clustering Algorithm

Document type: Journal article

Chinese title: 一种基于Yarn云计算平台与NMF的大数据聚类算法

English title: A Yarn and NMF Based Big Data Clustering Algorithm

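Authors: Feng Xinyang (冯新扬) [1]; Shen Jianjing (沈建京) [2]

First author: Feng Xinyang (冯新扬)

Institutions: [1] School of Computer and Information Engineering, Henan University of Economics and Law; [2] PLA Strategic Support Force Information Engineering University

First institution: School of Computer and Information Engineering, Henan University of Economics and Law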

Year: 2018

Volume: 0

Issue: 8

Pages: 43-49

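Chinese journal name: 信息网络安全

Foreign-language journal name: Netinfo Security

Indexed in: CSTPCD; Peking University Core Journals (北大核心 2017); CSCD (CSCD 2017-2018)

Funding: National Natural Science Foundation of China [61202285]; Henan Province Science and Technology Research Project [122102210387]; Henan Provincial Department of Education Science and Technology Research Project [13B520902]

Language: Chinese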

Chinese keywords: 云计算; 大数据; Yarn平台; 非负矩阵分解; 聚类算法

English keywords: cloud computing; big data; Yarn platform; non-negative matrix factorization; clustering algorithm

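Abstract: To improve on the performance of early versions of MapReduce for big data clustering, this paper proposes a big data clustering method based on the Yarn (Yet Another Resource Negotiator) cloud computing platform and non-negative matrix factorization (NMF). It discusses how similarity clustering of high-dimensional data is combined with NMF, and how the clustering tasks are partitioned for MapReduce. The method is implemented on the Yarn platform of Hadoop 2.0 and uses Hadoop's HDFS (Hadoop Distributed File System) to store large volumes of external data. The paper describes the coding and implementation of the NMF-based big data similarity clustering method and tests it with a case-study program on big data from a telecommunications operator. The experimental results show that the Yarn cloud platform achieves better running time and speedup than the traditional non-negative matrix method used for data clustering, and can complete the operator's big data processing within an acceptable time.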
To improve the performance of MapReduce version 1 on big data processing, this paper proposes a parallel hierarchical clustering algorithm based on Yarn and NMF (Non-negative Matrix Factorization). It then discusses the combination of big data classification with the NMF algorithm and the task partitioning used in the MapReduce approach. The approach uses the Yarn distributed computation programming model of Hadoop 2.0, with the big data stored in HDFS (Hadoop Distributed File System). The coding mechanism and flow of hierarchical data clustering on Yarn are also described in detail. To demonstrate the efficiency of the approach, a series of simulation experiments was run on telecommunications big data. The results and performance analysis show that big data processing can be completed within an acceptable time with the Yarn framework. Good performance and speedup were also obtained in the tests.
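
As a rough illustration of the NMF-based clustering idea named in the abstract, the following single-machine sketch factorizes a non-negative data matrix V (rows are samples) as V ≈ W H and assigns each sample to the component with the largest weight in its row of W. It uses the standard multiplicative update rules for the Frobenius-norm objective, which is an assumption: this record does not state which NMF update rule, similarity measure, or MapReduce task partitioning the paper uses, and nothing here runs on Yarn or HDFS. The function names (`matmul`, `transpose`, `nmf`, `cluster_labels`) and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a dense matrix."""
    return [list(col) for col in zip(*a)]


def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Approximate v (n x d, non-negative) as w (n x k) times h (k x d).

    Assumption: multiplicative updates for the Frobenius-norm objective;
    the paper's actual update rule is not given in this record.
    """
    rng = random.Random(seed)
    n, d = len(v), len(v[0])
    # Strictly positive random initialization.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(d)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(d)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def cluster_labels(w):
    """Assign each sample to the component with the largest weight in w."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


if __name__ == "__main__":
    # Hypothetical toy data: two groups of samples using disjoint features.
    v = [
        [5.0, 5.0, 0.0, 0.0],
        [4.0, 4.0, 0.0, 0.0],
        [0.0, 0.0, 3.0, 3.0],
        [0.0, 0.0, 6.0, 6.0],
    ]
    w, h = nmf(v, k=2)
    print(cluster_labels(w))
```

In a MapReduce setting, the matrix products inside each update are the natural units to split across map tasks by row block; how the paper actually partitions them is not described in this record.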

