
Detailed Information

A new solution to distributed permutation flow shop scheduling problem based on NASH Q-Learning (SCI-EXPANDED indexed)

Document Type: Journal article

English Title: A new solution to distributed permutation flow shop scheduling problem based on NASH Q-Learning

Authors: Ren, J. F. [1,2]; Ye, C. M. [1]; Li, Y. [1]

First Author: Ren, J. F. (任剑锋)

Corresponding Author: Ye, C. M. [1]

Affiliations: [1] Univ Shanghai Sci & Technol, Sch Business, Shanghai, Peoples R China; [2] Henan Univ Econ & Law, Sch Comp & Informat Engn, Zhengzhou, Peoples R China

First Affiliation: Univ Shanghai Sci & Technol, Sch Business, Shanghai, Peoples R China

Corresponding Institution: [1] Univ Shanghai Sci & Technol, Sch Business, Shanghai, Peoples R China

Year: 2021

Volume: 16

Issue: 3

Pages: 269-284

Journal: ADVANCES IN PRODUCTION ENGINEERING & MANAGEMENT

Indexed by: Scopus (Accession No.: 2-s2.0-85120727658); Web of Science SCI-EXPANDED (Accession No.: WOS:000723602100001)

Funding: Project supported by the Key Soft Science Project of the "Science and Technology Innovation Action Plan" of the Shanghai Science and Technology Commission, China (No. 20692104300); the National Natural Science Foundation of China (No. 71840003); and the Technology Development Project of the University of Shanghai for Science and Technology, China (No. 2018KJFZ043).

Language: English

Keywords: Flow shop scheduling; Distributed scheduling; Permutation flow shop; Reinforcement learning; NASH Q-learning; Mean field (MF)

Abstract: Addressing Distributed Permutation Flow-shop Scheduling Problems (DPFSPs), this study takes minimization of the maximum completion time (makespan) of the workpieces across all production tasks as the objective and adopts multi-agent Reinforcement Learning (RL) as the main framework of the solution model. Combining Nash equilibrium theory with RL, it proposes a NASH Q-Learning algorithm for the Distributed Flow-shop Scheduling Problem (DFSP) based on Mean Field (MF) theory. For the RL component, the study designs a two-layer online learning mode in which sample collection and training improvement proceed alternately: the outer layer collects samples, and once the collected samples reach the required batch size, control passes to the inner-layer loop, which performs model-free batch Q-learning and approximates the value function with a neural network so that the method scales to large problems. Compared on the Average Relative Percentage Deviation (ARPD) index over benchmark test instances, the proposed algorithm outperforms similar algorithms, demonstrating its feasibility and efficiency.
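The two-layer online learning mode described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy MDP, the linear one-hot value model (standing in for the paper's neural network), the uniform random collection policy, and all constants are assumptions introduced for illustration.

```python
import numpy as np

# Sketch of the two-layer mode: an outer layer collects transition
# samples; once a batch is full, an inner layer runs model-free batch
# Q-learning with a function approximator.

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 8, 3        # toy problem size (assumption)
BATCH_SIZE, N_SAMPLES = 32, 3200
GAMMA, LR = 0.9, 0.05

W = np.zeros((N_ACTIONS, N_STATES))   # Q(s, a) = W[a] @ phi(s)

def phi(s):
    """One-hot feature vector for state s."""
    f = np.zeros(N_STATES)
    f[s] = 1.0
    return f

def q_values(s):
    return W @ phi(s)

def step(s, a):
    """Toy MDP: the action matching s % N_ACTIONS earns reward 1."""
    r = 1.0 if a == s % N_ACTIONS else 0.0
    return int(rng.integers(N_STATES)), r

batch = []
s = int(rng.integers(N_STATES))
for _ in range(N_SAMPLES):
    # Outer layer: collect samples (a uniform random behavior policy
    # here, for simplicity).
    a = int(rng.integers(N_ACTIONS))
    s_next, r = step(s, a)
    batch.append((s, a, r, s_next))
    s = s_next
    if len(batch) >= BATCH_SIZE:
        # Inner layer: batch Q-learning updates on collected samples.
        for si, ai, ri, sn in batch:
            target = ri + GAMMA * np.max(q_values(sn))
            td_error = target - q_values(si)[ai]
            W[ai] += LR * td_error * phi(si)
        batch.clear()

greedy = [int(np.argmax(q_values(s))) for s in range(N_STATES)]
print("greedy policy:", greedy)
```

Alternating collection and batch training in this way is what lets the inner loop reuse a neural-network (here, linear) approximator on each batch rather than updating a table one transition at a time; the paper's full method additionally coordinates multiple agents through Nash equilibria and a mean-field approximation, which this single-agent sketch omits.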

