
Record details

Solving flow-shop scheduling problem with a reinforcement learning algorithm that generalizes the value function with neural network (indexed in SCI-EXPANDED and EI)

Document type: Journal article

English title: Solving flow-shop scheduling problem with a reinforcement learning algorithm that generalizes the value function with neural network

Authors: Ren, Jianfeng[1,2]; Ye, Chunming[1]; Yang, Feng[3]

First author: Ren, Jianfeng (任剑锋)

Corresponding author: Ye, CM[1]

Affiliations: [1]Univ Shanghai Sci & Technol, Sch Business, Shanghai 200093, Peoples R China; [2]Henan Univ Econ & Law, Sch Comp & Informat Engn, Zhengzhou 450018, Peoples R China; [3]Henan Univ Chinese Med, Sch Management, Zhengzhou 450018, Peoples R China

First affiliation: Univ Shanghai Sci & Technol, Sch Business, Shanghai 200093, Peoples R China

Corresponding affiliation: [1] (corresponding author) Univ Shanghai Sci & Technol, Sch Business, Shanghai 200093, Peoples R China.

Year: 2021

Volume: 60

Issue: 3

Pages: 2787-2800

Journal: ALEXANDRIA ENGINEERING JOURNAL

Indexed by: EI (Accession No. 20210609891896); Scopus (Accession No. 2-s2.0-85100427336); SCI-EXPANDED (Accession No. WOS:000634505700006)

Funding: The study was supported by the Key Soft Science Project of the "Science and Technology Innovation Action Plan" of the Shanghai Science and Technology Commission, China (No. 20692104300); the National Natural Science Foundation of China (No. 71840003); and the Technology Development Project of the University of Shanghai for Science and Technology, China (No. 2018KJFZ043).

Language: English

Keywords: Flow-shop scheduling problem (FSP); Reinforcement learning (RL); Generalized value function; Neural network (NN)

Abstract: This paper solves the flow-shop scheduling problem (FSP) through reinforcement learning (RL), approximating the value function with a neural network (NN). Under the RL framework, the state, strategy, action, reward signal, and value function of the FSP were described in detail. Considering the intrinsic features of the FSP, various features of the problem were mapped into RL states, including the maximum, minimum, and mean makespan; the maximum, minimum, and mean number of remaining operations; and the machine loads. In addition, the optimal scheduling rules corresponding to specific states were mapped into RL actions. On this basis, the NN was trained to establish the mapping between states and actions and to select the highest-probability action for a given state. A reward function was constructed based on the idle time (IT) of the machines, and the value function was generalized by the NN. Finally, the algorithm was tested on 23 benchmark instances and more than 7 machine sets. Small relative errors were achieved on 20 of the 23 benchmark instances, and satisfactory results were obtained on all 7 machine sets. The results confirm the superiority and universality of the algorithm, and indicate that the FSP can be solved effectively by mapping it completely into the proposed RL framework. The research results provide a reference for solving similar problems with RL algorithms based on value function approximation. (C) 2021 THE AUTHORS. Published by Elsevier BV on behalf of Faculty of Engineering, Alexandria University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
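The state/action mapping described in the abstract can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: the exact feature set, dispatch rules (`SPT`, `LPT` here), network sizes, and the toy 3-job, 2-machine instance are all assumptions chosen only to make the structure concrete. It shows (a) shop status mapped to a state vector of max/min/mean remaining work, max/min/mean remaining operations, and machine load; (b) a small NN mapping the state to a probability over dispatch rules, with the highest-probability rule taken as the action; and (c) a reward signal derived from machine idle time.

```python
import numpy as np

def state_features(remaining, machine_free):
    """Map shop status to an RL state vector (illustrative feature set):
    max/min/mean remaining work, max/min/mean remaining operations,
    and a simple machine-load proxy."""
    work = remaining.sum(axis=1)        # remaining processing time per job
    ops = (remaining > 0).sum(axis=1)   # remaining operations per job
    load = machine_free.mean()          # mean time at which machines free up
    return np.array([work.max(), work.min(), work.mean(),
                     ops.max(), ops.min(), ops.mean(), load], dtype=float)

def softmax(z):
    z = z - z.max()                     # numerical stability
    e = np.exp(z)
    return e / e.sum()

class PolicyNet:
    """One-hidden-layer network mapping a state to probabilities over
    dispatch rules; the highest-probability rule is the chosen action.
    Weights are random here; the paper trains this mapping."""
    def __init__(self, n_in=7, n_hidden=8, n_rules=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.W2 = rng.normal(scale=0.1, size=(n_rules, n_hidden))

    def act(self, s):
        h = np.tanh(self.W1 @ s)
        return int(np.argmax(softmax(self.W2 @ h)))

RULES = ["SPT", "LPT"]  # example dispatch rules; the paper's set may differ

# Toy permutation flow shop: 3 jobs x 2 machines, processing times p[job][machine].
p = np.array([[3.0, 2.0], [2.0, 4.0], [4.0, 1.0]])
net = PolicyNet()
s = state_features(remaining=p.copy(), machine_free=np.zeros(2))
rule = RULES[net.act(s)]

def makespan_and_idle(p, order):
    """Evaluate a job permutation: standard flow-shop recurrence
    C[j][m] = max(C[j-1][m], C[j][m-1]) + p[j][m], accumulating idle time."""
    n_machines = p.shape[1]
    finish = np.zeros(n_machines)       # last finish time on each machine
    idle = 0.0
    for j in order:
        for m in range(n_machines):
            start = max(finish[m], finish[m - 1] if m > 0 else 0.0)
            idle += start - finish[m]   # gap on machine m before this job
            finish[m] = start + p[j, m]
    return finish[-1], idle

cmax, idle = makespan_and_idle(p, order=[0, 1, 2])
reward = -idle                          # idle-time-based reward signal
```

For the toy instance above, the permutation [0, 1, 2] gives a makespan of 10 with 3 units of idle time on the second machine, so the reward is -3. In a full training loop, the reward would be fed back to update the network weights so that states are steered toward rules that keep machines busy.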

