Multi-agent Cooperative Reinforcement Learning Algorithm Based on Prioritized Value Network

MIAO Guoying, SUN Yingbo, WANG Huiqin

Control Engineering of China, 2025, Vol. 32, Issue (4): 691-698.

Abstract

To improve the decision-making ability of multi-agent systems, a multi-agent reinforcement learning algorithm based on a prioritized value network is proposed. It targets two problems: the drawbacks of experience replay in multi-agent reinforcement learning, and the tendency of agents to emphasize action values while ignoring state values when making decisions. First, the algorithm introduces a prioritized experience replay mechanism that reuses experience according to importance weights, which addresses the problems of reusing experience through random sampling. Second, the algorithm introduces a value advantage structure into each agent's value network, which compares state-value and action-advantage information so that the agent learns advantageous actions faster. Experimental results in multiple cooperative scenarios show that the algorithm improves the learning and cooperation quality of the multi-agent system, enabling the agents to make decisions faster and better and to complete the given tasks.
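The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the widely used single-agent formulations (proportional prioritization with importance-sampling weights, and a mean-subtracted combination of state value and action advantages). The function names and the exponents `alpha` and `beta` are hypothetical choices for illustration only.

```python
import random


def sample_prioritized(priorities, batch_size, alpha=0.6, beta=0.4, rng=random):
    """Sample buffer indices with probability proportional to priority**alpha.

    Assumes every priority is positive. Returns (indices, weights), where the
    weights are importance-sampling corrections normalized by the largest
    weight in the sampled batch.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    n = len(priorities)
    # Draw indices with replacement according to the priority distribution.
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # Rarely sampled (low-probability) transitions receive larger weights.
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(raw)
    weights = [w / max_w for w in raw]
    return indices, weights


def dueling_q_values(state_value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Subtracting the mean advantage (an assumed aggregation rule) keeps the
    state value and the advantages separately identifiable.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + (a - mean_adv) for a in advantages]
```

In a training loop, the sampled weights would typically scale each transition's loss, and the priorities would be refreshed from the new temporal-difference errors; how the paper handles these steps across multiple agents is not stated in the abstract.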

Key words

Multi-agent / reinforcement learning / prioritized experience replay / value advantage network / state value

Cite this article

MIAO Guoying, SUN Yingbo, WANG Huiqin. Multi-agent Cooperative Reinforcement Learning Algorithm Based on Prioritized Value Network[J]. Control Engineering of China, 2025, 32(4): 691-698
