Keywords: multi-agent reinforcement learning, autonomous transportation, networked communication, deep learning, mean-field games
TL;DR: To address the scalability problem that MARL faces with very large agent populations in autonomous transport, we use networked Mean-Field Games, introducing deep learning and empirical mean-field estimation so that they can be applied to more realistic scenarios.
Abstract: Multi-agent reinforcement learning (MARL) algorithms typically struggle to scale to very large populations of agents, such as those relevant to autonomous transportation systems. The Mean-Field Game (MFG) framework can address this issue by approximating the solutions of games involving such large populations. Recent algorithms allow decentralised agents, possibly connected via a communication network, to learn equilibria in MFGs from a non-episodic run of the empirical system. While this is more reflective of real-world transportation scenarios than classical MFG works, these recent approaches have been presented only for tabular settings. This computationally limits the size of the feasible state/action spaces, and also means that the algorithms cannot generalise beyond policies depending only on the agent’s local state to so-called ‘population-dependent’ policies. To improve real-world applicability, we address these limitations by introducing function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method, which has previously been employed only in finite-horizon, episodic, centralised settings. While this permits us to include the mean field in the observation for players’ policies, it is unrealistic to assume that decentralised agents have access to this global information in real-time transportation settings. We therefore also provide new algorithms that allow agents to estimate the global empirical distribution locally and to improve this estimate via inter-agent communication. We show theoretically that exchanging policy information helps networked agents outperform both independent and even centralised agents in function-approximation settings. Our experiments confirm this empirically, with an even greater margin than in tabular settings, and show that the communication network allows decentralised agents to estimate the mean field for population-dependent policies.
Submission Number: 11