Centralized Policy Learning for Consensus Control of Connected and Automated Vehicles

Sidra Ghayour Bhatti; Qadeer Ahmed; Pie Yu Chang; Nur Uddin Javed

Published: 25 Feb 2025, Last Modified: 25 Feb 2025. Venue: MARW at AAAI 2025. License: CC BY 4.0

Keywords: Connected and automated vehicles, Reinforcement learning, Consensus control, deep deterministic policy gradient (DDPG)

TL;DR: This work proposes a DDPG-based consensus control framework for connected and automated vehicles (CAVs) using centralized training and execution to address non-stationarity and computational challenges in real-time multiagent coordination.

Abstract: Connected and automated vehicles (CAVs) are expected to play a key role in the intelligent transportation systems of the near future. They offer a promising answer to several challenges, including rising highway accidents, high energy consumption, and growing traffic congestion. Advances in control theory and reinforcement learning (RL) have given rise to consensus control techniques for coordinating multiple CAVs. Multiagent RL (MARL) algorithms are widely used in the literature for the consensus problem of CAVs under different driving conditions; however, they suffer from several issues, including non-stationarity and computational complexity, that hinder their use in real-time applications. To resolve these issues, an approach similar to centralized training and centralized execution (CTCE), built on single-agent deep deterministic policy gradient (DDPG), is proposed for consensus control of multiple CAVs in a leader-follower pattern. A central agent generates the control policies for all CAVs, which mitigates non-stationarity while ensuring consensus. Computational complexity is reduced by using a shared critic network for all CAVs, which supports efficient and coordinated policy optimization. Reward shaping for the consensus problem combines continuous and discrete rewards while ensuring collision avoidance among the CAVs. The effectiveness of the proposed DDPG-based consensus control is demonstrated by simulating several traffic scenarios, including straight-line path following and merging, in which the CAVs reach consensus. The proposed approach offers a scalable and practical solution for coordinated control of modern autonomous vehicles.
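The abstract does not specify the reward function, so the following is only an illustrative sketch of how a continuous tracking term, a discrete consensus bonus, and a discrete collision penalty might be combined for a leader-follower platoon. The function name `consensus_reward`, the weights, the gap thresholds, and the one-dimensional lane model are all assumptions made for this example, not the authors' design.

```python
from typing import Sequence


def consensus_reward(
    positions: Sequence[float],
    velocities: Sequence[float],
    desired_gap: float = 10.0,   # assumed target spacing (m)
    min_gap: float = 2.0,        # assumed collision threshold (m)
    tol: float = 0.5,            # assumed tolerance for "in consensus"
    w_pos: float = 1.0,          # assumed weight on spacing error
    w_vel: float = 0.5,          # assumed weight on velocity error
    bonus: float = 10.0,         # assumed discrete consensus bonus
    collision_penalty: float = 100.0,  # assumed discrete penalty
) -> float:
    """Hypothetical shaped reward for a platoon on a single lane.

    Index 0 is the leader; vehicles are ordered front to back.
    """
    reward = 0.0
    in_consensus = True
    for i in range(1, len(positions)):
        gap = positions[i - 1] - positions[i]
        if gap < min_gap:
            # Discrete term: a collision (or near-collision) dominates the reward.
            return -collision_penalty
        spacing_err = abs(gap - desired_gap)
        vel_err = abs(velocities[i] - velocities[0])
        # Continuous term: penalize deviation from target spacing and leader speed.
        reward -= w_pos * spacing_err + w_vel * vel_err
        if spacing_err > tol or vel_err > tol:
            in_consensus = False
    if in_consensus:
        # Discrete term: bonus once every follower is within tolerance.
        reward += bonus
    return reward
```

In a centralized setup of the kind the abstract describes, a single scalar like this would be computed from the joint state of all vehicles and fed to the one DDPG agent, rather than computing a separate reward per vehicle.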

Submission Number: 7
