Reinforcement Learning Algorithm on Traffic Light Control

Keywords: Reinforcement Learning, Genetic Algorithm, Traffic light control, Q-Learning

Authors

  • Ooi Wei Hong Asia Pacific University of Technology & Innovation (APU) Technology Park Malaysia 57000 Kuala Lumpur, Malaysia https://orcid.org/0009-0004-9608-5930
  • Lee Boon Jie Asia Pacific University of Technology & Innovation (APU) Technology Park Malaysia 57000 Kuala Lumpur, Malaysia
  • Lee Qi Wen Asia Pacific University of Technology & Innovation (APU) Technology Park Malaysia 57000 Kuala Lumpur, Malaysia
  • Lee Jun Wei Asia Pacific University of Technology & Innovation (APU) Technology Park Malaysia 57000 Kuala Lumpur, Malaysia
  • Dr. Adeline Sneha J
    adeline.john@apu.edu.my
    Asia Pacific University of Technology & Innovation (APU) Technology Park Malaysia 57000 Kuala Lumpur, Malaysia
  • Dr. Kamalanathan Shanmugam Asia Pacific University of Technology & Innovation (APU) Technology Park Malaysia 57000 Kuala Lumpur, Malaysia
  • Juhairi Aris Muhamad Shuhili Asia Pacific University of Technology & Innovation (APU) Technology Park Malaysia 57000 Kuala Lumpur, Malaysia
Vol. 8 No. 2 (2024)
Original Research
January 12, 2026

Abstract

The efficiency and efficacy of genetic algorithms (GA) and reinforcement learning (RL) in traffic signal control are examined in this study. The study investigates the benefits and drawbacks of RL and GA under congested traffic conditions, which increase fuel consumption and travel time. While GA explores a larger solution space when optimizing traffic light control schemes, RL demonstrates adaptability through self-learning across diverse environments. The work summarizes the body of research on reinforcement learning for traffic signal management and shows how it can reduce average travel time, delays, and stops. A detailed comparison with other approaches, such as particle swarm optimization and fuzzy logic, is included. Hybridization with Deep Deterministic Policy Gradient (DDPG) and modifications to GA are proposed as future directions for overcoming remaining obstacles and improving overall performance. The materials and methods section describes the hardware and software requirements as well as the procedure used to apply RL to traffic light control; policy gradient algorithms, exploration/exploitation strategies, and the algorithm's parameters are addressed. The results show the effects of varying these parameters on average wait times and collisions. The study finds that RL substantially improves traffic flow, cutting typical wait times from 300 seconds to 20 seconds, particularly when parameters are tuned. Future work can investigate combining RL with other algorithms for optimal traffic management in real-world settings.
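
Since the abstract highlights Q-Learning and exploration/exploitation strategies, the sketch below illustrates the general shape of such an approach: tabular Q-learning with an epsilon-greedy policy on a toy single-intersection environment. The environment, the bucketed queue-length state, the negative-queue reward, and all hyperparameter values (ALPHA, GAMMA, EPSILON) are illustrative assumptions for exposition, not the implementation described in the paper.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: one intersection with two phases
# (0 = north-south green, 1 = east-west green). State = bucketed
# queue lengths per axis; reward = negative total queue length.
class ToyIntersection:
    def __init__(self):
        self.queues = [0, 0]  # [north-south, east-west]

    def reset(self):
        self.queues = [random.randint(0, 9), random.randint(0, 9)]
        return self._state()

    def _state(self):
        # Bucket queue lengths into {0, 1, 2} to keep the Q-table small.
        return tuple(min(q // 4, 2) for q in self.queues)

    def step(self, phase):
        # The green axis discharges up to 3 cars; both axes gain arrivals.
        self.queues[phase] = max(0, self.queues[phase] - 3)
        for i in range(2):
            self.queues[i] += random.randint(0, 2)
        reward = -sum(self.queues)  # penalize total waiting vehicles
        return self._state(), reward

# Tabular Q-learning with epsilon-greedy exploration (illustrative values).
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
q_table = defaultdict(lambda: [0.0, 0.0])  # state -> value of each phase

env = ToyIntersection()
for episode in range(500):
    state = env.reset()
    for t in range(100):
        # Exploration/exploitation trade-off: random phase with prob. EPSILON.
        if random.random() < EPSILON:
            action = random.randrange(2)
        else:
            action = max(range(2), key=lambda a: q_table[state][a])
        next_state, reward = env.step(action)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(q_table[next_state])
        q_table[state][action] += ALPHA * (
            reward + GAMMA * best_next - q_table[state][action]
        )
        state = next_state
```

After training, the greedy policy (the argmax over each Q-table row) tends to give green time to the more congested axis; applying the same idea to a real intersection would require a state and action design grounded in actual traffic data, as the study's simulation does.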