Training Flappy Bird with Deep Q Network and SARSA

Deep Q-Network, Epsilon-Greedy Exploration, Adam Optimizer, SGD Optimizer, Flappy Bird

Authors

  • Choo Huan Long Asia Pacific University of Technology and Innovation (APU) Kuala Lumpur, Malaysia
  • Liew Wen Heng Asia Pacific University of Technology and Innovation (APU) Kuala Lumpur, Malaysia
  • Wong Ying Cheng Asia Pacific University of Technology and Innovation (APU) Kuala Lumpur, Malaysia
  • Yong Zhen Xing Asia Pacific University of Technology and Innovation (APU) Kuala Lumpur, Malaysia
  • Zailan Arabee Abdul Salam
    zailan@apu.edu.my
    Asia Pacific University of Technology and Innovation (APU) Kuala Lumpur, Malaysia
Vol. 8 No. 1 (2024)
Original Research
January 10, 2026

A Deep Q-Network (DQN) is implemented for the Flappy Bird game, and the purpose of the project is to tweak and adjust the parameters to meet the desired outcome, which is for the agent to pass the pipes. The DQN is employed to maximize cumulative reward while making decisions in real-time gameplay. In this project, the Flappy Bird game environment is set up along with the state and action spaces, the Q-network, an experience replay buffer, and a training loop. Using an ε-greedy policy, the agent learns to make good decisions by balancing exploration and exploitation. DeepMind's experience replay is employed to enhance the stability of learning. The Adam and SGD optimizers are used and the agent's training loss is monitored, while the learning rate α and exploration rate ε are varied to compare the effect of different parameter settings.
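
As a rough sketch of the components described above (the Q-network, ε-greedy action selection, the experience replay buffer, and a training step), the following PyTorch snippet illustrates one way such an agent could be put together. The names, network sizes, and hyperparameters here are illustrative assumptions, not the implementation used in the paper, and a Gym-style Flappy Bird environment is assumed.

```python
# Minimal DQN sketch (illustrative names, assuming a Gym-style Flappy Bird environment).
import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected Q-network mapping a state vector to one Q-value per action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def select_action(q_net: QNetwork, state: torch.Tensor, epsilon: float, n_actions: int) -> int:
    """ε-greedy policy: explore with probability ε, otherwise take the greedy action."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return q_net(state.unsqueeze(0)).argmax(dim=1).item()

def train_step(q_net, target_net, optimizer, replay, batch_size=32, gamma=0.99):
    """One gradient step on a minibatch sampled from the experience replay buffer.
    Transitions are stored as (state, action, reward, next_state, done) with tensor states."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    states = torch.stack([b[0] for b in batch])
    actions = torch.tensor([b[1] for b in batch])
    rewards = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    next_states = torch.stack([b[3] for b in batch])
    dones = torch.tensor([b[4] for b in batch], dtype=torch.float32)

    # Q(s, a) for the actions actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bootstrapped target from a (periodically synced) target network.
    with torch.no_grad():
        targets = rewards + gamma * target_net(next_states).max(dim=1).values * (1 - dones)
    loss = nn.functional.smooth_l1_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Replay buffer, networks, and optimizer (sizes are example values only).
replay_buffer = deque(maxlen=50_000)
q_net = QNetwork(state_dim=8, n_actions=2)        # e.g. 2 actions: flap / do nothing
target_net = QNetwork(state_dim=8, n_actions=2)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)  # or torch.optim.SGD(...)
```

Under this sketch, the optimizer comparison described in the abstract amounts to swapping `torch.optim.Adam` for `torch.optim.SGD` while varying the learning rate α and the exploration rate ε, and tracking the resulting training loss and game score.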