A Machine Learning Approach for Face Mask Detection System with AdamW Optimizer

Convolutional neural network (CNN), deep learning, face mask detection model, AdamW, weight decay, normalized weight decay

Authors

  • Leong Kah Meng, School of Computing, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia
  • Ho Hooi Yi, School of Computing, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia
  • Ng Bo Wei, School of Computing, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia
  • Lim Jia Xin, School of Computing, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia
  • Zailan Arabee Abdul Salam
    zailan@mail.apu.edu.my
    School of Computing, Asia Pacific University of Technology and Innovation, Kuala Lumpur, Malaysia
Vol. 7 No. 1 (2023)
Original Research
January 15, 2026


As the Adam optimizer’s learning rate decay hyperparameter has recently been deprecated, this journal article not only proposes an alternative optimizer, AdamW, but also compares its performance against the Adam optimizer on a face mask detection model. The study experiments with different weight decay values and finds that a weight decay of 0.00009 with the AdamW optimizer consistently achieves 98% accuracy. It also discusses how AdamW differs from Adam with L2 regularization: in AdamW, the weight decay is decoupled from the gradient-based update, which affects the optimizer’s performance. Overall, the study offers insights for readers who are new to AdamW and are looking for a starting point when optimizing deep learning models.
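
To make the decoupling concrete, the following is a minimal sketch, not the paper's implementation, contrasting a single parameter update of Adam with L2 regularization against AdamW. It uses the 0.00009 weight decay reported above; the function names, default hyperparameters, and NumPy formulation are illustrative assumptions.

```python
import numpy as np

def adam_l2_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=9e-5):
    """Adam with L2 regularization: the decay term is folded into the gradient,
    so it also flows through the adaptive moment estimates (m, v)."""
    g = grad + weight_decay * w                 # L2 term added to the gradient
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=9e-5):
    """AdamW: the weight decay is decoupled from the gradient-based update and
    applied directly to the weights, scaled only by the learning rate."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```

In a framework setting, recent TensorFlow/Keras releases expose a built-in AdamW optimizer that accepts a weight_decay argument (for example, weight_decay=9e-5), though the exact API and version used for the face mask detection model are not specified in this abstract.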