Customer churn prediction in telecommunication industry using machine learning models

Customer churn; big data; random forest; support vector machine; gradient boosting; K-nearest neighbor

Authors

  • Laiba Nadeem
    School of Computing, Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia
  • Chandra Reka Ramachandran
    Chandra.reka@staffmail.apu.edu.my
    School of Computing, Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia
  • Mandava Rajeswari
    School of Computing, Asia Pacific University of Technology & Innovation (APU), Kuala Lumpur, Malaysia
Vol. 5 No. 4 (2021)
Original Research
January 27, 2026


In recent times, mobile computing has become one of the most prominent means of communication. The growing number of telecommunication service providers has led to fierce competition, saturating the market to the point where companies struggle to retain their customers. This, in turn, has shifted companies' focus from building a large customer base to retaining existing customers. Customer churn refers to the tendency of customers to cancel their service or subscription and switch to a competitor. Since acquiring a new customer costs more than retaining an existing one, companies need strategies that predict customer churn effectively. Companies store vast amounts of customer-related data (big data) but fail to realize its potential for solving business problems. To date, few companies in the telecommunication sector have adopted machine learning techniques to predict customer churn accurately, so more work is required to fill this gap. This research analyzes the application of machine learning in the telecommunication industry, the machine learning models used for customer churn prediction in the sector, and the challenges of churn prediction. The findings show that the most common machine learning models adopted in the telecommunication sector are Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN). Researchers mostly overlook the class imbalance problem. The issue of high-dimensional data can be addressed by incorporating Principal Component Analysis (PCA) into machine learning models, and adopting data transformation techniques improves the overall performance of the models. Hybrid models and models incorporating genetic algorithms outperform generic machine learning models.
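The class imbalance problem noted in the abstract can be illustrated with a minimal sketch. It assumes synthetic labels with 10% churners (a ratio chosen only for illustration, not taken from the study), and all helper names are hypothetical. A baseline that always predicts "no churn" reaches 90% accuracy while catching no churners. Two common remedies are sketched in plain Python: class weights computed with the n_samples / (n_classes * class_count) heuristic that scikit-learn uses for class_weight="balanced", and random oversampling of the minority class. This is not presented as the authors' method.

```python
import random
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Share of actual positives (churners) that the model flags as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count),
    so rarer classes (e.g. churners) receive larger weights."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

if __name__ == "__main__":
    # Synthetic example (hypothetical): 90 non-churners (0), 10 churners (1).
    y = [0] * 90 + [1] * 10
    X = [[float(i)] for i in range(100)]  # placeholder single feature per customer
    baseline = [0] * len(y)               # always predict "no churn"
    print(accuracy(y, baseline))  # 0.9
    print(recall(y, baseline))    # 0.0
    print(balanced_class_weights(y))
    X_bal, y_bal = random_oversample(X, y)
    print(Counter(y_bal))
```

Class weights leave the data unchanged and instead make errors on churners cost more during training, whereas oversampling changes the training set itself. Oversampling should be applied only to the training split so that duplicated rows do not leak into the evaluation data.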