Loan Default Prediction using Machine Learning: A Review on the techniques

Machine Learning Model; Credit Scoring; Data Pre-processing; Business-driven analysis

Vol. 8 No. 2 (2024)
Original Research
January 11, 2026

This paper discusses the demand for machine learning models in the banking industry and their use in predicting loan default. Such predictions serve more than credit scoring; they also extend to capital reserves, risk management, loss forecasting, and marketing campaigns. Machine learning techniques also help manage big data, enabling more sophisticated scoring models than traditional econometric approaches. Model performance further depends on the data pre-processing steps and the business and credit considerations applied during development, and the model must be governed by a regulated framework. Missing value imputation, outlier treatment, and variable transformation are common practices in preparing the dataset; combining these treatments with business and credit considerations and an assessment of variable predictive power is rarely seen and worth further exploration. Logistic Regression has been a popular choice in credit scoring given its ability to generate probabilities and its high interpretability to end-users. Supervised learning methods such as tree-based models, Support Vector Machines, and Artificial Neural Networks have gained popularity in the recent decade, and ensemble methods should not be neglected either. The model's end-to-end lifecycle should be driven not merely by high accuracy but also by other aspects such as justifiability, transparency, and ethical standards.
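
As a rough illustration of the pre-processing steps and the Logistic Regression scoring named above, the following minimal sketch chains median imputation, outlier capping, a log transformation, and a single-variable logistic model fitted by gradient descent. It is not the method of this paper: the variable (a debt-to-income ratio), the toy values, the cap of 2.0 standing in for a business rule, and all function names are assumptions made purely for illustration.

```python
import math
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def cap_outliers(values, lower, upper):
    """Clip every value into [lower, upper] (a simple form of winsorization)."""
    return [min(max(v, lower), upper) for v in values]


def log_transform(values):
    """Apply log(1 + x) to compress a right-skewed, non-negative variable."""
    return [math.log1p(v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, lr=0.5, epochs=2000):
    """Fit p(default) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        errors = [sigmoid(w * xi + b) - yi for xi, yi in zip(x, y)]
        w -= lr * sum(e * xi for e, xi in zip(errors, x)) / n
        b -= lr * sum(errors) / n
    return w, b


# Invented toy data: debt-to-income ratio (one missing, one extreme) and default flag.
raw_ratio = [0.1, 0.2, None, 0.4, 0.9, 1.1, 25.0, 0.3]
defaulted = [0, 0, 0, 0, 1, 1, 1, 0]

imputed = impute_median(raw_ratio)
capped = cap_outliers(imputed, lower=0.0, upper=2.0)  # cap is an assumed business rule
feature = log_transform(capped)

w, b = fit_logistic(feature, defaulted)


def predict_default_probability(ratio):
    """Score a new applicant using the same cap and transformation as in training."""
    x = math.log1p(min(max(ratio, 0.0), 2.0))
    return sigmoid(w * x + b)


print(predict_default_probability(0.1), predict_default_probability(1.0))
```

The fitted coefficient `w` is what gives Logistic Regression its interpretability: its sign and size show how the transformed variable moves the predicted probability of default. A real scoring model would use many variables, a held-out validation sample, and an established library rather than this hand-written fit.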