Loan Repayment Default Prediction Using Supervised Machine Learning Techniques on Financial Data
DOI: https://doi.org/10.51173/ijds.v2i2.22
Keywords: Machine Learning, Loan Prediction, Weka, Financial Risk Assessment, Supervised Learning
Abstract
As technology makes it easier to start and expand businesses, more people are applying for loans for personal or business use. Banks, however, have limited assets, which limits the number of loans they can grant, and identifying suitable borrowers can be time-consuming. Banks seek to lend to individuals who can repay on time, so that the bank obtains maximum profit. This work aims to address the loan default problem at minimum cost to banks. It consists of five main stages: pre-processing, feature extraction, machine learning techniques, evaluation models, and performance analysis to select the best machine learning model. Two datasets with different features are used: the first has five features and the second has eighteen. Each dataset is split using several training percentages (40%, 50%, 60%, and 70%), with the remainder used for testing; all experiments are run in the Weka application only. K-nearest neighbours (KNN) is applied with different cross-validation settings (15, 10, and 5 folds) and different numbers of nearest neighbours (1, 5, 10, and 15). On the first dataset, the highest KNN accuracy is 97.47%, obtained with 10 nearest neighbours under both 15-fold and 10-fold cross-validation. On the second dataset, the highest KNN accuracy is 88.21%, obtained with 15 nearest neighbours under all three cross-validation settings (15, 10, and 5 folds). Logistic regression is then applied for comparison. Its highest correct-classification rate on the first dataset is 96.93%, obtained with a 70% training split. On the second dataset, its highest accuracy is 88.32%, obtained with 40% of the data used for training and the rest for testing.
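The KNN evaluation grid described in the abstract (5, 10, and 15 cross-validation folds crossed with 1, 5, 10, and 15 nearest neighbours) can be illustrated with a minimal sketch. The study itself ran in Weka; the pure-Python code below is only a stand-in, and everything in it is an assumption made for illustration: the synthetic two-feature data, the 0/1 label encoding, and the helper names `make_toy_data`, `knn_predict`, and `cross_val_accuracy`. It will not reproduce the reported accuracies.

```python
import random
from collections import Counter


def make_toy_data(n=200, seed=0):
    """Synthetic two-feature stand-in for a loan dataset (not the paper's data)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2  # hypothetical encoding: 1 = default, 0 = repaid
        centre = 2.0 if label else -2.0
        rows.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    rng.shuffle(rows)
    return rows


def knn_predict(train, x, k):
    """Majority vote among the k training rows closest to x (squared Euclidean).

    On a tied vote, the label seen first among the sorted neighbours wins.
    """
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(rows, k, folds):
    """Fraction of rows classified correctly when each row is tested exactly once."""
    correct = 0
    for f in range(folds):
        test = rows[f::folds]  # every folds-th row, starting at offset f
        train = [r for i, r in enumerate(rows) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(rows)


data = make_toy_data()
# Same grid shape as in the abstract: folds x number of neighbours.
results = {
    (folds, k): cross_val_accuracy(data, k, folds)
    for folds in (5, 10, 15)
    for k in (1, 5, 10, 15)
}
best = max(results, key=results.get)
print("best (folds, k):", best, "accuracy:", round(results[best], 4))
```

Selecting the configuration with the highest cross-validated accuracy mirrors how the best KNN setting is reported for each dataset; on real loan data the features would normally be scaled first, since KNN is distance-based.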