Abstract
Big Data has recently become an increasingly important source of information for supporting traditional credit scoring. Personal credit evaluation based on machine learning approaches focuses on the application data of clients in open banking and new banking platforms, which raises challenges concerning Big Data quality and model risk. This paper presents PySpark code for the computationally efficient use of statistical learning and machine learning algorithms in the application scenario of personal credit evaluation. It compares the performance of logistic regression, decision tree, random forest, neural network, and support vector machine models. The findings show that logistic regression achieves a more reasonable coefficient of determination and a lower false negative rate than the other models. It is also computationally less expensive and easier to interpret. Finally, the paper highlights the steps, perils, and benefits of using Big Data and machine learning algorithms in credit scoring.
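The false negative rate used to compare the models can be illustrated with a short sketch. It assumes that the positive class is a default, so a false negative is a defaulting applicant who is scored as creditworthy. The function name `false_negative_rate`, the model names `model_a` and `model_b`, and all labels and predictions below are hypothetical and are not taken from the chapter's PySpark code or data.

```python
from typing import Sequence


def false_negative_rate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Share of actual positives (assumed here: defaulters) that the model labels negative.

    FNR = FN / (FN + TP). Returns 0.0 when there are no actual positives.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    actual_positives = tp + fn
    return fn / actual_positives if actual_positives else 0.0


# Hypothetical toy labels: 1 = default, 0 = repaid (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = {
    "model_a": [1, 1, 1, 0, 0, 0, 1, 0],  # misses one of four defaulters
    "model_b": [1, 0, 0, 0, 0, 0, 0, 0],  # misses three of four defaulters
}

for name, y_pred in predictions.items():
    print(f"{name}: FNR = {false_negative_rate(y_true, y_pred):.2f}")
# model_a: FNR = 0.25
# model_b: FNR = 0.75
```

In Spark itself, a comparable figure could be obtained as one minus the recall of the default class, for example through `pyspark.ml.evaluation.MulticlassClassificationEvaluator` with `metricName="truePositiveRateByLabel"` and `metricLabel=1.0` (Spark 3.0 and later). The record above does not say which evaluation route the chapter uses.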
Original language | English |
---|---|
Title of host publication | Statistical Modeling and Simulation for Experimental Design and Machine Learning Applications |
Subtitle of host publication | Selected Contributions from SimStat 2019 and Invited Papers |
Editors | Jürgen Pilz, Viatcheslav B. Melas, Arne Bathke |
Place of Publication | Cham, Switzerland |
Publisher | Springer, Cham |
Chapter | 14 |
Pages | 245-265 |
Number of pages | 21 |
ISBN (Electronic) | 978-3-031-40055-1 |
ISBN (Print) | 978-3-031-40054-4, 978-3-031-40057-5 |
DOIs | |
Publication status | Published - 19 Oct 2023 |
Event | 10th International Workshop on Simulation and Statistics, Faculty of Natural Sciences, Hellbrunner Strasse 34, 5020 Salzburg, Austria. Duration: 2 Sept 2019 → 6 Sept 2019. https://datascience.plus.ac.at/SimStatSalzburg2019/ |
Publication series
Name | Contributions to Statistics |
---|---|
Publisher | Springer Cham |
ISSN (Print) | 1431-1968 |
Conference
Conference | 10th International Workshop on Simulation and Statistics |
---|---|
Abbreviated title | SimStat 2019 |
Country/Territory | Austria |
City | Salzburg |
Period | 2/09/19 → 6/09/19 |
Internet address | https://datascience.plus.ac.at/SimStatSalzburg2019/ |
Keywords
- Credit score
- Big Data
- Machine learning
- Risk Management
- Finance
Fingerprint
Research topics of 'Big Data for Credit Risk Analysis: Efficient Machine Learning Models Using PySpark'.
Prizes
- First Prize for the presentation of "A non-parametric-based computationally efficient approach for credit scoring using non-traditional data" at the 8th International Conference on Risk Analysis and Design of Experiments
Ashofteh, Afshin (Recipient), 26 Apr 2019
Prize: Prize (including medals and awards)