Description

This capsule accompanies a novel machine learning method for Big Data, which is described in a manuscript in Expert Systems with Applications. The method is suited to default prediction for high-risk branches or customers and to online banking. The study uses the Kruskal-Wallis non-parametric statistic to build a conservative credit-scoring model and to examine how modeling performance affects the benefit of the credit provider. This is the first study to develop an online non-parametric credit scoring system that automatically reselects effective features for continued credit evaluation and weights them by their level of contribution, with good diagnostic ability. We have implemented this methodology with Ridge, Lasso, and Elastic-net regression, a Random forest classifier, and a Linear support vector machine.
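
The idea of scoring features with the Kruskal-Wallis statistic and weighting them by contribution can be illustrated with a minimal sketch. This is not the capsule's code: the function names (average_ranks, kruskal_wallis_h, rank_features), the toy data, and the choice to turn H scores into weights by normalizing them to sum to 1 are all assumptions made for illustration. The H statistic is computed without tie correction.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(values: Sequence[float], labels: Sequence[int]) -> float:
    """Kruskal-Wallis H of `values` grouped by `labels` (no tie correction)."""
    n = len(values)
    ranks = average_ranks(values)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, group in zip(ranks, labels):
        rank_sums[group] += rank
        counts[group] += 1
    between = sum(rank_sums[g] ** 2 / counts[g] for g in counts)
    return 12.0 / (n * (n + 1)) * between - 3.0 * (n + 1)


def rank_features(
    features: Dict[str, Sequence[float]], labels: Sequence[int]
) -> List[Tuple[str, float, float]]:
    """Return (name, H, weight) per feature, sorted by H, largest first.

    Assumption: weights are H scores normalized to sum to 1.
    """
    scores = {name: kruskal_wallis_h(col, labels) for name, col in features.items()}
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, h, h / total if total > 0 else 0.0) for name, h in ranked]


if __name__ == "__main__":
    # Made-up toy data: 1 = defaulted, 0 = repaid. Feature names are hypothetical.
    default_flag = [0, 0, 0, 1, 1, 1]
    toy_features = {
        "feature_a": [1, 2, 3, 4, 5, 6],
        "feature_b": [1, 4, 2, 5, 3, 6],
    }
    for name, h, weight in rank_features(toy_features, default_flag):
        print(f"{name}: H={h:.3f}, weight={weight:.3f}")
```

For real data, scipy.stats.kruskal offers a tested implementation that also applies tie correction and returns a p-value, which could serve as the threshold for reselecting features.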

Data source
loan.csv - Lending Club data covering all funded loans from 2012 to 2017. Each loan record includes the information provided by the applicant as well as the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information.

Technologies
The project was created with:

Python version: 3.8.1
PySpark version: 3.0.2
Date made available: 28 Feb 2021
Publisher: Code Ocean
Date of data production: 2012 - 2017
