2017, Vol. 29, Issue (9): 59-71.

• Economic and Financial Management •

Research on Auto Loan Default Prediction Based on Large Sample Data Model

Shu Yang, Yang Qiuyi

  1. School of Economics, Huazhong University of Science and Technology, Wuhan 430074
  • Received: 2015-09-09 Online: 2017-09-28 Published: 2017-10-09
  • Corresponding author: Yang Qiuyi, doctoral candidate, School of Economics, Huazhong University of Science and Technology.
  • About the author: Shu Yang, lecturer and master's supervisor, School of Economics, Huazhong University of Science and Technology; holds a doctorate.
  • Funding:

    Fundamental Research Funds for the Central Universities (2015AC007); Huazhong University of Science and Technology Graduate Student Innovation Training Program (2015650011).

Abstract:

Using 47,138 customer records from December 2014 provided by a well-known auto finance company in China, this paper first uses ROC curves to test the effectiveness of stepwise regression, then builds binary choice models and count models to predict the default status of loan customers. It then applies a genetic algorithm to perform one-to-one matching on the unbalanced sample and obtains the final prediction results. The results show that the existing default evaluation system is not sufficiently effective, and that customers' basic information, geographic location, loan information, car model, credit status, real estate, and shock events during the loan period all affect default status. The matched, balanced sample still yields high prediction accuracy. The Logistic model is the most suitable for predicting whether a customer will default, while the negative binomial model performs better in predicting the duration of default.

Key words: auto loan, default prediction, Stepwise Regression, ROC curves, Binary Choice Model, Count Model, Genetic Algorithm Matching
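
The abstract relies on ROC curves to judge how well a model separates defaulting from non-defaulting customers. As a minimal sketch of that idea only, and not the authors' procedure, the area under the ROC curve can be computed as the share of (defaulter, non-defaulter) pairs in which the defaulter receives the higher predicted score. The function name `roc_auc` and the toy scores and labels below are assumptions made for illustration; they do not come from the paper's data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: predicted default probabilities (hypothetical model output).
    labels: 1 for a defaulted customer, 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one defaulter and one non-defaulter")
    # Count pairs where the defaulter is scored above the non-defaulter;
    # a tie counts as half a win.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only (not from the paper).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
auc = roc_auc(toy_scores, toy_labels)
print(auc)
```

An AUC near 0.5 indicates a model that ranks customers no better than chance, while values near 1.0 indicate strong separation. The pairwise loop is quadratic in sample size, so for a dataset on the scale described in the abstract a rank-based implementation would be the more practical choice.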