Management Review, 2022, Vol. 34, Issue (11): 261-271.

• Accounting and Financial Management •

基于借款描述的违约判别研究

迟国泰, 董冰洁   

  1. 大连理工大学经济管理学院, 大连 116024
  • 收稿日期:2020-07-16 出版日期:2022-11-28 发布日期:2022-12-30
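  • About the authors: Chi Guotai is a professor and doctoral supervisor at the School of Economics and Management, Dalian University of Technology, and holds a Ph.D.; Dong Bingjie is a doctoral candidate at the School of Economics and Management, Dalian University of Technology.
  • Funding:
    Key Program of the National Natural Science Foundation of China (71731003); General Program of the National Natural Science Foundation of China (72071026; 72173096; 71971051; 71971034; 71873103); Young Scientists Program of the National Natural Science Foundation of China (71901055; 71903019); Regional Program of the National Natural Science Foundation of China (72161033); Major Program of the National Social Science Fund of China (18ZDA095).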

Research on Default Prediction Based on Loan Description

Chi Guotai, Dong Bingjie   

  1. School of Economics and Management, Dalian University of Technology, Dalian 116024
  • Received: 2020-07-16 Online: 2022-11-28 Published: 2022-12-30

摘要: 违约判别对金融机构贷款和商业信用决策具有重要意义。本文研究的问题是如何使用非结构化的借款描述数据构建违约判别模型,提高金融机构识别违约客户的能力。本文的创新与特色:一是使用pca-foword方法提取借款描述中信息,不仅能避免使用统计特定字符频数的方法提取借款描述信息不充分的弊端,而且避免了从借款描述中提取的信息与鉴别客户违约状态无关的弊端。二是同时使用借款描述数据和数字数据两类数据建立违约判别模型,避免了使用单类数据构建违约判别模型准确性不足的弊端。研究表明:在对比分析中,使用两类数据(借款描述数据和数字数据)建立的最优临界点逻辑回归判别模型的准确性最高。不同借贷公司识别借款人违约的能力存在差异;基本情况说明的描述与违约呈现正相关关系且显著;以生产经营为目的的描述与违约呈现正相关关系且显著;承诺还钱的描述与违约呈现负相关关系且显著;与借款人所供职的公司或所经营公司相关的描述与违约呈现负相关关系且显著。上述关系在控制了借款人经济特征后依然成立。

关键词: 借款描述, 文本分析, 临界点, 违约判别

Abstract: Default prediction is of great significance to financial institutions’ loan and business credit decisions. This paper studies how to use unstructured loan description data to build a default prediction model that improves financial institutions’ ability to identify defaulting customers. The study makes two contributions. First, it uses the pca-foword method to extract information from loan descriptions, which avoids both the insufficient information extraction of methods that count the frequency of specific characters and the extraction of information that is irrelevant to identifying a customer’s default status. Second, it builds the default prediction model from both loan description data and numerical data, avoiding the limited accuracy of models built on a single type of data. The comparative analysis shows that the optimal cut-off point logistic regression model built on both types of data (loan description data and numerical data) has the highest accuracy. The regression analysis shows that lending companies differ in their ability to identify borrower default; descriptions of the borrower’s basic situation are significantly positively correlated with default; descriptions of borrowing for production and business operation are significantly positively correlated with default; descriptions promising repayment are significantly negatively correlated with default; and descriptions relating to the company the borrower works for or operates are significantly negatively correlated with default. These relationships still hold after controlling for the borrower’s economic characteristics.

Key words: loan description, text analysis, cut-off point, default prediction
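
The abstract refers to an optimal cut-off point for the logistic regression model but does not state the criterion used to choose it. As a minimal sketch only, the following assumes the cut-off is the predicted-probability threshold that maximizes Youden's J (true positive rate minus false positive rate); this criterion, the function name `best_cutoff`, and the sample probabilities and labels are all illustrative assumptions, not the authors' method or data.

```python
from typing import Sequence, Tuple


def best_cutoff(probs: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cutoff, youden_j) maximizing TPR - FPR; label 1 means default.

    Assumption: Youden's J is the selection criterion. The paper's actual
    criterion is not given in the abstract.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both default and non-default samples")

    best_c, best_j = 0.5, -1.0
    # Try every distinct predicted probability as a candidate cut-off.
    for c in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= c and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= c and y == 0)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


# Hypothetical predicted default probabilities from a fitted logistic model
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.80]
labels = [0, 0, 0, 1, 0, 1, 1]
cutoff, j = best_cutoff(probs, labels)
```

A borrower would then be classified as a predicted default when the model's probability is at or above `cutoff`, rather than at the conventional 0.5.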