2012, Vol. 24, Issue (2): 140-145.

• Marketing •

Research on Random Forests and Their Application in Customer Churn Prediction

Ying Weiyun (应维云)

  1. School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433
  • Received: 2012-06-19 Revised: 2012-06-19 Online: 2012-02-25 Published: 2012-06-20

Abstract: In global market competition, a major problem facing enterprises is how to use their existing resources to raise customer satisfaction and retain existing customers, and customer churn prediction is attracting growing attention. Real customer churn data are large, and the numbers of positive and negative samples are imbalanced. To address both characteristics, this paper proposes an improved balanced random forest algorithm and applies it to customer churn prediction at a commercial bank. Tests on the real data set show that the algorithm combines the strengths of sampling techniques and cost-sensitive learning, suits large and imbalanced data sets, and achieves higher accuracy than traditional prediction algorithms.

Keywords: churn prediction, imbalanced data, random forests
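The abstract combines two remedies for imbalanced churn data: class-balanced sampling per tree and cost-sensitive decisions. The paper's improved algorithm is not described here, so what follows is only a minimal sketch of the generic balanced random forest idea (each tree is grown on a bootstrap sample that draws equally many churners and non-churners), paired with a simple cost-weighted vote threshold as one possible cost-sensitive step. All function names and the parameters `cost_fn` and `cost_fp` are hypothetical. The label convention (1 = churn, assumed to be the minority class) is an assumption. The trees are small pure-Python CART-style trees, not any library's implementation.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_depth, n_features, rng):
    """Grow a small tree; leaves are labels, inner nodes are (f, t, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    # Random-forest step: consider only a random subset of features at this node
    features = rng.sample(range(len(X[0])), n_features)
    split = best_split(X, y, features)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, n_features, rng),
        build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, n_features, rng),
    )


def predict_tree(node, row):
    """Follow splits until reaching a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def balanced_bootstrap(minority, majority, rng):
    """Sampling step: draw len(minority) indices with replacement from each class."""
    k = len(minority)
    return [rng.choice(minority) for _ in range(k)] + [rng.choice(majority) for _ in range(k)]


def fit_balanced_forest(X, y, n_trees=25, max_depth=4, n_features=None, seed=0):
    """Assumption: label 1 = churn (minority class), label 0 = retained customer."""
    rng = random.Random(seed)
    if n_features is None:
        n_features = max(1, int(len(X[0]) ** 0.5))
    minority = [i for i, yi in enumerate(y) if yi == 1]
    majority = [i for i, yi in enumerate(y) if yi == 0]
    if not minority or not majority:
        raise ValueError("both classes must be present")
    trees = []
    for _ in range(n_trees):
        idx = balanced_bootstrap(minority, majority, rng)
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], max_depth, n_features, rng))
    return trees


def churn_score(trees, row):
    """Fraction of trees voting 'churn' (not a calibrated probability)."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)


def predict_churn(trees, row, cost_fn=5.0, cost_fp=1.0):
    """Cost-weighted vote: flag churn when p * cost_fn > (1 - p) * cost_fp.

    cost_fn (missing a churner) and cost_fp (a false alarm) are hypothetical costs.
    """
    p = churn_score(trees, row)
    return 1 if p * cost_fn > (1 - p) * cost_fp else 0
```

To use it, fit on a list of feature rows `X` with 0/1 labels `y`, then call `predict_churn(trees, row, cost_fn=5.0)`. Raising `cost_fn` relative to `cost_fp` flags more customers as likely churners, trading extra false alarms for fewer missed churners. Because the vote fraction from balanced trees is not a calibrated probability, the cost ratio acts as a tunable cutoff rather than an exact expected-cost rule.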