2018, Vol. 30, Issue (12): 122-130.

• Marketing •

Research of Classifier Threshold and Expected Risk of Mobile Phone Replacement Based on the Misclassification Cost

Wang Chaofa, Sun Jingchun

  1. School of Management, Xi'an Jiaotong University, Xi'an 710049
  • Received: 2016-09-22 Online: 2018-12-28 Published: 2018-12-21
  • Corresponding author: Wang Chaofa, doctoral candidate, School of Management, Xi'an Jiaotong University
  • About the authors: Sun Jingchun, PhD, professor and doctoral supervisor, School of Management, Xi'an Jiaotong University.
  • Supported by:

    General Program of the National Natural Science Foundation of China (71372164).

Abstract:

Traditional classification algorithms treat correct and incorrect predictions equally and ignore subjective human factors, so they cannot control the error rate well. Using consumption data on users of the Xi'an branch of a mobile communications company, this paper applies a Logistic model with misclassification costs to study the threshold and expected risk of predicting whether users will replace their mobile phones. We find that: the Logistic model with misclassification costs classifies well; different misclassification costs correspond to different optimal thresholds, while the prediction accuracy remains basically the same; classifying with the traditional threshold of 0.5 both reduces prediction accuracy and increases the expected risk; the larger the difference in classification costs between the positive and negative classes, the higher the expected risk the classifier faces; and the optimal classifier's value, the optimal threshold, and the expected risk are in dynamic balance and mutually constrain one another. These results provide a multi-dimensional analysis framework for data mining practitioners and a decision-making reference for manufacturers and vendors.

Key words: misclassification cost, algorithms, mobile users, threshold
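The relationship between misclassification costs, the decision threshold, and expected risk described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' procedure or data: the function names (`expected_risk`, `best_threshold`, `bayes_threshold`), the cost values, and the synthetic probabilities are all assumptions made for illustration. Expected risk is taken here to be the average misclassification cost per user, and the predicted probabilities stand in for the output of a fitted Logistic model.

```python
import random


def expected_risk(probs, labels, threshold, cost_fp, cost_fn):
    """Average misclassification cost per user at a given threshold.

    cost_fp: assumed cost of predicting "will replace" for a user who does not.
    cost_fn: assumed cost of predicting "will not replace" for a user who does.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 0:
            total += cost_fp
        elif predicted == 0 and y == 1:
            total += cost_fn
    return total / len(labels)


def best_threshold(probs, labels, cost_fp, cost_fn, grid_size=99):
    """Grid search for the threshold that minimises the empirical expected risk."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(
        grid,
        key=lambda t: expected_risk(probs, labels, t, cost_fp, cost_fn),
    )


def bayes_threshold(cost_fp, cost_fn):
    """Theoretical cost-minimising threshold when probabilities are well calibrated."""
    return cost_fp / (cost_fp + cost_fn)


# Synthetic stand-in for model output (hypothetical, not the paper's data):
# each label is drawn with its own probability, so scores are calibrated.
rng = random.Random(0)
probs = [rng.random() for _ in range(5000)]
labels = [1 if rng.random() < p else 0 for p in probs]

for cost_fn in (1.0, 3.0, 9.0):
    t_star = best_threshold(probs, labels, cost_fp=1.0, cost_fn=cost_fn)
    print(
        f"cost_fn={cost_fn}: empirical threshold={t_star:.2f}, "
        f"theoretical={bayes_threshold(1.0, cost_fn):.2f}, "
        f"risk at optimum={expected_risk(probs, labels, t_star, 1.0, cost_fn):.3f}, "
        f"risk at 0.5={expected_risk(probs, labels, 0.5, 1.0, cost_fn):.3f}"
    )
```

Because the grid includes 0.5, the selected threshold can never have a higher empirical risk than the conventional 0.5 cut-off on the same data. With equal costs the theoretical threshold is 0.5, and it moves away from 0.5 as the two costs diverge.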