Management Review (管理评论) ›› 2020, Vol. 32 ›› Issue (7): 236-245.

• Special Issue on Chinese Systems Management Science •

Semi-supervised Key Feature Selection of Customers Based on Hall for Workshop of Meta-synthetic Engineering

Xie Ling1, Chen Wenting2, Cao Hanwen3, Xiao Jin4

  1. School of Medical Information Engineering, Zunyi Medical University, Zunyi 563006;
  2. Office of Academic Research, Southwestern University of Finance and Economics, Chengdu 611130;
  3. Group Finance, Huawei Technologies Co., Ltd., Shenzhen 518000;
  4. Business School, Sichuan University, Chengdu 610064
  • Received: 2019-08-26 Online: 2020-07-28 Published: 2020-08-08
  • Corresponding author: Xiao Jin, Ph.D., Professor and doctoral supervisor, Business School, Sichuan University
  • About the authors: Xie Ling, Ph.D., Associate Professor, School of Medical Information Engineering, Zunyi Medical University; Chen Wenting, Master's degree, staff member, Office of Academic Research, Southwestern University of Finance and Economics; Cao Hanwen, Master's degree, engineer, Group Finance, Huawei Technologies Co., Ltd.
  • Funding:
    Major Project of the National Social Science Fund of China (18VZL006); Sichuan Province Distinguished Young Scholars Fund (2020JDJQ0021); Sichuan Province Tianfu Ten-Thousand Talents Program; Sichuan University Distinguished Young Scholars Fund (sksyl201709); Sichuan University Science and Technology Leading Talent Cultivation Project; Beijing Municipal Finance Project "Research on the New Economy Supporting Beijing's High-Quality Development" (PXM2020-178216-000001).

Abstract: Customer classification has long been one of the most important problems in enterprise customer relationship management (CRM), and selecting the key features of customers is central to it. In the era of big data, customer data are characterized by imbalanced class distributions, high dimensionality, and large numbers of unlabeled samples, which turns key feature selection into a complex systemic decision problem. To address it, this study proposes a semi-supervised key feature selection model of customers based on the hall for workshop of meta-synthetic engineering (SFS-HWME). The model invites five experts in related fields to identify the research difficulties and to find candidate solutions through qualitative analysis; these are then combined by meta-synthesis into an overall solution, which is developed further into a quantitative analysis model. The quantitative model uses semi-supervised learning (SSL). First, an AdaBoost ensemble model is trained on the initial labeled data set L and used to predict the classes of the samples in the unlabeled data set U. Second, the self-organization map (SOM) algorithm clusters the data set U, and samples in U are selectively labeled. Third, the selected samples are added to L together with their assigned class labels. Finally, a re-sampling technique balances the class distribution of the new training set L, a group method of data handling (GMDH) deep learning network is trained on it to select optimal feature subsets, and the experts are invited to choose the most reasonable of these subsets. Empirical analysis on four customer classification data sets shows that the proposed SFS-HWME model achieves better key feature selection performance than several existing models.

Key words: hall for workshop of meta-synthetic engineering, customer classification, feature selection, semi-supervised learning, GMDH, re-sampling
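
The semi-supervised labeling and re-sampling steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: a nearest-centroid classifier stands in for the AdaBoost ensemble, the SOM is a tiny one-dimensional map, the rule "keep a sample only if its predicted class matches the majority prediction in its SOM cluster" is a hypothetical selection criterion (the abstract does not state the one actually used), and random oversampling is just one possible re-sampling technique. The GMDH feature selection step and the expert review are omitted. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_centroids(X, y):
    """Stand-in classifier (NOT AdaBoost): one mean vector per class."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return {c: tuple(sum(col) / len(rows) for col in zip(*rows))
            for c, rows in groups.items()}

def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: dist2(centroids[c], x))

def train_som(X, rng, n_nodes=4, epochs=20, lr0=0.5):
    """Tiny 1-D self-organizing map; returns the node weight vectors."""
    nodes = [list(rng.choice(X)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)          # linearly decaying learning rate
        for x in rng.sample(X, len(X)):          # visit samples in random order
            bmu = min(range(n_nodes), key=lambda j: dist2(nodes[j], x))
            for j in (bmu - 1, bmu, bmu + 1):    # best-matching unit and its grid neighbors
                if 0 <= j < n_nodes:
                    h = 1.0 if j == bmu else 0.5
                    nodes[j] = [w + lr * h * (xi - w) for w, xi in zip(nodes[j], x)]
    return nodes

def selective_label(L_X, L_y, U_X, rng):
    """Predict labels for U, cluster U with the SOM, and keep only samples
    whose prediction agrees with their cluster's majority (assumed rule)."""
    centroids = fit_centroids(L_X, L_y)
    preds = [predict(centroids, x) for x in U_X]
    nodes = train_som(U_X, rng)
    cluster = [min(range(len(nodes)), key=lambda j: dist2(nodes[j], x)) for x in U_X]
    by_cluster = defaultdict(list)
    for c, p in zip(cluster, preds):
        by_cluster[c].append(p)
    majority = {c: Counter(ps).most_common(1)[0][0] for c, ps in by_cluster.items()}
    keep = [i for i, (c, p) in enumerate(zip(cluster, preds)) if p == majority[c]]
    return L_X + [U_X[i] for i in keep], L_y + [preds[i] for i in keep]

def oversample(X, y, rng):
    """Random oversampling: duplicate minority samples until classes are balanced."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for c, n in counts.items():
        pool = [xi for xi, yi in zip(X, y) if yi == c]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(c)
    return X_out, y_out

# Toy imbalanced data: class 0 near (0, 0), class 1 near (5, 5).
rng = random.Random(0)

def blob(cx, cy, n):
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]

L_X = blob(0, 0, 6) + blob(5, 5, 2)      # small labeled set L
L_y = [0] * 6 + [1] * 2
U_X = blob(0, 0, 30) + blob(5, 5, 10)    # larger unlabeled set U

new_X, new_y = selective_label(L_X, L_y, U_X, rng)
bal_X, bal_y = oversample(new_X, new_y, rng)
print("after labeling:", Counter(new_y))
print("after re-sampling:", Counter(bal_y))
```

In this sketch the enlarged, balanced set (`bal_X`, `bal_y`) is what a feature selection model would be trained on next; the cluster-agreement filter is meant only to show why clustering U can help screen out unreliable predicted labels before they enter L.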