2018, Vol. 30, Issue (8): 126-137.

• E-commerce and Information Management •

Research on Keyword Optimization of the Baidu Search Index and Tourist Volume Forecasting Based on a Clustering Method

Zhang Lingling (张玲玲)1,2, Zhang Xiao (张笑)1,3, Cui Yiwen (崔怡雯)1,2

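  1. School of Economics and Management, University of Chinese Academy of Sciences (中国科学院大学经济与管理学院), Beijing 100190;
    2. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences (中国科学院大数据挖掘与知识管理重点实验室), Beijing 100190;
    3. Sino-Danish Center, University of Chinese Academy of Sciences (中国科学院大学中丹中心), Beijing 100190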
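  • Received: 2016-01-13 Online: 2018-08-28 Published: 2018-08-31
  • Corresponding author: Cui Yiwen, PhD candidate, School of Economics and Management, University of Chinese Academy of Sciences
  • About the authors: Zhang Lingling, PhD, professor and doctoral supervisor, School of Economics and Management, University of Chinese Academy of Sciences; Zhang Xiao, master's student, School of Economics and Management, University of Chinese Academy of Sciences.
  • Funding:

    General Program of the National Natural Science Foundation of China (71471169); Key Support Project of the Major Research Plan of the National Natural Science Foundation of China (91546201).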

Forecasting Tourist Volume Based on Clustering Method with Screening Keywords of Search Engine Data

Zhang Lingling1,2, Zhang Xiao1,3, Cui Yiwen1,2   

  1. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190;
    2. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190;
    3. Sino-Danish Center for Education and Research, University of Chinese Academy of Sciences, Beijing 100190
  • Received: 2016-01-13 Online: 2018-08-28 Published: 2018-08-31

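摘要 (Abstract):

Tourism is one of the main sources of non-trade foreign exchange income for many countries, and forecasting its tourist volume is an important part of marketing and operations. However, because the data published by statistical bureaus are released with a lag, earlier forecasting methods struggle to capture the latest trends in the tourism market. This paper builds a model on web search data and historical tourist volume data and explores how well it forecasts tourist volume in the tourism market. Keywords are screened with a clustering method, and those correlated with the fluctuation trend of the variable being forecast are combined into a keyword index, so that the useful information in the search index and in tourism market trends complement each other further. The index is then corrected with historical data to build an autoregressive lag model. Compared with methods that use only historical data or only the search index, forecast accuracy improves substantially, which offers tourism enterprises and departments a new method for forecasting tourist volume.

关键词 (Keywords): web search data, keyword index, clustering, tourist volume forecasting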

Abstract:

As one of the main sources of non-trade foreign exchange earnings, the tourism industry has been developing rapidly, and forecasting its tourist volume is an important part of marketing and operations. However, general prediction methods based on Bureau of Statistics data, which are released with a lag, cannot reflect the latest trends in the tourism market. This paper therefore proposes a forecasting model based on search engine data and historical tourist volume statistics, and uses it to explore the relationship between search engine data and tourist volume. Keywords related to the fluctuation of the variable being forecast are selected through a clustering method and synthesized into a keyword index, so that the useful information in search data and in tourism market trends can further complement each other. Historical statistics and the synthetic keyword index are then used to establish an autoregressive lag model. This model proves more accurate than methods based on either search indexes or historical data alone. The paper provides a new method of forecasting tourist volume for tourism business management.

Key words: search engine data, keyword index, clustering method, forecasting tourist volume
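
The pipeline summarized in the abstract (cluster the keywords, keep the cluster that tracks tourist volume, synthesize a keyword index, then fit a lag model on history plus the index) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption: the data are synthetic, a plain k-means stands in for the clustering method (which the abstract does not name), the index is a simple average of standardized series, and the lag model uses one lag of each variable.

```python
import numpy as np

def zscore(a):
    """Standardize each column to zero mean and unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def fit_ols(X, y):
    """Least-squares coefficients and in-sample RMSE for y ~ X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rmse = np.sqrt(np.mean((y - X @ beta) ** 2))
    return beta, rmse

# Hypothetical data (not from the paper): 120 months, 6 keyword search series.
rng = np.random.default_rng(0)
T = 120
season = np.sin(2 * np.pi * np.arange(T) / 12)
volume = 100 + 20 * season + rng.normal(0, 3, T)           # tourist volume
related = np.column_stack(
    [50 + 10 * season + rng.normal(0, 2, T) for _ in range(3)]
)                                                          # seasonal keywords
unrelated = rng.normal(50, 10, size=(T, 3))                # noise keywords
searches = np.hstack([related, unrelated])                 # shape (T, 6)

# 1. Cluster keywords: each keyword is a point whose features are its
#    standardized search series.
z = zscore(searches)
labels = kmeans(z.T, k=2)

# 2. Keep the cluster whose average series correlates best with volume.
def cluster_corr(j):
    return abs(np.corrcoef(z[:, labels == j].mean(axis=1), volume)[0, 1])

best = max(set(labels), key=cluster_corr)

# 3. Synthesize the keyword index from the selected cluster.
index = z[:, labels == best].mean(axis=1)

# 4. Compare an AR(1) model on history alone with a lag model that also
#    uses last month's keyword index (in-sample fit only).
y, y_lag, x_lag = volume[1:], volume[:-1], index[:-1]
const = np.ones_like(y)
_, rmse_ar = fit_ols(np.column_stack([const, y_lag]), y)
_, rmse_adl = fit_ols(np.column_stack([const, y_lag, x_lag]), y)
print(f"AR(1) RMSE: {rmse_ar:.2f}  with keyword index: {rmse_adl:.2f}")
```

Because the second regression nests the first, its in-sample error can never be larger. A fair test of forecast accuracy would instead compare the two models on held-out months.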