Management Review (管理评论), 2021, Vol. 33, Issue (2): 176-186.

• E-commerce and Information Management •

Predicting Review Helpfulness Rankings from Content Features: The Case of Douban Book Reviews

聂卉   

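  1. School of Information Management, Sun Yat-sen University, Guangzhou 510275
  • Received: 2017-12-04 Online: 2021-02-28 Published: 2021-03-08
  • About the author: Nie Hui (聂卉), PhD, is an associate professor and master's supervisor at the School of Information Management, Sun Yat-sen University.
  • Funding:
    National Social Science Fund of China (15BTQ067).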

Content-specific Ranking Prediction for Online Reviews: Case of Douban Book Reviews

Nie Hui   

  1. School of Information Management, Sun Yat-sen University, Guangzhou 510275
  • Received: 2017-12-04 Online: 2021-02-28 Published: 2021-03-08

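Abstract: Drawing on dual-route model theory and taking book reviews as its object of study, this paper examines how feature variables describing a review's informativeness, structure, language, arguments, and sentiment affect its perceived helpfulness. On that basis it builds a helpfulness prediction model that supports content-based prediction of review rankings. The study works at two levels. At the explanatory level, a random-forest-based feature selection algorithm identifies the text features that matter most for perceived helpfulness. At the predictive level, a tree regression model predicts review helpfulness and supports content-based ranking recommendation. The results show that, for long book reviews, the amount of information in the content, its organizational structure, and the presentation of subjective and objective arguments have an important influence on the accuracy of perceived-helpfulness prediction. With the selected content features, the explanatory power of the prediction model reaches 78% and the error is below 0.001. For reviews with high vote scores, the content-based predicted ranking is broadly consistent with the vote-based ranking. These findings support the view that perceived helpfulness can be predicted fairly accurately from review content, and they reveal a close relationship between the two. They also give websites a basis and a feasible approach for review quality control and for making effective use of reviews.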

Key words: online review, prediction, review helpfulness, text mining

Abstract: Drawing on the dual-route model, this paper investigates how five aspects of a book review's content, namely informativeness, structure, linguistic style, argument, and subjectivity, affect its helpfulness, so that the significant content features can be identified and used to model review helpfulness. The study involves two models. The interpretation model, built with a feature selection algorithm, identifies the content features that significantly affect review helpfulness, while a tree-based regression model predicts review helpfulness and rank. For the interpretation model, the results indicate that features related to informativeness, structure, and argument are much more strongly associated with review helpfulness. For the prediction model built on the optimal features, R² reaches 78% and the mean squared error (MSE) is below 0.001. In particular, for reviews with higher helpfulness scores, the predicted ranking is broadly in line with the vote-based ranking. Overall, the results indicate that the helpfulness of a review can be predicted quite accurately from its content alone, which points to a feasible approach to review quality control and the effective use of reviews.

Key words: online review, prediction, review helpfulness, text mining
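
The abstract reports three kinds of evaluation: explanatory power (R²), error (MSE), and agreement between the predicted and the vote-based rankings. The sketch below shows one way such figures could be computed. It is an illustration under assumptions, not the paper's procedure: the abstract does not say how rank agreement was measured, so Spearman's rank correlation is used here as a stand-in, and the helpfulness scores in `votes` and `predicted` are made-up toy values.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted helpfulness."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def ranks(scores):
    """Rank 1 goes to the highest score. Assumes there are no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(y_true, y_pred):
    """Spearman rank correlation (tie-free formula); an assumed agreement measure."""
    n = len(y_true)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(y_true), ranks(y_pred)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Toy data (hypothetical): vote-based helpfulness vs. content-based predictions.
votes = [0.90, 0.75, 0.60, 0.40, 0.20]
predicted = [0.88, 0.70, 0.62, 0.30, 0.35]

print("MSE:", mse(votes, predicted))
print("R^2:", r_squared(votes, predicted))
print("Vote ranks:     ", ranks(votes))
print("Predicted ranks:", ranks(predicted))
print("Spearman rho:", spearman(votes, predicted))
```

In this toy data the top three reviews keep the same positions in both rankings and only the two lowest-scored reviews swap, which mirrors the kind of pattern the abstract describes for highly voted reviews.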