Journal of South China University of Technology (Natural Science Edition) ›› 2020, Vol. 48 ›› Issue (12): 125-134. doi: 10.12141/j.issn.1000-565X.200366

• Special Topic on Artificial Intelligence •

Approach for Spammer Detection in Weibo Based on Multi-View Fusion

YANG Xiaohui LIANG Xiao

  1. School of Cyber Security and Computer, Hebei University, Baoding 071000, Hebei, China
  • Received: 2020-06-28 Revised: 2020-08-09 Online: 2020-12-25 Published: 2020-12-01
  • Contact: YANG Xiaohui, E-mail: yxh@hbu.edu.cn
  • About author: YANG Xiaohui (1975-), male, Ph.D., professor; his research mainly covers distributed computing, information security, and trusted computing.
  • Supported by:

    the National Key Research and Development Program of China (2017YFB0802300)

Abstract:

To detect spammers on Weibo more effectively, an approach based on multi-view fusion is proposed. First, a user representation strategy that integrates multi-view information is designed, characterizing each user from three views: user behavior, social relationships, and microblog content. Because existing approaches do not fully consider a user's fans or the user's environment in the social network, new features are introduced, including the fan ratio, the fans' average bidirectional connection rate, the community-based bidirectional connection rate, and the community-based clustering coefficient. Then, a multi-view fusion decision model based on a linear weighting function is constructed: the classification results from the individual views are fused by linear weighting, the optimal fusion coefficients are obtained by minimizing the approximation error, and the final classification result is derived from the fused output. Tests on a real Weibo dataset show that the approach detects spammers effectively, with markedly higher precision and F1-score than existing approaches and greater stability on imbalanced data. The influence of each view on the final detection performance is also analyzed, and the results show that the social relationship view contributes the most.

Key words: Weibo, spammer detection, linear weighting function, multi-view fusion
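
The linear weighted fusion step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact objective or any constraints on the coefficients, so the sketch assumes an unconstrained least-squares fit of the fused score to the 0/1 labels and a decision threshold of 0.5. The function names `fit_fusion_weights` and `fuse` and the toy scores are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_fusion_weights(view_scores, labels):
    """Fit one fusion coefficient per view.

    Assumption: "minimizing the approximation error" is modeled here as
    ordinary least squares, min_w ||S w - y||^2, with no constraints on w.
    """
    S = np.asarray(view_scores, dtype=float)   # shape (n_users, n_views)
    y = np.asarray(labels, dtype=float)        # 1 = spammer, 0 = normal user
    w, *_ = np.linalg.lstsq(S, y, rcond=None)
    return w

def fuse(view_scores, weights, threshold=0.5):
    """Linearly combine per-view scores and threshold the fused score.

    The 0.5 threshold is an illustrative assumption.
    """
    S = np.asarray(view_scores, dtype=float)
    fused = S @ np.asarray(weights, dtype=float)
    return (fused >= threshold).astype(int)

# Toy per-view classifier scores (made up for illustration only).
# Columns: user behavior view, social relationship view, content view.
scores = np.array([
    [0.6, 0.9, 0.7],
    [0.4, 0.8, 0.5],
    [0.7, 0.9, 0.4],
    [0.5, 0.1, 0.6],
    [0.3, 0.2, 0.3],
    [0.4, 0.1, 0.2],
])
labels = np.array([1, 1, 1, 0, 0, 0])

weights = fit_fusion_weights(scores, labels)
predictions = fuse(scores, weights)
print("fusion coefficients:", weights)
print("fused predictions:  ", predictions)
```

In practice the coefficients would be fitted on held-out per-view classifier outputs rather than on the data used to train the per-view classifiers. Comparing the fitted coefficients across views is one simple way to see which view dominates the fused decision.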

CLC Number: