Approach for Spammer Detection in Weibo Based on Multi-View Fusion
Received date: 2020-06-28
Revised date: 2020-08-09
Online published: 2020-12-01
Funding
National Key Research and Development Program of China (2017YFB0802300)
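YANG Xiaohui, LIANG Xiao. Approach for Spammer Detection in Weibo Based on Multi-View Fusion[J]. Journal of South China University of Technology (Natural Science Edition), 2020, 48(12): 125-134. DOI: 10.12141/j.issn.1000-565X.200366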
To detect spammers in Weibo more effectively, an approach based on multi-view fusion is proposed. First, a user representation strategy that integrates multi-view information is designed, characterizing users from three views: user behavior, social relationships, and text content. Because existing approaches do not fully consider a user's fans or the user's environment in the social network, new features are introduced, including fan ratio, fan average bidirectional connection rate, community-based bidirectional connection rate, and community-based clustering coefficient. Then, a multi-view fusion decision model based on a linear weighting function is constructed: the classification results from the individual views are fused by linear weighting, the optimal fusion coefficients are obtained by minimizing the approximation error, and the final classification result is derived from the fused output. Test results on real Weibo data show that the approach detects spammers effectively, with marked improvements in precision and F1-score over existing approaches, and that it is more stable on unbalanced data. The impact of the different views on the final detection performance is also analyzed, and the results show that the social relationship view contributes most.
Key words: Weibo; spammer detection; linear weighting function; multi-view fusion
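The fusion step described in the abstract (linear weighting of per-view classification results, with coefficients chosen to minimize an approximation error) can be illustrated with a minimal sketch. The abstract does not state the exact objective, so ordinary least squares is assumed here; the function names `fit_fusion_weights` and `fuse`, the score layout, and the 0.5 decision threshold are all hypothetical and not taken from the paper.

```python
import numpy as np


def fit_fusion_weights(view_scores, labels):
    """Estimate linear fusion coefficients by least squares (an assumption).

    view_scores: array of shape (n_users, n_views); column j holds the
                 spammer score produced by the classifier of view j
                 (e.g. behavior, social relationship, content).
    labels:      array of shape (n_users,), 1 = spammer, 0 = legitimate.
    """
    S = np.asarray(view_scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    # Minimize ||S w - y||^2 over the weight vector w.
    w, *_ = np.linalg.lstsq(S, y, rcond=None)
    return w


def fuse(view_scores, weights, threshold=0.5):
    """Combine per-view scores linearly and threshold the fused score."""
    fused = np.asarray(view_scores, dtype=float) @ weights
    # The 0.5 cut-off is a hypothetical choice for illustration.
    return (fused >= threshold).astype(int)
```

In use, the weights would be fitted on held-out scores from the three view classifiers and then applied to unseen users via `fuse`. A larger fitted weight indicates that the corresponding view carries more of the final decision in this simplified model.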