Jun 2018     Issue 7
Research
Detect Rumours Using Time Series of Social Context Information on Microblogging Websites

Prof. WONG Kam Fai, Department of Systems Engineering and Engineering Management

Social psychology literature defines a rumour as a statement whose truth value is unverifiable or deliberately false. Rumours on microblogging websites, carrying false or even malicious information, can cause widespread panic and social unrest in our community. For instance, on April 23, 2013, a rumour on Twitter about two explosions in the White House injuring Barack Obama caused a temporary free-fall in the US stock market. An automatic rumour detection technique that can quickly identify rumours and dynamically monitor their propagation is therefore very useful.

To establish the veracity of a rumour, individuals and organisations have relied on common sense and investigative journalism. However, websites that debunk rumours in this way are not comprehensive in their topical coverage and can have a long debunking delay. Existing rumour detection methods typically use supervised machine learning models based on a wide range of features corresponding to users, contents of messages and their propagation patterns. An obvious limitation of these models is that they consider only the overall statistics of the social context information of messages as features, e.g. the total number of retweets or the time length of propagation, and ignore the variation of these features over time. To improve the accuracy of detection, we argue that it is important to look not only at the overall properties and the properties of individual messages, but also at the changes or trends of these properties along the lifecycle of the event concerned.

To illustrate, we take two events from our dataset, a rumour and a non-rumour. Figure 1 shows, as time series, how the proportion of tweets using question marks and first-person pronouns varies. We observe that the non-rumour tends to use fewer question marks than the rumour at the later stage, and that the rumour may use the first-person pronoun more frequently at the early stage.
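As a rough illustration of how such a series could be computed, the Python sketch below counts, for each time interval, the share of tweets that contain a question mark or a first-person pronoun. The function name, the pronoun list and the simple tokeniser are assumptions made for this example; the article does not specify them.

```python
import re

# Assumed pronoun list; the article does not say which words are counted.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

def proportions_per_interval(tweets, n_intervals):
    """tweets: list of (interval_index, text) pairs.

    Returns two lists of per-interval proportions: tweets containing a
    question mark, and tweets containing a first-person pronoun.
    """
    totals = [0] * n_intervals
    question = [0] * n_intervals
    first_person = [0] * n_intervals
    for idx, text in tweets:
        totals[idx] += 1
        if "?" in text:
            question[idx] += 1
        # Crude tokeniser: lower-case alphabetic runs only.
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & FIRST_PERSON:
            first_person[idx] += 1
    # Empty intervals get a proportion of 0.0 rather than a division error.
    q = [question[i] / totals[i] if totals[i] else 0.0 for i in range(n_intervals)]
    fp = [first_person[i] / totals[i] if totals[i] else 0.0 for i in range(n_intervals)]
    return q, fp

sample = [
    (0, "I just heard about this, is it true?"),
    (0, "We saw it on the news"),
    (1, "Official statement released"),
    (1, "Really? Any source?"),
]
q_series, fp_series = proportions_per_interval(sample, 2)
```

Plotting `q_series` and `fp_series` against the interval index would give curves of the kind shown in Figure 1.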

To capture these temporal traits, we propose a novel time series model called Dynamic Series-Time Structure (DSTS). It applies time series modelling to capture the variation over time of a wide spectrum of social context information, going well beyond the tweet volume feature alone.

Problem statement:
In this work, we model microblog data as a set of events, where each event consists of relevant microblogs. We represent each event as a vector containing social context features regarding the contents, users and diffusion patterns of the relevant microblogs. We convert the continuous time stream of microblogs associated with each event into fixed time intervals. To train our model, we extract a rich set of time-sensitive features that capture not only the overall statistics of social context information but also the variation of individual features across the time intervals.
In this way, we study how well the time series of social context features can capture the variation of these features during the spread of event messages, which should help differentiate rumours from non-rumours.
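The event representation can be pictured with a small Python sketch. The class names, field names and the three statistics below are hypothetical, chosen only to show an event as a collection of microblogs together with the whole-lifecycle statistics that time-sensitive features are meant to supplement.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Microblog:
    created_at: float   # posting time in seconds (assumed unit)
    text: str
    retweets: int = 0

@dataclass
class Event:
    event_id: str
    microblogs: List[Microblog] = field(default_factory=list)

def overall_statistics(event):
    """Whole-lifecycle statistics that ignore variation over time."""
    times = [m.created_at for m in event.microblogs]
    return {
        "num_microblogs": len(event.microblogs),
        "total_retweets": sum(m.retweets for m in event.microblogs),
        "propagation_length": max(times) - min(times) if times else 0.0,
    }

event = Event("E1", [
    Microblog(0.0, "first post", 3),
    Microblog(120.0, "follow-up", 1),
    Microblog(600.0, "late reply", 0),
])
stats = overall_statistics(event)
```

Two events with identical `stats` could still differ sharply in how those quantities build up over time, which is the information a per-interval representation keeps.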

Our model:
Figure 1 shows the architecture of our proposed method, which consists of two steps: first, an approach that discretises the time stream to generate time stamps; second, a method for capturing the variation of features.

 

Step 1: Time Stamps Generation
For an event E_i, let timeFirst_i and timeLast_i be the times at which the first and the last microblog are posted, respectively. We convert the creation time of each microblog to a time interval index falling into the range from 0 to N, which serves as the time stamp of that microblog, where N is the tuneable number of time intervals.
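One plausible way to generate such time stamps is sketched below in Python. It assumes equal-width intervals between the first and the last post; the article does not give the exact formula, so the function name and the rounding rule here are assumptions.

```python
import math

def time_stamp(created_at, time_first, time_last, n_intervals):
    """Map a posting time to an interval index in the range 0..n_intervals.

    Assumes equal-width intervals spanning [time_first, time_last].
    """
    span = time_last - time_first
    if span <= 0:
        # All microblogs share one posting time: a single interval.
        return 0
    index = math.floor((created_at - time_first) * n_intervals / span)
    # Clamp so that out-of-range inputs still yield a valid index.
    return max(0, min(index, n_intervals))

posting_times = [0, 150, 299, 300, 600]   # seconds since some origin
stamps = [time_stamp(t, 0, 600, 4) for t in posting_times]
# stamps is [0, 1, 1, 2, 4]
```

With this rule only the very last microblog receives index N; a larger N gives a finer time resolution but fewer microblogs per interval.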

Step 2: Dynamic Series-Time Structure
With all the time stamps of an event, a vector of its social context features can be naturally generated for each time stamp. However, the temporal properties of such information are subject to continuous change over time, which cannot be captured effectively by modelling features within individual time intervals alone. A better approach is to identify the shapes of the time series, which are formed by the relative change between consecutive intervals, as a supplement to the absolute temporal properties.

For this purpose, we propose a Dynamic Series-Time Structure (DSTS), which is used to capture the variation of each feature. In this structure, we not only consider the absolute feature values from the initial time up to each interval, but also incorporate the slopes of features between two consecutive intervals.
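The idea of combining absolute values with slopes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact DSTS formulation: the function name, the flat concatenation order and the use of a constant interval width as the slope denominator are all assumed here.

```python
def dsts_vector(series, interval_width=1.0):
    """Build one flat feature vector from per-interval feature vectors.

    series: list of feature vectors, one per time interval, all of the
    same length. The result holds every absolute value, followed by the
    slope of each feature between each pair of consecutive intervals.
    """
    absolute = [value for vector in series for value in vector]
    slopes = [
        (curr_value - prev_value) / interval_width
        for prev, curr in zip(series, series[1:])
        for prev_value, curr_value in zip(prev, curr)
    ]
    return absolute + slopes

# Three intervals, two features per interval (illustrative numbers only).
series = [[0.5, 1.0], [0.5, 0.0], [0.25, 0.0]]
features = dsts_vector(series)
# features is [0.5, 1.0, 0.5, 0.0, 0.25, 0.0, 0.0, -1.0, -0.25, 0.0]
```

For T intervals and d features the vector has T*d absolute values and (T-1)*d slopes, so a classifier sees both the level of each feature and the direction in which it is moving.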

Experimental Results:
We use two datasets containing hundreds of events trawled from Twitter and Sina Weibo, which are the most popular microblogging websites in English and Chinese, respectively. We build classifiers using the DSTS-based features and the annotated datasets. Experimental results demonstrate that our DSTS-based model achieves promising improvements over the state-of-the-art approaches on both datasets.

 

  
Figure 1: The two sample events. (a) Question mark. (b) First-person pronoun.
Copyright © 2024.
All Rights Reserved. The Chinese University of Hong Kong.