SocialHaterBERT: a dichotomous approach for automatically detecting hate speech on Twitter through textual analysis and user profiles.

This work is in progress.


This project comprises both an in-depth study of the efforts and techniques used so far to detect and prevent hateful content and cyberbullying on Twitter, and a proposal for a novel feature-analysis approach based on user profiles, their social environment, and the tweets they generate. We found that user characteristics contribute significantly to a deeper understanding of how hate spreads virally on the network. This work therefore opens the field beyond purely textual boundaries, giving rise to future research on combined models from a diachronic and dynamic perspective.

🗎 SocialHaterBERT paper (ESWA)

👀 SocialHaterBERT Project webpage

✉️ Request datasets

In an already polarized world, social networks are a double-edged sword, giving rise to phenomena such as hate speech. In the present work, its presence on Twitter has been detected and analyzed. To this end, a base algorithm, HaterBERT, has been designed, which improves the results of current Spanish classifiers by 3%-27%.

Furthermore, the presence of hate speech on Twitter has been analyzed through an extensive study that has served to extract its essential characteristics. To do this, a procedure for extracting and manipulating these characteristics, SocialGraph, has been developed; a Random Forest classifier built on it reaches an F1 of 99% and provides valuable data for the identification of hater profiles.

These findings lead to the development of SocialHaterBERT, a novel multimodal model that combines categorical and numerical variables from the social network with the text of tweets. It not only provides a new way to understand hate speech on social media, but also demonstrates how social context improves textual classification, which is the most valuable contribution of this paper. In particular, we achieved a 4% improvement over the base algorithm, HaterBERT, and a 19% improvement over our original algorithm, HaterNet (Pereira-Kohatsu et al., 2019). Future research should look into aspects such as the history and evolution of hate on the network, trends, public and anonymous users affected by it, and aggressors' profiles, with the goal of encouraging the discovery of relationships with the dissemination and virality of hate on social networks.

Following that, interactions between users could be investigated, extending SocialGraph's characteristics and enabling the prediction of each tweet's virality.

_____________________________________________ 💻 _______________________________________________

Brief Methodology and Design

This section introduces the design of the three approaches created for hate speech detection on Twitter: HaterBERT, SocialGraph, and SocialHaterBERT.


HaterBERT

HaterBERT is our base model, based on BERT, for binary hate / non-hate text classification. The following are the modifications made to the transformer and the tools used to make them (a minimal fine-tuning sketch follows the list):

  • Transformers library: Hugging Face 🤗, which provides NLP tools and pre-trained transformers: BERT and its Spanish version, BETO.
  • BERT fine-tuning library: DE-LIMIT.
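As a rough illustration of this setup (not the repository's exact DE-LIMIT-based training code), the sketch below fine-tunes BETO for binary hate / non-hate classification with the Hugging Face Trainer; the checkpoint name dccuchile/bert-base-spanish-wwm-cased, the toy data, and the hyperparameters are assumptions.

```python
# Minimal sketch: fine-tuning BETO for binary hate / non-hate classification.
# Checkpoint, data and hyperparameters are illustrative, not the paper's exact setup.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "dccuchile/bert-base-spanish-wwm-cased"  # BETO (assumed checkpoint)

class TweetDataset(Dataset):
    """Wraps tokenized tweets and their 0/1 hate labels."""
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(texts, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Toy data just to make the sketch runnable.
train_ds = TweetDataset(["ejemplo de tweet", "otro tweet"], [0, 1], tokenizer)

args = TrainingArguments(output_dir="haterbert_out", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```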

SocialGraph

To feed HaterBERT with the characteristics of the social network, it is first necessary to gather all the relevant information. To do this, given a dataset D consisting of tweets that may or may not contain hate, we first collect (an illustrative collection sketch with Tweepy follows the list):

  1. Information related to each tweet (e.g., text, author, number of retweets, responses, etc.). The collected fields can be seen in the table below.
Attribute Type Description
user_id int user identifier
screen_name str username
tweet_id int tweet identifier
tweet_text str tweet text
tweet_creation_at datetime tweet creation date
n_favs int number of favorites
n_rts int number of retweets
is_rt boolean the tweet is a retweet
rt_id_user int id of the retweeted user
rt_id_status int id of the retweeted tweet
rt_text str text of the retweeted tweet
rt_creation_at datetime creation date of the retweeted tweet
rt_fav_count int number of favorites of the retweeted tweet
rt_rt_count int number of retweets of the retweeted tweet
is_reply boolean the tweet is a reply
reply_id_status int id of the tweet being replied to
reply_id_user int id of the user being replied to
is_quote boolean the tweet quotes another tweet
quote_id_status int id of the quoted tweet
quote_id_user int id of the quoted user
quote_text str text of the quoted tweet
quote_creation_at datetime creation date of the quoted tweet
quote_fav_count int number of favorites of the quoted tweet
quote_rt_count int number of retweets of the quoted tweet
  2. Information regarding the users who authored each tweet (e.g., username, biography, profile image URL, number of tweets, number of followers, etc.). The collected fields can be seen in the table below. With this we intend to broaden the analysis by modeling the user who posted each tweet.
Attribute Type Description
user_id int user id
uname str user profile name
virtual boolean virtual node
screen_name str username
description str biography or description
location str location if any
verified boolean verified account
profile_image_url str profile picture url
default_profile boolean profile update
default_image_profile boolean profile picture update
geo_enabled boolean real location enabled
created_at datetime account creation date
statuses_count int number of user tweets
listed_count int number of lists the user appears in
followers_count int number of followers
followees_count int number of accounts followed
favorites_count int number of tweets the user has favorited
  3. Each user’s last 200 tweets, complemented with the information from point 1. This allows us to model the types of contributions that each user makes on a regular basis.
  4. The user profiles mentioned or retweeted by each author in those 200 tweets, so that we can learn about their environment.
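An illustrative sketch of this collection step with Tweepy is shown below. The credential placeholders and the small subset of stored fields are assumptions (the real pipeline stores all the attributes in the tables above), and the v1.1 endpoints used here have since been restricted by Twitter/X.

```python
# Illustrative collection of a user's profile fields and last 200 tweets with
# Tweepy (Twitter API v1.1). Credentials and the stored fields are placeholders.
import tweepy

auth = tweepy.OAuth1UserHandler("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def collect_user(screen_name):
    user = api.get_user(screen_name=screen_name)
    profile = {
        "user_id": user.id,
        "screen_name": user.screen_name,
        "description": user.description,
        "followers_count": user.followers_count,
        "statuses_count": user.statuses_count,
        "created_at": user.created_at,
    }
    timeline = api.user_timeline(screen_name=screen_name, count=200, tweet_mode="extended")
    tweets = [{
        "tweet_id": t.id,
        "tweet_text": t.full_text,
        "tweet_creation_at": t.created_at,
        "n_favs": t.favorite_count,
        "n_rts": t.retweet_count,
        "is_rt": hasattr(t, "retweeted_status"),
    } for t in timeline]
    return profile, tweets
```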

All this information is the base on which the attributes of SocialGraph are built. Below we describe its construction process.

Constructing the Graph and Calculating Centrality Measures

Using a Neo4j database we build a graph with three types of nodes:

  • User: node that collects all of the user’s information.
  • Tweet: node that collects all the information related to tweets.
  • Multimedia: node that collects the url referring to the multimedia content or link (to news) that is shared within a tweet.

And three types of links between them: Quoted, Retweeted, or Shared. We then compute the following centrality measures on the graph (an illustrative NetworkX sketch follows the table):

Measure Description
betweenness fraction of shortest paths between other nodes that pass through the node
eigenvector measure of a node’s influence on the network
in-degree number of edges pointing to the node
out-degree number of edges pointing away from the node
clustering fraction of pairs of neighboring nodes adjacent to each other
degree number of edges adjacent to the node
closeness inverse of the average distance from the node to all reachable nodes
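The repository computes these measures on the Neo4j graph; the snippet below is an equivalent, self-contained sketch with NetworkX on a toy directed graph (the nodes and edges are placeholders, not real data).

```python
# Illustrative computation of the centrality measures listed above with NetworkX.
# The project itself computes them on a Neo4j graph of User/Tweet/Multimedia nodes.
import networkx as nx

G = nx.DiGraph()
# Placeholder edges: user -> user interactions (retweeted / quoted / mentioned).
G.add_edges_from([("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "alice")])

measures = {
    "betweenness": nx.betweenness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
    "in_degree": dict(G.in_degree()),
    "out_degree": dict(G.out_degree()),
    "clustering": nx.clustering(G.to_undirected()),
    "degree": dict(G.degree()),
    "closeness": nx.closeness_centrality(G),
}

for name, values in measures.items():
    print(name, values)
```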

Centrality measures have been shown to be effective at quantifying the relative importance of actors in a social network. For example, a node’s ability to influence others is affected much more by its strategic placement within the network than by its number of followers.

Summary statistics

We analyze the information downloaded through Twitter’s API and infer a series of new characteristics in order to get a better overall picture of each user. These characteristics are obtained via the following (an illustrative pandas sketch follows the tables below):

  • Counting: basic statistical operations on the tweets downloaded per user (e.g., the number of times the user’s tweets are retweeted, the number of bad words per tweet, the average number of tweets per day, the number of hashtags used, the number of misspellings, etc.).
Attribute Type Description
status_retrieving int number of saved tweets
status_start_day datetime start date of tweet extraction
status_end_day datetime end date of tweet extraction
status_average_tweets_per_day float average tweets per day
activity_hourly_X int number of tweets at each hour of the day, 24 attributes with X ∈ [00-23]
activity_weekly_X int number of tweets on each day of the week, 7 attributes with X ∈ [0-6]
rt_count int total number of retweets among the saved tweets
geo_enabled_tweet_count int number of tweets with geolocation enabled
num_hashtags int number of hashtags used
num_mentions int number of mentions
num_urls int number of domains shared by the user
baddies list(str) bad words or insults used by the user
n_baddies int number of baddies
n_baddies_tweet float number of baddies per tweet
len_status float average tweet length
times_user_quotes int number of times other users are quoted
num_rts_to_tweets int number of times user tweets are retweeted
num_favs_to_tweets int number of times user tweets are favorited
leet_counter int number of times the user uses the leet alphabet
  • Clustering: where we group the analyzed content and extract the most relevant clusters (e.g., the top 6 most shared domains, the top 10 most tagged places, the top 5 most retweeted users, etc.).
Attribute Type Description
top_languages dict(language(str), count(int)) top 5 languages most used by the user, by number of tweets
top_sources dict(source(str), count(int)) top 5 tweet sources (clients) used, by number of tweets
top_places dict(place(str), count(int)) top 10 places most tagged by the user, by number of tweets
top_hashtags dict(hashtag(str), count(int)) top 10 hashtags most used by the user, by number of tweets
top_retweeted_users dict(user(str), count(int)) top 5 users most retweeted by the user, by number of tweets
top_mentioned_users dict(user(str), count(int)) top 5 users most mentioned by the user, by number of tweets
top_referenced_domains dict(domain(str), count(int)) top 6 domains most shared by the user, by number of tweets
  • Modeling: attributes such as the number of negative, positive, or neutral tweets, the categories of the user’s profile image, and the top 15 topics of each user are inferred using ad hoc classifiers.
Attribute Type Classifier Source Description
categories_profile_image_url dict(dict(category, score, hierarchy=None)) Client Watson Visual Recognition (IBM) VisualRecognitionV3 user’s profile image categories
negatives, positives, neutral int Sentiment analysis classifier (transformers) finiteautomata/beto-sentiment-analysis number of negative, positive, and neutral tweets
negatives_score, positives_score, neutral_score float Sentiment analysis classifier (transformers) finiteautomata/beto-sentiment-analysis mean scores of negative, positive, and neutral tweets
hate, non_hate int Ad hoc classifier HaterBERT number of hate and non-hate tweets
hate_score, non_hate_score float Ad hoc classifier HaterBERT hate and non-hate scores
top_categories dict(category(str), count(int)) Spanish category classifier (Python library) subject_classification_spanish top 15 tweet categories
misspelling_counter int Spanish spell checker pyspellchecker number of misspellings made by the user
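To illustrate the counting and clustering steps (a sketch under assumed column names taken from the tables above, not the repository's code), the function below derives a handful of these attributes from a pandas DataFrame holding one user's collected tweets; the bad-word lexicon is a placeholder.

```python
# Illustrative derivation of a few counting/clustering attributes for one user.
# Column names follow the tables above; BADDIES is a placeholder lexicon.
import pandas as pd

BADDIES = {"idiota", "imbecil"}  # placeholder bad-word list

def summary_features(tweets: pd.DataFrame) -> dict:
    days = (tweets["tweet_creation_at"].max() - tweets["tweet_creation_at"].min()).days or 1
    words = tweets["tweet_text"].str.lower().str.split()
    n_baddies = int(words.apply(lambda ws: sum(w in BADDIES for w in ws)).sum())
    return {
        "status_retrieving": len(tweets),                          # number of saved tweets
        "status_average_tweets_per_day": len(tweets) / days,
        "num_rts_to_tweets": int(tweets["n_rts"].sum()),
        "num_favs_to_tweets": int(tweets["n_favs"].sum()),
        "len_status": tweets["tweet_text"].str.len().mean(),       # average tweet length
        "n_baddies": n_baddies,
        "n_baddies_tweet": n_baddies / len(tweets),
        "top_hashtags": tweets["tweet_text"].str.findall(r"#\w+")  # clustering-style top 10
                              .explode().value_counts().head(10).to_dict(),
    }
```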

Transforming and Coding

To be part of the input of any model, the set of characteristics must be transformed into a set of attributes. Each characteristics table above indicates the type of variable associated with each characteristic; these can be grouped as follows (a scikit-learn encoding sketch follows the tables below).

Variable Original Variable(s) Group Method Categories Description
verified NC profile boolean classification 0: No, 1: Yes user is verified
hater NC activity boolean classification 0: No, 1: Yes user has more than 5% hate tweets
vecino_hater NC activity boolean classification 0: No, 1: Yes the user has at least one neighbor with more than 5% hate tweets
profile_changed default_profile profile boolean classification 0: No, 1: Yes whether the user has ever updated their profile
clase_NER screen_name + uname profile NER tag search (spaCy) 0: PER, 1: MISC, 2: ORG, 3: UND name type
clase_DESCR description profile cleaning (NLTK) + Topic Modeling (Gensim) 0: opinion, 1: studies, 2: politics, 3: activities description type
clase_LOC location profile cleaning + ad hoc dict + pycountry 0-19: geographic world areas or provinces in the case of Spain geographical area enabled by the user
clase_FECHA created_at profile division into three regions 0: < 2015, 1: [2015-2019], 2: > 2019 time of user creation
clase_IMG categories_profile_image_url profile Topic Modeling (Gensim) 0: people, 1: clothing, 2: building, 3: animal, 4: nature, 5: technology, 6: sports, 7: objects, 8: food profile image type
clase_HASHTAGS top_hashtags activity Correlation matrix + Topic Modeling 0: politics, 1: press, 2: sports, 3: others hashtag type
clase_CATS top_categories activity Topic Modeling (Gensim) 0: Spain, 1: culture, 2: art, 3: society, 4: cartoons, 5: Catalonia, 6: graphical arts, 7: drawings, 8: opinion, 9: illustrations, 10: politics, 11: others categories most repeated by the user in tweets
clase_DOMS top_referenced_domains activity wikipedia + Topic Modeling 0: social networks, 1: information, communication and news, 2: entertainment type of domain most shared by the user
clase_RTSCAT top_retweeted_users activity Topic Modeling (Gensim) 0: Spain, 1: culture, 2: art, 3: society, 4: cartoons, 5: Catalonia, 6: graphical arts, 7: drawings, 8: opinion, 9: illustrations, 10: politics, 11: others most retweeted user type
clase_MENCAT top_mentioned_users activity Topic Modeling (Gensim) 0: Spain, 1: culture, 2: art, 3: society, 4: cartoons, 5: Catalonia, 6: graphical arts, 7: drawings, 8: opinion, 9: illustrations, 10: politics, 11: others most mentioned user type

Detail of the categorical variables in SocialGraph. NC = does not change.

Variable Original Variable(s) Group Method Description
n_LESP top_languages activity Ad hoc function percentage of the user’s tweets in Spanish
n_LENG top_languages activity Ad hoc function percentage of the user’s tweets in English
n_LOTR top_languages activity Ad hoc function percentage of the user’s tweets in other languages (neither Spanish nor English)
activity_hourly_X NC activity Ad hoc function percentage of tweets per hour (X=24)
activity_weekly_X NC activity Ad hoc function percentage of tweets per week day (X=7)
negatives NC activity Ad hoc function negative connotation percentage of tweets
positives NC activity Ad hoc function positive connotation percentage of tweets
neutral NC activity Ad hoc function neutral connotation percentage of tweets
n_hate NC activity Ad hoc function hate tweets percentage
n_nohate NC activity Ad hoc function non hate tweets percentage
n_baddies NC activity Ad hoc function percentage of baddies per tweet
eigenvector NC centrality - eigenvector score
in_degree NC centrality - in degree score
out_degree NC centrality - out degree score
degree NC centrality - degree score
clustering NC centrality - clustering score
closeness NC centrality - closeness score
betweenness NC centrality StandardScaler number of shortest paths that pass through the node
status_average_tweets_per_day NC activity StandardScaler average number of tweets per day
times_user_quotes NC activity StandardScaler number of times user quotes others
negatives_score NC activity - mean score of negative tweets
positives_score NC activity - mean score of positive tweets
neutral_score NC activity - mean score of neutral tweets
hate_score NC activity - mean score of hate tweets
no_hate_score NC activity - mean score of non-hate tweets
statuses_count NC activity StandardScaler total number of tweets
followers_count NC activity StandardScaler total number of followers
followees_count NC activity StandardScaler total number of accounts followed
favorites_count NC activity StandardScaler total number of tweets favorited by the user
listed_count NC activity StandardScaler number of lists user is on
num_hashtags NC activity StandardScaler number of hashtags used
rt_count NC activity StandardScaler total number of retweets
num_mentions NC activity StandardScaler number of mentions made
num_urls NC activity StandardScaler number of shared urls
len_status NC activity StandardScaler average tweet length
num_rts_to_tweets NC activity StandardScaler number of times user tweets are retweeted
num_favs_to_tweets NC activity StandardScaler number of times user tweets are favourited
misspelling_counter NC activity StandardScaler number of misspellings made by the user
leet_counter NC activity StandardScaler number of times user uses leet alphabet

Detail of the numerical variables in SocialGraph. NC = does not change.
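As a sketch of this transformation step (not the repository's exact pipeline; the column lists are abbreviated placeholders), categorical classes can be one-hot encoded and numerical attributes standardized with scikit-learn:

```python
# Illustrative encoding of SocialGraph attributes: categorical classes are one-hot
# encoded and numerical attributes standardized, as described in the tables above.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

categorical_cols = ["clase_NER", "clase_DESCR", "clase_FECHA"]          # abbreviated
numerical_cols = ["statuses_count", "followers_count", "betweenness"]   # abbreviated

preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ("num", StandardScaler(), numerical_cols),
])

# Toy user rows just to make the sketch runnable.
users = pd.DataFrame({
    "clase_NER": [0, 1], "clase_DESCR": [2, 0], "clase_FECHA": [1, 2],
    "statuses_count": [1200, 85], "followers_count": [300, 15], "betweenness": [0.02, 0.0],
})
X = preprocess.fit_transform(users)  # feature matrix for, e.g., a Random Forest
```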

SocialHaterBERT

In order to improve on previous algorithms that used only the text of the tweet as input, SocialHaterBERT is built as a multimodal model that combines a textual classifier with social network characteristics. As a result, HaterBERT (after experimental optimization of its parameters) and SocialGraph (after an experimental attribute selection) form the foundation of SocialHaterBERT.

To construct the model, we use the Multimodal Transformers library, which incorporates tabular (categorical and numerical) data alongside text data for classification and regression tasks. In this way, the pre-trained transformer and the combining module’s parameters are trained jointly as a supervised task.

Figure: SocialHaterBERT architecture, including the Combining Module.

SocialHaterBERT’s architecture is as follows: to distribute the data for classification, the text, numerical, categorical, and prediction columns are specified in a dictionary. After this, BertTokenizer and BertForSequenceClassification are instantiated, which also allows fine-tuning. Then, in the Combining Module (shown in the figure above), a two-layer hidden MLP is created with a ReLU activation function, as it improves training. Finally, before the output layer, the results are combined using the sum of the attributes, as this proved to be the best combination option.
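As a rough sketch of the combining idea only (this is not the Multimodal Transformers library's API), the module below encodes the tweet with BERT, passes the tabular features through a two-layer MLP with ReLU, and sums the two representations before the output layer; the checkpoint name and the tabular feature dimension are assumptions.

```python
# Rough sketch of the combining idea: BERT encodes the tweet text, a two-layer
# ReLU MLP encodes the (already encoded) categorical + numerical features, and
# the two representations are summed before the classification layer.
import torch
import torch.nn as nn
from transformers import AutoModel

class CombiningClassifier(nn.Module):
    def __init__(self, model_name="dccuchile/bert-base-spanish-wwm-cased",
                 tabular_dim=60, num_labels=2):  # dimensions are illustrative
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.tabular_mlp = nn.Sequential(
            nn.Linear(tabular_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, input_ids, attention_mask, tabular_feats):
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]  # [CLS] token
        combined = cls + self.tabular_mlp(tabular_feats)  # sum of the two representations
        return self.classifier(combined)                  # hate / non-hate logits
```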

_____________________________________________ 💻 _______________________________________________
