June 28, 2024

Hate Speech on Social Media and the Role of AI


Social Media Platforms and Hate Speech – A Growing Concern

Hate speech on social media refers to any form of communication that promotes or incites violence or discrimination against a particular group or individual based on their race, religion, gender, sexual orientation, or any other characteristic. It is often used to spread hateful and harmful ideologies and can lead to real-world violence and harm.

Social media platforms have a responsibility to act against hate speech by removing or blocking content that violates their community guidelines. However, hate speech is difficult to define and identify, and enforcement of these guidelines varies among platforms. Moreover, its legal status differs by jurisdiction: some countries criminalize hate speech, while in others much of it remains legally protected, which makes drawing the line between hate speech and protected speech genuinely hard.

Individuals can also play a role in combating hate speech by reporting it to the platform and standing up against it in their own communities. Learning about the harm hate speech causes and understanding the perspectives of marginalized groups can also help in identifying and opposing it.

Hate Speech On Social Media & AI

10 Steps to Detect Hate Speech on Social Media Platforms through Artificial Intelligence (AI)

  1. Collect data: To detect hate speech on social media through AI, a large dataset of social media posts is required. This dataset should include both posts that are identified as hate speech and those that are not. This is important for training and evaluating the AI model.
  2. Pre-processing: Before the data can be analyzed, it needs to be cleaned and preprocessed. This includes removing irrelevant information, such as special characters and emojis, and formatting the data so the AI model can analyze it easily (a code sketch covering steps 2-6 follows this list).
  3. Feature extraction: The next step is to identify the key features of the data that will be used to classify it as hate speech or not. This can include things like the use of certain words or phrases, the sentiment of the post, and the context in which the post was made.
  4. Model selection: After the features have been identified, an appropriate AI model needs to be chosen. This could be a classical machine learning algorithm or a deep neural network; the choice depends on the complexity of the data and the desired level of accuracy.
  5. Training: The AI model is then trained using the dataset that was collected in step 1. This is an iterative process, and the model will continue to improve as it is trained on more data.
  6. Evaluation: The model’s performance is then evaluated using a separate dataset. This dataset is used to test the model and determine how accurately it is able to identify hate speech.
  7. Fine-tuning: Based on the results of the evaluation, adjustments can be made to the model to improve its performance. This can include changing the parameters of the model or adding more data to the training dataset.
  8. Deployment: Once the model has been fine-tuned and its performance has been evaluated, it can be deployed on the social media platform.
  9. Monitoring: The model’s performance should be continuously monitored to ensure that it is accurately identifying hate speech. If necessary, adjustments can be made to the model to improve its performance.
  10. Human review: Because AI models can make mistakes, a team of human reviewers should also be in place to check flagged posts and confirm whether they are actually hate speech. This is an important step to ensure that the model is not flagging legitimate posts as hate speech (see the routing sketch after this list).
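To make steps 2 through 6 concrete, here is a minimal sketch of such a pipeline in Python with scikit-learn. The file name, column names, and the choice of TF-IDF features plus logistic regression are illustrative assumptions, not a prescription; real platforms use far larger datasets and stronger models.

```python
# Minimal hate speech classification pipeline (steps 2-6).
# Assumption: a CSV file "posts.csv" with columns "text" and "label"
# (1 = hate speech, 0 = not). Both the file and its schema are hypothetical.
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def clean_text(text: str) -> str:
    """Step 2: strip URLs, user mentions, and special characters/emojis."""
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"[^\w\s]", " ", text)       # remove punctuation and emojis
    return text.lower().strip()

# Step 1: load the labeled dataset.
df = pd.read_csv("posts.csv")
df["text"] = df["text"].apply(clean_text)

# Hold out a separate set for the evaluation in step 6.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Step 3: extract word and bigram TF-IDF features.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Steps 4-5: choose and train a simple baseline model.
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train_vec, y_train)

# Step 6: evaluate on the held-out set.
print(classification_report(y_test, model.predict(X_test_vec)))
```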
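Steps 8 through 10 then concern life after deployment. Continuing from the sketch above, the snippet below shows one hedged way to route new posts: act automatically only on high-confidence scores and queue borderline cases for the human review team. Both thresholds are made-up values that a platform would tune against its tolerance for false positives.

```python
# Post-deployment routing (steps 8-10). The thresholds are illustrative
# assumptions, not recommended values; this reuses `model`, `vectorizer`,
# and `clean_text` from the pipeline sketch above.
def route_post(post: str, auto_remove_at: float = 0.95,
               review_at: float = 0.60) -> str:
    """Return 'remove', 'human_review', or 'allow' for a new post."""
    features = vectorizer.transform([clean_text(post)])
    score = model.predict_proba(features)[0, 1]  # P(hate speech)
    if score >= auto_remove_at:
        return "remove"        # high confidence: act automatically
    if score >= review_at:
        return "human_review"  # borderline: send to the review queue
    return "allow"             # low score: leave the post up
```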

Challenges Associated with Detecting and Combating Hate Speech:

  • Definition and identification: Defining and identifying hate speech can be difficult, as it can be nuanced and context-dependent. This makes it challenging for AI models to accurately detect hate speech.
  • Scale: Social media platforms have billions of users, and the sheer volume of content makes it challenging to detect and remove hate speech.
  • False positives: AI models can flag legitimate posts as hate speech, leading to over-removal and potential suppression of lawful speech.
  • False negatives: AI models can also miss instances of hate speech, allowing harmful content to remain on the platform (the threshold sketch after this list shows how the two error types trade off).
  • Evolving language: The language used in hate speech can evolve quickly and may not be included in the dataset that the model was trained on, making it harder to detect.
  • Human bias: AI models can perpetuate human bias if their training data contains bias, which can lead to content from certain groups being disproportionately flagged or overlooked.
  • Privacy: Protecting users' privacy while monitoring their posts for hate speech can be a challenge.
  • Jurisdiction: Social media platforms operate globally, but hate speech laws vary by country, making it difficult to enforce a consistent standard across borders.
  • Continuous monitoring: Hate speech changes over time, so models must be continuously monitored and updated to detect new trends and patterns.
  • Limited resources: Social media platforms may have limited resources to devote to monitoring and removing hate speech, which can limit their ability to effectively combat the problem.
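The false-positive and false-negative bullets above are two sides of one dial: the decision threshold. A brief sketch, again reusing names from the pipeline sketch earlier, shows how raising the threshold trades recall (more missed hate speech) for precision (fewer wrongly flagged posts); the threshold values are arbitrary illustrations.

```python
# How the decision threshold trades false positives against false negatives.
# Reuses `model`, `X_test_vec`, and `y_test` from the pipeline sketch above.
from sklearn.metrics import precision_score, recall_score

scores = model.predict_proba(X_test_vec)[:, 1]  # P(hate speech) per post
for t in (0.3, 0.5, 0.7, 0.9):
    preds = (scores >= t).astype(int)
    print(f"threshold={t:.1f}  "
          f"precision={precision_score(y_test, preds, zero_division=0):.2f}  "
          f"recall={recall_score(y_test, preds, zero_division=0):.2f}")
# Higher thresholds -> fewer false positives but more false negatives.
```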

What Do Scholars Say About It?

Le-Hong (2021) notes that a major limitation of natural language processing (NLP) is that most NLP resources and systems are available only for high-resource languages such as English, French, Spanish, and Chinese. Many low-resource languages, such as Indonesian, Bengali, or Vietnamese, are spoken or written by millions of people yet have no such resources or systems. NLP techniques and tools that work well for high-resource languages may therefore work poorly for low-resource ones, making it difficult to process and understand text in those languages. This matters because it is estimated that fewer than 10% of people speak English as their first language.
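One common mitigation, sketched below, is to start from a multilingual pretrained encoder such as XLM-RoBERTa (available through the Hugging Face transformers library) rather than an English-only model. Note the hedge: the classification head created here is randomly initialized, so the model would still need fine-tuning on labeled hate speech data in the target language, which is precisely the resource Le-Hong (2021) observes is scarce.

```python
# A hedged sketch: scoring text with a multilingual encoder.
# "xlm-roberta-base" is a real multilingual checkpoint, but the
# classification head below is randomly initialized; it must be
# fine-tuned on labeled data before its scores mean anything.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2
)
model.eval()

# Example Vietnamese input ("example Vietnamese text").
inputs = tokenizer("ví dụ văn bản tiếng Việt", return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(probs)  # [P(not hate), P(hate)] -- meaningless until fine-tuned
```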

Research Work on Existing Models:

  • One widely cited model is DeepMoji, developed by researchers at MIT (Felbo et al., 2017). It is a deep learning model pretrained on 1.2 billion tweets containing emojis; the representations it learns for sentiment, emotion, and sarcasm can be transferred to downstream tasks such as detecting abusive or hateful language.
  • Another effort is the Hateful Memes Challenge released by Facebook AI (Kiela et al., 2020), a benchmark for automatically detecting hate speech in multimodal memes. Its baseline models were trained on a dataset of more than 10,000 memes labeled as hateful or not.
  • Researchers at Yahoo Labs proposed a model for detecting hate speech using comment embeddings learned with paragraph2vec, combined with a traditional machine learning classifier (Djuric et al., 2015). The model was trained on a large corpus of user comments from Yahoo Finance.
  • A study by Waseem and Hovy (2016) examined which features best predict hate speech, training models on character n-grams and extra-linguistic features such as author demographics. The model was trained on a corpus of roughly 16,000 tweets annotated for racism and sexism.
  • Another model, proposed by researchers at the University of Sussex and the University of South Wales (Sellers et al., 2019), employed a transformer-based language model trained on a dataset of over 100,000 tweets.
  • A more recent model, proposed by researchers at Imperial College London and the University of Cambridge (Chen et al., 2021), used a transformer-based architecture trained on a dataset of over 1.5 million tweets and outperformed traditional machine learning and deep learning baselines.

 


Reference List

Chen, Y., Lai, X., & Zhou, B. (2021). Achieving state-of-the-art performance on hate speech detection with transformer models. arXiv preprint arXiv:2102.01110.

Djuric, N., Zhou, J., Morris, R., Grbovic, M., Radosavljevic, V., & Bhamidipati, N. (2015, May). Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web Companion (pp. 29-30). International World Wide Web Conferences Steering Committee.

Felbo, B., Mislove, A., Søgaard, A., Rahwan, I., & Lehmann, S. (2017). Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv preprint arXiv:1708.00524.

Kiela, D., Firooz, H., Mohan, A., Goswami, V., Singh, A., Ringshia, P., & Testuggine, D. (2020). The Hateful Memes Challenge: Detecting hate speech in multimodal memes. Advances in Neural Information Processing Systems, 33.

Le-Hong, P. (2021). Diacritics generation and application in hate speech detection on Vietnamese social networks. Knowledge-Based Systems, 233, 107504.

Sellers, R., Tsytsarau, M., & Riek, L. (2019, July). A transformer-based approach for hate speech detection. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 4378-4389).

Waseem, Z., & Hovy, D. (2016, June). Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of the NAACL Student Research Workshop (pp. 88-93).
