Predicting-the-Propagation-Patterns-of-Health-Misinformation-Across-Social-Media-Platforms

Overview:
Health misinformation, especially during crises like COVID-19, significantly impacts public well-being. This project aims to answer the following research questions:

How does misinformation cluster within communities on social media, and what does this reveal about echo chambers and information silos?
How do thematic clusters of hashtags correlate with the misinformation ratio, and what does this imply for targeted fact-checking and public health messaging?
How do retrieval-augmented transformer models compare with traditional approaches, fine-tuned transformer models, and prompt-based large language models in terms of accuracy, robustness, and explainability when classifying health misinformation on social media?

What you will find in this repository:
The data is a combination of three datasets: COVIDLIES, HealthLies, and CONSTRAINT. We have implemented the following models:

Classical Machine Learning Models:

Logistic Regression: Linear classifier with L2 regularization

Naive Bayes: Probabilistic classifier based on Bayes' theorem

Random Forest: Ensemble of decision trees with bootstrap sampling

Support Vector Machine (SVM): Implementation with RBF kernel
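As an illustration of the kind of classifier listed above, here is a minimal pure-Python sketch of multinomial Naive Bayes with add-one smoothing. It is not the repository's implementation, which may well rely on a machine learning library; the function names, the whitespace tokenizer, and the four toy posts are assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def train_naive_bayes(texts, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior (Bayes' theorem)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for illustration only.
toy_texts = [
    "garlic cures covid",
    "bleach cures the virus",
    "vaccines reduce severe illness",
    "masks reduce transmission",
]
toy_labels = ["misinformation", "misinformation", "reliable", "reliable"]
toy_model = train_naive_bayes(toy_texts, toy_labels)
```

Calling `predict(toy_model, "garlic cures the virus")` scores each class by its log prior plus the smoothed log likelihood of every known word and returns the higher-scoring label.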

Transformer-Based Models:

BERT Baseline: Pre-trained BERT-base model fine-tuned on our dataset

RoBERTa: Optimized BERT architecture with improved training methodology

RoBERTa + RAG: RoBERTa model enhanced with Retrieval-Augmented Generation (RAG)

Qwen 2.5 (7B): Zero-shot instruction prompting for misinformation detection

Qwen 2.5 + RAG: Qwen 2.5 model augmented with domain-specific knowledge retrieval
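The retrieval-augmented variants above share one idea: fetch relevant reference passages and hand them to the model together with the post. The sketch below shows that retrieval step and a zero-shot prompt in plain Python. It assumes a bag-of-words cosine retriever and a tiny hand-written knowledge base; the actual retriever, knowledge source, and prompt wording used in the RAG Pipeline folder are not described here, so every name below is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base, invented for illustration only.
KNOWLEDGE_BASE = [
    "Drinking bleach is dangerous and does not cure COVID-19.",
    "Vaccines are tested for safety in clinical trials.",
    "Masks reduce the spread of respiratory droplets.",
]


def bag_of_words(text):
    # Lowercase alphanumeric tokens with their counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(passages, key=lambda p: cosine(q, bag_of_words(p)), reverse=True)
    return ranked[:k]


def build_prompt(post, passages):
    """Assemble a zero-shot classification prompt with retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Use the reference facts to decide whether the post contains "
        "health misinformation. Answer 'misinformation' or 'reliable'.\n"
        f"Reference facts:\n{context}\n"
        f"Post: {post}\n"
        "Answer:"
    )


example_post = "Does drinking bleach cure COVID-19?"
example_passages = retrieve(example_post, KNOWLEDGE_BASE, k=1)
example_prompt = build_prompt(example_post, example_passages)
```

In a real pipeline the bag-of-words retriever would typically be replaced by dense embeddings, and the resulting prompt (or the concatenated post and passages) would be passed to the language model or classifier.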

In addition to the modelling, we have implemented Network Analysis, which includes the following:

Community Detection via Content Similarity
Hashtag Network Analysis
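To make the hashtag analysis concrete, the following sketch builds a weighted hashtag co-occurrence network and computes a per-hashtag misinformation ratio using only the standard library. It assumes each post is available as a pair of (hashtags, misinformation flag); that input format, the function name, and the toy posts are assumptions for illustration rather than the repository's actual code.

```python
from collections import Counter
from itertools import combinations


def hashtag_stats(posts):
    """Compute co-occurrence edge weights and misinformation ratios.

    posts: iterable of (hashtags, is_misinformation) pairs.
    Returns (edge_weights, ratios) where edge_weights maps a sorted
    hashtag pair to the number of posts containing both, and ratios
    maps each hashtag to the share of its posts flagged as misinformation.
    """
    edge_weights = Counter()
    totals = Counter()
    flagged = Counter()
    for hashtags, is_misinformation in posts:
        # Deduplicate and sort so each pair has one canonical order.
        tags = sorted({h.lower() for h in hashtags})
        for tag in tags:
            totals[tag] += 1
            if is_misinformation:
                flagged[tag] += 1
        for a, b in combinations(tags, 2):
            edge_weights[(a, b)] += 1
    ratios = {tag: flagged[tag] / totals[tag] for tag in totals}
    return edge_weights, ratios


# Hypothetical toy posts, invented for illustration only.
toy_posts = [
    (["#covid", "#cure"], True),
    (["#covid", "#vaccine"], False),
    (["#cure", "#covid"], True),
    (["#vaccine"], False),
]
toy_edges, toy_ratios = hashtag_stats(toy_posts)
```

The edge weights can then be loaded into a graph library for clustering, and the ratios can be aggregated per cluster to compare thematic groups of hashtags.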

Group Members:
Nazeefa Muzammil
Md Mehedi Hasan Jibon
Alejandro Delgado Rios
Fahim Rahman

Repository contents:
.ipynb_checkpoints
Community Analysis
Hashtag Network Analysis
RAG Pipeline
baseline_checkpoints
baseline_model
bert_misinformation_model
global
llm_inference
optimized_misinfo_model
output
outputs
.gitattributes
.gitignore
Project.ipynb
README.md
Untitled.ipynb
merged_dataset.csv

Repository: nazeefa16/Predicting-the-Propagation-Patterns-of-Health-Misinformation-Across-Social-Media-Platforms
