This experiment investigates potential bias in several machine learning models applied to sentiment analysis of patient reviews, including a Zero-R majority-class baseline. The analysis proceeds in three phases:
- Preprocessing only: In the initial phase, the models are trained on the raw data and inherit its biases, primarily the class imbalance in which positive comments dominate.
- Balancing genders: This phase addresses gender imbalance in the data. While gender balancing improves fairness between male and female groups, it does not resolve the class imbalance issue.
- Balancing classes: In the final phase, the class distribution between positive and negative reviews is balanced (a minimal sketch of this rebalancing follows the list). This significantly improves model performance, particularly for negative sentiment classification. The analysis concludes that the main source of bias is the overrepresentation of positive comments, not gender.
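As an illustration, the sketch below shows one minimal way to implement the Zero-R baseline and the phase-3 class rebalancing (undersampling the majority class). It is not the notebook's actual code: the column name `label` and the class values `positive`/`negative` are hypothetical placeholders for whatever schema `data/TRAIN.csv` actually uses.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier

train = pd.read_csv("data/TRAIN.csv")  # assumed to contain a "label" column

# Zero-R always predicts the majority class: with positives dominating,
# accuracy looks acceptable while negative reviews are never recognized.
zero_r = DummyClassifier(strategy="most_frequent")
zero_r.fit(train.drop(columns=["label"]), train["label"])

# Phase 3: downsample the positive majority to the size of the negative
# minority so both classes are equally represented, then shuffle.
neg = train[train["label"] == "negative"]
pos = train[train["label"] == "positive"].sample(n=len(neg), random_state=0)
balanced = pd.concat([pos, neg]).sample(frac=1, random_state=0)
```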
```
.
├── README.md
├── data
│   ├── README.txt
│   ├── TFIDF_TRAIN.csv       # training data in TF-IDF representation
│   ├── TFIDF_VALIDATION.csv  # validation data in TF-IDF representation
│   ├── TRAIN.csv             # raw training data with labels
│   └── VALIDATION.csv        # raw validation data with labels
├── main.ipynb
└── requirements.txt
```
The validation set is used as a test set in this project because the original test set is unlabeled.
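As a rough sketch of how these files might be consumed (again, not the notebook's actual code; the `label` column name is an assumed schema), a classifier can be trained on `TFIDF_TRAIN.csv` and scored on `TFIDF_VALIDATION.csv`, which stands in for the test set as noted above:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

train = pd.read_csv("data/TFIDF_TRAIN.csv")
val = pd.read_csv("data/TFIDF_VALIDATION.csv")

X_train, y_train = train.drop(columns=["label"]), train["label"]
X_val, y_val = val.drop(columns=["label"]), val["label"]

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The validation split doubles as the test set, since the original
# test set is unlabeled.
print(classification_report(y_val, clf.predict(X_val)))
```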
It is recommended to run this project inside a development container in VS Code (install the `ms-vscode-remote.remote-containers` extension first):

```sh
git clone https://github.com/wille-wang/ml-sentiment-bias.git
cd ml-sentiment-bias
code .
devcontainer open .
```
This project is adapted from Assignment 3 of Introduction to Machine Learning (COMP90049) at the University of Melbourne, Semester 2, 2023.