Fine-tuned BERT-base-uncased pre-trained model to classify spam SMS.

This is my second Natural Language Processing (NLP) project, in which I fine-tuned a bert-base-uncased model to classify spam SMS. It is a huge improvement over my earlier project, https://github.com/fzn0x/bert-indonesian-english-hate-comments.

How to use this model?

from transformers import BertTokenizer, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained('fzn0x/bert-spam-classification-model')
model = BertForSequenceClassification.from_pretrained('fzn0x/bert-spam-classification-model')

See scripts/predict.py for a full example; you only need to change the argument of from_pretrained.
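
The loading snippet above stops before a prediction is made. A minimal sketch of a full inference call could look like the following. It is not the code from scripts/predict.py: the helper names `pick_label` and `classify_sms` are hypothetical, `max_length=128` is an arbitrary choice, and it assumes the checkpoint's config carries a usable `id2label` mapping.

```python
def pick_label(scores, id2label):
    """Return the label whose score is highest (plain argmax)."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return id2label[best_index]


def classify_sms(text, model_id="fzn0x/bert-spam-classification-model"):
    """Classify one SMS message; the model is downloaded on first use."""
    # Heavy imports are deferred so pick_label works without them installed.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained(model_id)
    model = BertForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    # Assumption: 128 tokens is enough for an SMS; adjust if needed.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)

    probabilities = torch.softmax(logits, dim=-1)[0].tolist()
    # Assumption: config.id2label maps class ids to names. If the checkpoint
    # was saved without custom names, this yields LABEL_0 / LABEL_1.
    return pick_label(probabilities, model.config.id2label), probabilities


# Example call (downloads the model, so it is left commented out):
# label, probs = classify_sms("Congratulations! You won a free prize.")
```

If the returned label is a generic `LABEL_0` or `LABEL_1`, check scripts/train.py to see which class id was assigned to spam.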

✅ Install requirements

Install the required dependencies:

pip install --upgrade pip
pip install -r requirements.txt

✅ Add BERT virtual env

Run the commands below:

# ✅ Create and activate a virtual environment
python -m venv bert-env
source bert-env/bin/activate    # On Windows use: bert-env\Scripts\activate
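
To confirm that the environment is actually active before installing packages, a small check along these lines may help. It is a sketch rather than part of the repository, and `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style environment."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

When the environment is active, the interpreter path printed should sit inside the bert-env directory.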

✅ Install CUDA

Check if your GPU supports CUDA:

nvidia-smi

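Then install the CUDA 12.1 build of PyTorch and set the allocator configuration:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:False    # On Windows (cmd) use: set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:False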

🔧 How to use

Check your device and CUDA availability:

python check_device.py

⚠️ Training on CPU is not advisable; check that CUDA is available first.
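
The contents of check_device.py are not reproduced here. As a rough idea of what such a check involves, the sketch below reports whether PyTorch can see a CUDA device. The function name `describe_device` is hypothetical and the script in the repository may behave differently.

```python
def describe_device():
    """Return a short description of the device PyTorch would use."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"

    if torch.cuda.is_available():
        # Index 0 is the first visible GPU.
        name = torch.cuda.get_device_name(0)
        return f"cuda ({name}, CUDA runtime {torch.version.cuda})"
    return "cpu (CUDA not available)"


print(describe_device())
```

A "cpu" result after installing the cu121 wheels usually points to a driver or installation mismatch worth resolving before training.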

Train the model:

python scripts/train.py

⚠️ After training, remove unneeded checkpoints in models/pretrained to save storage.
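
The cleanup can be scripted. The sketch below assumes that checkpoints are stored as `checkpoint-<step>` directories, which is how the Hugging Face Trainer names them by default; the layout this repository produces may differ. The helper names are hypothetical, and the default is a dry run that only prints what would be deleted.

```python
import re
import shutil
from pathlib import Path

# Assumption: Trainer-style directory names such as checkpoint-500.
CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def stale_checkpoints(root, keep=1):
    """List checkpoint directories under root, excluding the `keep` newest."""
    root = Path(root)
    if not root.is_dir():
        return []
    found = []
    for child in root.iterdir():
        match = CHECKPOINT_PATTERN.match(child.name)
        if child.is_dir() and match:
            found.append((int(match.group(1)), child))
    found.sort(key=lambda item: item[0])  # ascending by training step
    cutoff = max(len(found) - keep, 0)
    return [path for _, path in found[:cutoff]]


def remove_stale_checkpoints(root="models/pretrained", keep=1, dry_run=True):
    """Delete older checkpoints; with dry_run=True nothing is removed."""
    for path in stale_checkpoints(root, keep):
        print(("would remove " if dry_run else "removing ") + str(path))
        if not dry_run:
            shutil.rmtree(path)
```

Calling `remove_stale_checkpoints()` first shows the candidates; pass `dry_run=False` once the list looks right.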

Run prediction:

python scripts/predict.py

✅ Dataset location: data/spam.csv. Modify the dataset to adapt the model to your needs.
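
Before retraining on a modified dataset, it can be worth checking the class balance. The sketch below counts labels in a CSV file. The column name `label` and the UTF-8 encoding are assumptions, since the layout of data/spam.csv is not described here, and the helper names are hypothetical.

```python
import csv
from collections import Counter


def read_rows(path="data/spam.csv", encoding="utf-8"):
    """Read the CSV into a list of dicts keyed by the header row."""
    # Assumption: the file has a header row and is UTF-8 encoded.
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.DictReader(handle))


def label_counts(rows, label_column="label"):
    """Count how many rows carry each label (case-insensitive)."""
    # Assumption: the label column is named "label"; pass another name if not.
    return Counter(row[label_column].strip().lower() for row in rows)


# Example (requires the dataset file, so it is left commented out):
# print(label_counts(read_rows()))
```

A heavily skewed count suggests adding more examples of the rarer class or weighting the classes during training.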

📚 Citations

If you use this repository or its ideas, please cite the following (see citations.bib for full BibTeX entries):

Wolf et al., Transformers: State-of-the-Art Natural Language Processing, EMNLP 2020. ACL Anthology
Pedregosa et al., Scikit-learn: Machine Learning in Python, JMLR 2011.
Almeida & Gómez Hidalgo, SMS Spam Collection v.1, UCI Machine Learning Repository (2011). Kaggle Link

🧠 Credits and Libraries Used

Hugging Face Transformers – model, tokenizer, and training utilities
scikit-learn – metrics and preprocessing
Logging silencing inspired by Hugging Face GitHub discussions
Dataset from UCI SMS Spam Collection
Inspiration from Kaggle Notebook by Suyash Khare

License and Usage

Licensed under the MIT License.

Leave a ⭐ if you find this project helpful. Contributions are welcome.

