Smart Article Type Text Classification with Machine Learning: Leveraging SentenceBERT and Linear SVC for Accurate Category Prediction and Serving the Trained Model through an API Endpoint

This repository contains a machine learning project that classifies articles, based on their text, into categories such as 'Commercial', 'Military', 'Executives', 'Others', 'Support & Services', 'Financing', and 'Training'. It applies natural language processing (NLP) techniques and machine learning models to the article text to perform the classification.

Overview

The goal of this project is to build a text classification model that can accurately predict the type of an article based on its content. The project includes data preprocessing, model training, and deployment using Flask.

Project Structure

Article_type_classification/
│
├── Article_type_classification_final.ipynb  # Notebook for final model training and evaluation
├── README.md                                # This README file
├── app.py                                   # Flask application for serving the model
├── article_type_classifier_model.pkl        # Trained model
├── articles.csv                             # Dataset containing articles
├── class_names.pkl                          # Pickle file containing class names
├── preprocess.py                            # Script for data preprocessing
├── requirements.txt                         # Dependencies file
├── unknown_articles.csv                     # Dataset with unknown articles URLs for prediction
└── url_article_type_prediction.ipynb        # Notebook for URL based predictions

Dependencies

To install the required dependencies, use the following command:

pip install -r requirements.txt
python -m spacy download en_core_web_sm

Installation

Clone the repository (in Colab):

git clone https://github.com/mcPython95/Article_type_classification.git
cd Article_type_classification

Training the Model

To train the model, use the Article_type_classification_final.ipynb notebook. This notebook contains the complete workflow for data preprocessing, model training, and evaluation.
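As a rough illustration of the kind of workflow the notebook covers, the sketch below embeds article texts with a SentenceBERT model and fits a linear SVC on the embeddings. The model name `all-MiniLM-L6-v2`, the helper names `embed_texts` and `train_linear_svc`, and the split settings are assumptions for illustration; the notebook itself may differ.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


def embed_texts(texts, model_name="all-MiniLM-L6-v2"):
    """Encode texts into dense vectors with a SentenceBERT model.

    The model name is an assumption; the notebook may use a different one.
    """
    # Imported lazily so the classifier code can run without loading the model.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return np.asarray(model.encode(list(texts)))


def train_linear_svc(embeddings, labels, test_size=0.2, seed=42):
    """Fit a LinearSVC on precomputed embeddings and return it with held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    clf = LinearSVC()
    clf.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, clf.predict(X_test))
    return clf, accuracy
```

With the article texts and labels loaded from articles.csv, a call such as `train_linear_svc(embed_texts(texts), labels)` would return the fitted classifier and its accuracy on the held-out split.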

Download files

Download the data files into the project directory:

- articles.csv - The dataset of known articles used for training and evaluating the classification model. It includes the features and labels needed for model development.
- unknown_articles.csv - A list of links to articles with unknown categories. Each link must be fetched and its content extracted before the trained model can classify it.
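Fetching a link and extracting its text could look roughly like the sketch below, which uses only the Python standard library. The column name `url` and the helper names are hypothetical, and the prediction notebook may rely on a different scraping library.

```python
import csv
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML document as a single string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def read_urls(csv_path, column="url"):
    """Read article links from a CSV file; the column name is an assumption."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f) if row.get(column)]


def fetch_article_text(url, timeout=10):
    """Download one page and reduce it to plain text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

The extracted text keeps navigation and footer text along with the article body, so a real pipeline would likely need additional cleaning before classification.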

Save the model

- article_type_classifier_model.pkl - The trained model for classifying article types. It is a pickled model object and can be loaded with pickle.load().
- class_names.pkl - The list of class names used by the model. It is a pickled object that maps class indices to human-readable names and can be loaded with pickle.load().
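A minimal sketch of saving and loading these files with the standard `pickle` module is shown below. The helper names are hypothetical, and `predict_labels` assumes the model predicts integer class indices that index into the class-name list, as the description above suggests.

```python
import pickle


def save_pickle(obj, path):
    """Serialize any picklable object (model, list of class names) to disk."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Load a pickled object. Only unpickle files from a source you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_labels(model, class_names, features):
    """Map predicted class indices to readable names.

    Assumes model.predict returns integer indices into class_names.
    """
    return [class_names[int(i)] for i in model.predict(features)]
```

For example, `model = load_pickle("article_type_classifier_model.pkl")` and `class_names = load_pickle("class_names.pkl")` would restore both objects, provided the same library versions used for training are installed.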

Predicting Article Types

You can use the url_article_type_prediction.ipynb notebook to predict the types of articles from a list of URLs. Simply open the notebook and follow the instructions.

Usage

Running the Flask App (use VS Code or any code editor)

Download and place the necessary models and data files in the project directory:
- article_type_classifier_model.pkl
- class_names.pkl

Create a virtual environment:

python -m venv venv

Activate the virtual environment:
- On Windows:
```
venv\Scripts\activate
```
- On macOS/Linux:
```
source venv/bin/activate
```
Install the dependencies:

pip install -r requirements.txt
python -m spacy download en_core_web_sm

Start the Flask app:

python app.py
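Once the app is running, a client can send article text to it over HTTP. The sketch below is only an illustration: it assumes Flask's default address `http://127.0.0.1:5000` and a hypothetical `POST /predict` route that accepts JSON of the form `{"text": ...}`. Check app.py for the actual route, port, and payload.

```python
import json
from urllib.request import Request, urlopen

# Assumptions: Flask's default address and a hypothetical /predict route.
BASE_URL = "http://127.0.0.1:5000"


def build_predict_request(text, base_url=BASE_URL, route="/predict"):
    """Build a JSON POST request carrying the article text."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + route,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, timeout=10):
    """Send the request and decode the JSON response from the app."""
    with urlopen(build_predict_request(text), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to adjust the route or payload once the real endpoint is known.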

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
