Multilabel classification on Stack Overflow tags

Predict tags for Stack Overflow posts using a multilabel classification approach.

Dataset

  • A dataset of post titles from Stack Overflow

Transforming text to a vector

  • Text data is transformed to numeric vectors using bag-of-words and TF-IDF.
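As a rough illustration of the two representations, here is a minimal sketch using scikit-learn's `CountVectorizer` and `TfidfVectorizer`. The titles are made up for the example, and the project itself may build its vectors differently (for instance with a custom vocabulary or tokenizer).

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical post titles, made up for illustration only.
titles = [
    "how to sort a list in python",
    "python list comprehension syntax",
    "undefined reference error in c++",
]

# Bag-of-words: each column counts occurrences of one vocabulary token.
bow_vectorizer = CountVectorizer()
X_bow = bow_vectorizer.fit_transform(titles)

# TF-IDF: with default settings the vocabulary is the same, but tokens
# that appear in many titles are down-weighted.
tfidf_vectorizer = TfidfVectorizer()
X_tfidf = tfidf_vectorizer.fit_transform(titles)

# Both results are sparse matrices with one row per title.
print(X_bow.shape, X_tfidf.shape)
print(sorted(bow_vectorizer.vocabulary_))
```

Both vectorizers return sparse matrices, which keeps memory use manageable when the vocabulary is large.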

Multilabel classifier

MultiLabelBinarizer transforms the tag lists into binary form, so each prediction is a mask of 0s and 1s with one position per tag.
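The binary form can be sketched with scikit-learn's `MultiLabelBinarizer`. The tag lists below are hypothetical and only show the shape of the encoding.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tag lists, one list per post.
tags = [["python", "list"], ["python"], ["c++", "linker"]]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# One column per distinct tag, in sorted order.
print(mlb.classes_)
# One row per post: 1 where the post has the tag, 0 otherwise.
print(y)

# A 0/1 mask can be mapped back to tag names.
recovered = mlb.inverse_transform(y)
print(recovered)
```

The same fitted binarizer can be used to turn a predicted mask back into tag names with `inverse_transform`.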

Logistic Regression for Multilabel classification

  • Coefficient = 10
  • L2 regularization
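One way to set up such a classifier is sketched below. It assumes that "Coefficient = 10" refers to scikit-learn's `C` parameter (the inverse regularization strength) and that one logistic regression is trained per tag through `OneVsRestClassifier`; neither detail is confirmed here. The titles and tags are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical training data, made up for illustration only.
titles = [
    "sort a list in python",
    "python dictionary iteration",
    "javascript array sort",
    "javascript promise chaining",
    "python list slicing",
    "javascript dictionary object keys",
]
tags = [
    ["python", "list"],
    ["python"],
    ["javascript"],
    ["javascript"],
    ["python", "list"],
    ["javascript"],
]

X = TfidfVectorizer().fit_transform(titles)
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(tags)

# Assumption: "Coefficient = 10" is mapped to C=10. L2 is scikit-learn's
# default penalty for LogisticRegression, so it is not passed explicitly.
base = LogisticRegression(C=10, max_iter=1000)

# One-vs-rest fits an independent binary classifier for each tag.
clf = OneVsRestClassifier(base)
clf.fit(X, y)

# Predictions are a 0/1 mask with the same layout as y.
pred = clf.predict(X)
print(mlb.inverse_transform(pred))
```

In scikit-learn a larger `C` means weaker regularization, so `C=10` penalizes large weights less than the default `C=1.0`.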

Evaluation

Results are evaluated using several classification metrics:
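The specific metrics are not listed here. Purely as an assumption, the sketch below uses subset accuracy and micro- and macro-averaged F1 from scikit-learn, which are common choices for multilabel output; the label masks are hand-written for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical true and predicted 0/1 masks (rows = posts, columns = tags).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],
                   [0, 1, 1],
                   [1, 0, 0]])

# For multilabel input, accuracy_score is subset accuracy: a post counts
# as correct only if its whole mask matches.
acc = accuracy_score(y_true, y_pred)

# Micro-averaging pools all (post, tag) decisions before computing F1;
# macro-averaging computes F1 per tag and takes the unweighted mean.
f1_micro = f1_score(y_true, y_pred, average="micro")
f1_macro = f1_score(y_true, y_pred, average="macro")

print(acc, f1_micro, f1_macro)
```

Subset accuracy is strict for multilabel problems, so averaged F1 scores often give a more informative picture of per-tag quality.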

Libraries

  • NumPy: a package for scientific computing.
  • pandas: a library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • scikit-learn: a tool for data mining and data analysis.
  • NLTK: a platform for working with natural language.

Note: this sample project was originally created by @partoftheorigin.