
Data Mining projects concerning: (i) the analysis and clustering of energy data, and (ii) the text classification of reviews.


nikpapage23/Data-Mining-projects


Data-Mining-projects

University projects created for the course "Data Mining and Learning Algorithms".

Project 1

We use datasets describing the electrical energy demand of the State of California and the sources that meet it, for every day from 1/1/2019 through 12/31/2021, at five-minute resolution.

  • Part A: Graphical analysis of the data and the conclusions drawn from it.
  • Part B: Applying the DBSCAN clustering algorithm to detect outlier days, i.e. days on which demand or production deviated from the expected values.
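
To illustrate how DBSCAN can flag outlier days, here is a minimal pure-Python sketch. It is not the project's code: the per-day features (scaled mean demand and mean production), the sample values, and the `eps` and `min_samples` settings are all assumptions made for the example. In practice a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` would likely be used instead.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


# Hypothetical, already-scaled daily features: (mean demand, mean production).
days = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.1), (1.05, 1.05), (3.0, 0.2)]

labels = dbscan(days, eps=0.3, min_samples=3)
outlier_days = [i for i, label in enumerate(labels) if label == NOISE]
print(outlier_days)
```

Days labeled `-1` fall in no dense region and are the candidates for outlier days. The result depends strongly on `eps` and `min_samples`, and on scaling the features before clustering.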

Project 2

Using the "amazon.csv" dataset, we create a word vector for each review and then train a RandomForest classifier to predict the rating of each review. Two approaches: (i) multi-class classification with 5 classes (ratings 1-5) and (ii) multi-class classification with 3 classes describing the sentiment of each review (Positive, Neutral and Negative).
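
The README does not say how the word vectors are built or how ratings are grouped into three labels, so the following is only a sketch under stated assumptions. It uses simple bag-of-words counts as the vector, and a hypothetical mapping of ratings 1-2 to Negative, 3 to Neutral and 4-5 to Positive. The sample reviews and ratings are made up. The resulting vectors and labels are the kind of input a classifier such as scikit-learn's `RandomForestClassifier` could be trained on.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphabetic tokens (apostrophes allowed).
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(reviews):
    # Map every distinct token to a fixed column index (alphabetical order).
    words = sorted({token for review in reviews for token in tokenize(review)})
    return {word: index for index, word in enumerate(words)}


def vectorize(review, vocabulary):
    # Bag-of-words count vector: one entry per vocabulary word.
    vector = [0] * len(vocabulary)
    for word, count in Counter(tokenize(review)).items():
        if word in vocabulary:  # words unseen at vocabulary-build time are ignored
            vector[vocabulary[word]] = count
    return vector


def rating_to_sentiment(rating):
    # Assumed grouping (not stated in the README): 1-2, 3, 4-5.
    if rating <= 2:
        return "Negative"
    if rating == 3:
        return "Neutral"
    return "Positive"


# Made-up sample data standing in for rows of "amazon.csv".
reviews = ["Great product, great price", "Terrible, broke after a day", "It is okay"]
ratings = [5, 1, 3]

vocabulary = build_vocabulary(reviews)
vectors = [vectorize(review, vocabulary) for review in reviews]
sentiments = [rating_to_sentiment(rating) for rating in ratings]

print(vectors[0])
print(sentiments)
```

For approach (i) the raw ratings serve as the target. For approach (ii) the mapped sentiment labels replace them, and the feature vectors stay the same.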

The datasets for both projects can be found and downloaded here.
