A university project created for the course "Data Mining and Learning Algorithms".
We use datasets describing the electrical energy demand of the State of California and the sources that meet it, covering every day from 1/1/2019 through 12/31/2021 at a five-minute time resolution.
- Part A: Performing graphical analysis of the data and drawing conclusions from it.
- Part B: Applying the DBSCAN clustering algorithm to detect outlier days, i.e. days on which demand or production deviated from the expected values.
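The idea behind Part B can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn and NumPy, uses synthetic demand values instead of the real datasets, and the daily features (mean and peak demand), the helper `find_outlier_days`, and the `eps` and `min_samples` values are all hypothetical choices. DBSCAN labels points that belong to no dense cluster as `-1`, and those are treated here as outlier days.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def find_outlier_days(daily_features, eps=0.5, min_samples=5):
    """Return the indices of days that DBSCAN marks as noise (label -1).

    The name and default parameters are illustrative assumptions.
    """
    # Scale features so that eps means the same thing on every axis.
    scaled = StandardScaler().fit_transform(daily_features)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(scaled)
    return np.flatnonzero(labels == -1)


# Synthetic stand-in for the real data: 60 days, 288 five-minute readings each.
rng = np.random.default_rng(0)
n_days, readings_per_day = 60, 288
t = np.linspace(0, 2 * np.pi, readings_per_day)
base_curve = 25000 + 5000 * np.sin(t)  # made-up demand values
demand = base_curve + rng.normal(0, 300, size=(n_days, readings_per_day))
demand[41] *= 1.6  # inject one abnormal day

# One row per day: mean demand and peak demand.
daily_features = np.column_stack([demand.mean(axis=1), demand.max(axis=1)])

outlier_days = find_outlier_days(daily_features)
print(outlier_days)
```

On real data the choice of daily features and of `eps` matters a lot; a common heuristic is to inspect the sorted distance to each point's k-th nearest neighbour and pick `eps` near the bend of that curve.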
Taking the "amazon.csv" dataset, we create a word vector for each review and then use a RandomForest classifier to predict the rating of each review. We follow two approaches: (i) multi-class classification with 5 classes (ratings 1-5) and (ii) multi-class classification with 3 classes describing the sentiment of each review (Positive, Neutral and Negative).
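A rough sketch of the two approaches is shown below. It is an illustration under several assumptions rather than the project's actual code: it uses scikit-learn, substitutes TF-IDF vectors for whatever word-vector method the project uses, trains on a handful of made-up reviews instead of "amazon.csv", and the helper `rating_to_sentiment` with its cut-offs (1-2 Negative, 3 Neutral, 4-5 Positive) is hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline


def rating_to_sentiment(rating):
    """Collapse a 1-5 rating into three classes (assumed cut-offs)."""
    if rating <= 2:
        return "Negative"
    if rating == 3:
        return "Neutral"
    return "Positive"


# Made-up reviews standing in for the real dataset.
reviews = [
    "terrible product, broke after one day",
    "bad quality and late delivery",
    "it is okay, nothing special",
    "good value, works as described",
    "excellent, absolutely love it",
    "awful, do not buy",
    "average product, does the job",
    "great purchase, highly recommended",
]
ratings = [1, 2, 3, 4, 5, 1, 3, 5]


def build_model():
    # TF-IDF vectorisation followed by a random forest (assumed setup).
    return make_pipeline(
        TfidfVectorizer(),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )


# Approach (i): predict the rating itself (5 classes).
rating_model = build_model().fit(reviews, ratings)

# Approach (ii): predict the sentiment (3 classes).
sentiments = [rating_to_sentiment(r) for r in ratings]
sentiment_model = build_model().fit(reviews, sentiments)

new_review = ["love it, excellent quality"]
print(rating_model.predict(new_review), sentiment_model.predict(new_review))
```

With so few training examples the predictions are not meaningful; the point is only the shape of the pipeline and how the 3-class target is derived from the 5-class one.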
The datasets for both projects can be found and downloaded here.