This repository contains scripts for developing, training and evaluating machine learning models in Python, using PySpark MLlib (the machine learning library provided with PySpark), scikit-learn and XGBoost, as well as neural networks.
Model development is tracked with MLflow, so that transformer and estimator parameters are logged for each run and the resulting models can eventually be registered.
Multiclass Obesity:
- Description: Predicts the obesity level of patients based on their eating/lifestyle habits and physical condition.
- Includes: Data preprocessing, model selection, hyperparameter tuning, model evaluation metrics (accuracy, precision, recall, F1-score).
Adult Income:
- Description: Classifies the earnings of individuals into two classes: above $50k or below $50k.
- Includes: Data preprocessing, model selection, hyperparameter tuning, model evaluation metrics (accuracy, precision, recall, F1-score).
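
Both projects report accuracy, precision, recall and F1-score. The sketch below shows how these can be computed from true and predicted labels in plain Python. It assumes macro averaging (an unweighted mean over classes), which the projects may or may not use; the function name `macro_metrics` and the example labels are hypothetical.

```python
from collections import Counter


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(safe_div(2 * precision * recall, precision + recall))

    n_labels = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n_labels),
        "recall": safe_div(sum(recalls), n_labels),
        "f1": safe_div(sum(f1_scores), n_labels),
    }


# Illustrative labels only; they are not taken from the actual dataset.
y_true = ["normal", "normal", "overweight", "obese", "obese", "obese"]
y_pred = ["normal", "overweight", "overweight", "obese", "obese", "normal"]
scores = macro_metrics(y_true, y_pred)
```

For the two-class Adult Income task the same function applies unchanged, although binary problems are often reported for the positive class only instead of as a macro average.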