This is an end-to-end project for predicting diamond prices based on various attributes such as carat, cut, color, clarity, depth, table, dimensions (x, y, z), and price. The project utilizes machine learning techniques to build predictive models and provides a web application interface for users to input data and receive predictions.
-
artifacts/: Contains generated files
preprocessor.pkl
raw.csv
test.csv
train.csv
-
dist/: Distribution package
DiamondPricePrediction-0.0.1-py3.8.egg
-
notebooks/: Jupyter notebooks for analysis and model training
- data/: Folder with raw data
EDA.ipynb
: Exploratory Data Analysis notebookModel Training.ipynb
: Model training notebook
-
src/: Source code for the application
-
components/: Component files for data processing and model training
__init__.py
data_ingestion.py
data_transformation.py
model_trainer.py
-
pipelines/: Pipelines for prediction and training
__init__.py
prediction_pipeline.py
training_pipeline.py
-
exception.py
: Custom exceptions for error handling -
logger.py
: Logging utility for tracking events in the application -
utils.py
: Utility functions for various tasks
-
-
templates/: HTML templates for the web app
form.html
: Input form for predictionsindex.html
: Main page of the web app
-
application.py
: Main application script to run the project -
.gitignore
: Files and directories to ignore in Git -
LICENSE
: Licensing information -
README.md
: Documentation file (this file) -
requirements.txt
: Required Python dependencies -
setup.py
: Setup configuration for the package
- Python: The primary programming language used in this project.
- Flask: A lightweight WSGI web application framework for building the web interface.
- Jupyter Notebooks: For exploratory data analysis and model training.
- Machine Learning Libraries: Such as scikit-learn for building predictive models.
The dataset used in this project contains the following attributes:
id
: Unique identifier for each diamond.carat
: Weight of the diamond.cut
: Quality of the cut (e.g., Fair, Good, Very Good, Premium, Ideal).color
: Diamond color, from J (worst) to D (best).clarity
: A measurement of how clear the diamond is.depth
: Total depth percentage, calculated as 2*z/(x+y).table
: Width of the top of the diamond relative to its widest point.x
,y
,z
: Dimensions of the diamond in mm.price
: Price of the diamond in US dollars.
The following models were trained and evaluated:
-
Linear Regression
- RMSE: 1013.90
- MAE: 674.03
- R² Score: 93.69
-
Lasso Regression
- RMSE: 1013.88
- MAE: 675.07
- R² Score: 93.69
-
Ridge Regression
- RMSE: 1013.91
- MAE: 674.06
- R² Score: 93.69
-
ElasticNet
- RMSE: 1533.42
- MAE: 1060.74
- R² Score: 85.56
These metrics indicate that Linear Regression and Lasso Regression performed best among the models tested.
To set up this project locally, follow these steps:
-
Clone the repository:
git clone https://github.com/Monish-Nallagondalla/DiamondPricePrediction.git cd DiamondPricePrediction
-
Install the required dependencies:
pip install -r requirements.txt
To run the application, execute the following command:
python application.py
Once running, open your web browser and navigate to http://127.0.0.1:5000 to access the application.
License
This project is licensed under the MIT License - see the LICENSE file for details. Feel free to customize any sections or add additional information as necessary!