This project aims to analyze a dataset of used cars to understand which factors influence their prices. The goal is to provide actionable insights and recommendations to a used car dealership, helping them understand what consumers value most in a used car.
This analysis uses a dataset containing information on 426,000 used cars. The data has been sampled from a larger dataset to ensure efficient processing. The objective is to identify key attributes that determine car prices and to suggest strategies for pricing and inventory management.
The analysis follows the CRISP-DM (Cross-Industry Standard Process for Data Mining) framework, a widely recognized approach for handling data projects. The following steps were taken:
- Business Understanding: Defining the problem and the objectives from the dealership's perspective.
- Data Understanding: Exploring the dataset to identify key characteristics and potential influencing factors.
- Data Preparation: Cleaning and organizing the data for analysis.
- Modeling: Applying statistical models and machine learning techniques to understand the relationship between car attributes and their prices.
- Evaluation: Assessing the models' performance to ensure reliability and validity.
- Deployment: Formulating recommendations based on the analysis to guide the dealership's decision-making.
The analysis identifies several factors that significantly impact the price of used cars. The findings will help the dealership prioritize these attributes in their pricing strategies and better manage their inventory to meet customer preferences.
- Python
- Pandas
- NumPy
- Scikit-learn
To replicate this analysis:
- Ensure you have Python installed (version 3.6 or higher is recommended).
- Install the necessary libraries (`pandas`, `numpy`, `scikit-learn`) using pip: `pip install pandas numpy scikit-learn`
- Open the notebook in a Jupyter environment or any Python IDE and run the cells sequentially. You can also view the rendered notebook directly on GitHub: View the Notebook
- GitHub does not allow files larger than 25 MB, so the vehicle.csv file in the data folder is zipped to reduce its size.
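Because the data file is zipped, one option is to let Pandas read it without unzipping first. The sketch below assumes the archive contains exactly one CSV file; the function name `load_zipped_csv` and the path in the usage comment are hypothetical, so substitute the actual archive name from the data folder.

```python
import pandas as pd


def load_zipped_csv(zip_path):
    """Read a CSV stored inside a zip archive directly into a DataFrame.

    pandas decompresses the file itself, provided the archive holds
    exactly one data file.
    """
    return pd.read_csv(zip_path, compression="zip")


# Example usage (hypothetical path, adjust to the real archive name):
# cars = load_zipped_csv("data/vehicle.csv.zip")
# print(cars.shape)
```

If the archive holds more than one file, extract it first with Python's `zipfile` module or an unzip tool and pass the extracted CSV path to `pd.read_csv` instead.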
Contributions are welcome! Please feel free to submit a pull request or open an issue for any improvements or bug fixes.