Customer Segmentation with KMeans Clustering

Overview

This project demonstrates a hands-on approach to customer segmentation using KMeans Clustering in Python. The goal is to provide actionable insights into customer behavior for an online retail business. By leveraging the RFM (Recency, Frequency, Monetary) model and clustering techniques, we classify customers into meaningful groups to aid business decision-making.

Features

- Data Exploration: Handling missing data and outliers, and inspecting key fields such as invoice numbers, stock codes, and customer IDs.
- Data Cleaning: Filtering out invalid or irrelevant records so the analysis rests on genuine sales.
- Feature Engineering: Calculating RFM features from transactional data.
- Clustering Analysis: Applying KMeans clustering and evaluating the results with the Elbow Method and Silhouette Scores.
- Visualization: Presenting insights through 3D scatter plots, violin plots, and labeled clusters.

Project Highlights

Dataset: Online Retail II dataset (transactions of a UK-based online retailer, 2009-2011).
Tools Used:
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- OpenPyXL

Workflow

Exploratory Data Analysis (EDA):
- Understanding data structure (columns like Invoice Number, Stock Code, Quantity, Unit Price, Customer ID, etc.).
- Dealing with missing values and inconsistencies (e.g., negative prices or quantities).
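
The checks above can be sketched with a few pandas calls. This is a minimal illustration on a made-up table, not the notebook's actual code; the column names (`Invoice`, `StockCode`, `Quantity`, `Price`, `Customer ID`) are assumed to follow the Online Retail II spreadsheet, and the rows are invented.

```python
import pandas as pd

# Made-up stand-in for the transactions table (column names are assumptions).
df = pd.DataFrame({
    "Invoice": ["100001", "100002", "C100003", "A100004"],
    "StockCode": ["85048", "22350", "22087", "B"],
    "Quantity": [12, 6, -12, 1],
    "Price": [6.95, 2.55, 2.95, -10.0],
    "Customer ID": [13085.0, 13085.0, 16321.0, None],
})

# Missing values per column.
missing_per_column = df.isna().sum()

# Rows with non-positive quantities or negative prices.
non_positive_quantity = int((df["Quantity"] <= 0).sum())
negative_price = int((df["Price"] < 0).sum())

# Invoice numbers that are not exactly six digits (e.g. a letter prefix).
is_regular = df["Invoice"].astype(str).str.match(r"^\d{6}$")
irregular_invoices = df.loc[~is_regular, "Invoice"].tolist()

print(missing_per_column)
print(non_positive_quantity, negative_price, irregular_invoices)
```

Counting the problem rows before filtering them makes it easier to judge how much data each cleaning rule will remove.
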
Data Cleaning:
- Removing null Customer IDs and irrelevant invoice types (e.g., cancellations or accounting entries).
- Addressing outliers in Quantity and Unit Price, and filtering out irregular Stock Codes.
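
One possible shape for the cleaning step is sketched below. The function name `clean_transactions` is hypothetical, the column names are assumed to match the Online Retail II spreadsheet, and treating zero-priced rows as invalid is an assumption of this sketch. Outlier handling is left out because the right rule depends on the data.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a filtered copy of the transactions table (hypothetical helper)."""
    cleaned = df.dropna(subset=["Customer ID"])  # drop rows without a customer
    # Keep regular sales only: invoice numbers made of exactly six digits.
    # Prefixed invoices (cancellations, accounting entries) fail this match.
    invoice = cleaned["Invoice"].astype(str)
    cleaned = cleaned[invoice.str.match(r"^\d{6}$")]
    # Assumption: non-positive quantities and prices are invalid.
    cleaned = cleaned[(cleaned["Quantity"] > 0) & (cleaned["Price"] > 0)]
    return cleaned.reset_index(drop=True)


# Made-up rows, one per rule.
raw = pd.DataFrame({
    "Invoice": ["100001", "C100002", "A100003", "100004", "100005"],
    "Quantity": [3, -3, 1, 2, -1],
    "Price": [2.5, 2.5, 10.0, 4.0, 1.0],
    "Customer ID": [1.0, 1.0, 2.0, None, 3.0],
})
cleaned = clean_transactions(raw)
print(cleaned)
```

Only the first row survives here: the others are a cancellation, an accounting entry, a row with no customer, and a negative quantity.
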
Feature Engineering:
- Creating RFM metrics:
  - Recency: Days since the last purchase.
  - Frequency: Number of purchases.
  - Monetary: Total spending.
- Scaling data to standardize features for clustering.
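
The RFM metrics and the scaling step can be sketched as follows. The data is invented, the column names are assumptions, and measuring recency from the day after the last transaction in the table is one common convention rather than something this project specifies.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up cleaned transactions (column names are assumptions).
transactions = pd.DataFrame({
    "Customer ID": [1, 1, 2, 3, 3, 3],
    "Invoice": ["100001", "100002", "100003", "100004", "100005", "100005"],
    "InvoiceDate": pd.to_datetime([
        "2010-01-05", "2010-03-01", "2010-02-10",
        "2010-01-20", "2010-03-09", "2010-03-09",
    ]),
    "Quantity": [2, 1, 5, 1, 3, 2],
    "Price": [10.0, 20.0, 4.0, 50.0, 5.0, 7.5],
})
transactions["LineTotal"] = transactions["Quantity"] * transactions["Price"]

# Assumed reference date: the day after the last transaction in the table.
snapshot = transactions["InvoiceDate"].max() + pd.Timedelta(days=1)

rfm = transactions.groupby("Customer ID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    Frequency=("Invoice", "nunique"),   # distinct invoices per customer
    Monetary=("LineTotal", "sum"),      # total spending per customer
)

# Standardize each feature to zero mean and unit variance before clustering.
scaled = StandardScaler().fit_transform(rfm)
print(rfm)
```

Scaling matters because KMeans relies on Euclidean distance, so an unscaled Monetary column would dominate Recency and Frequency.
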
Clustering Analysis:
- Using the Elbow Method to determine the optimal number of clusters.
- Validating cluster quality with Silhouette Scores.
- Assigning meaningful labels to clusters (e.g., Retain, Reward, Nurture, Re-engage).
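
A minimal sketch of the Elbow Method and Silhouette Score evaluation is shown below. It runs on synthetic points that stand in for the scaled RFM matrix; the range of `k`, the random seeds, and the four-blob layout are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for scaled RFM features: four well-separated groups.
rng = np.random.default_rng(42)
centers = np.array([[-2, -2, -2], [2, 2, 2], [-2, 2, 0], [2, -2, 0]], dtype=float)
X = np.vstack([c + 0.3 * rng.standard_normal((50, 3)) for c in centers])

inertia = {}     # within-cluster sum of squares, used for the elbow curve
silhouette = {}  # mean silhouette coefficient, higher is better
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X)
    inertia[k] = model.inertia_
    silhouette[k] = silhouette_score(X, labels)

# The elbow is read off a plot of inertia against k; the silhouette
# score gives a second opinion that can be compared directly.
best_k = max(silhouette, key=silhouette.get)
print(best_k)
```

On real customer data the two criteria may disagree, in which case interpretability of the resulting segments is a reasonable tie-breaker.
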
Visualization:
- 3D scatter plots to visualize clusters.
- Violin plots for distribution analysis of RFM metrics.
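
Both plot types can be sketched with Matplotlib alone. This example uses Matplotlib's own `violinplot` rather than Seaborn to stay self-contained, and it draws synthetic points in place of the real scaled RFM values; the cluster layout and axis labels are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection on older Matplotlib)

# Synthetic stand-in: columns play the role of scaled Recency, Frequency, Monetary.
rng = np.random.default_rng(0)
centers = np.array([[-2, -2, -2], [2, 2, 2], [0, 2, -2]], dtype=float)
points = np.vstack([c + 0.4 * rng.standard_normal((40, 3)) for c in centers])
labels = np.repeat([0, 1, 2], 40)

fig = plt.figure(figsize=(11, 4))

# 3D scatter plot, one color per cluster.
ax3d = fig.add_subplot(1, 2, 1, projection="3d")
ax3d.scatter(points[:, 0], points[:, 1], points[:, 2], c=labels, cmap="viridis")
ax3d.set_xlabel("Recency")
ax3d.set_ylabel("Frequency")
ax3d.set_zlabel("Monetary")

# Violin plot of one feature (third column) split by cluster.
ax_violin = fig.add_subplot(1, 2, 2)
groups = [points[labels == k, 2] for k in range(3)]
ax_violin.violinplot(groups, showmedians=True)
ax_violin.set_xticks([1, 2, 3])
ax_violin.set_xticklabels(["Cluster 0", "Cluster 1", "Cluster 2"])
ax_violin.set_ylabel("Monetary (scaled)")
```

In a notebook the figure displays automatically at the end of the cell; in a script, `fig.savefig(...)` writes it to disk.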

Results

Sample Outputs:

- 3D cluster visualization
- Violin plots of RFM metrics
- Cluster distribution with average feature values
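
A table of cluster sizes and average feature values can be produced with a single `groupby`. The sketch below uses invented RFM values and cluster assignments, and the mapping from cluster number to segment name is hypothetical: in practice it is chosen by reading the summary, not computed.

```python
import pandas as pd

# Made-up RFM table with a cluster assignment per customer.
rfm = pd.DataFrame({
    "Recency": [5, 8, 200, 250, 30, 40],
    "Frequency": [20, 18, 1, 2, 6, 5],
    "Monetary": [5000.0, 4200.0, 80.0, 120.0, 900.0, 700.0],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

# Cluster size and average feature values.
summary = rfm.groupby("Cluster").agg(
    Customers=("Recency", "size"),
    AvgRecency=("Recency", "mean"),
    AvgFrequency=("Frequency", "mean"),
    AvgMonetary=("Monetary", "mean"),
)

# Hypothetical labels picked after inspecting the summary.
cluster_names = {0: "Reward", 1: "Re-engage", 2: "Nurture"}
rfm["Segment"] = rfm["Cluster"].map(cluster_names)
print(summary)
```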

Getting Started

Prerequisites

Python 3.7+
Required Libraries:
```
pip install -r requirements.txt
```

Dataset

Download the dataset from the UCI Machine Learning Repository and place it in the data/ folder.
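
Once the file is in place, it can be read with pandas. The helper below is a sketch, not part of the notebook: the name `load_transactions` is hypothetical, and reading the first sheet by default is an assumption.

```python
from pathlib import Path

import pandas as pd


def load_transactions(path, sheet_name=0):
    """Read one sheet of the Excel workbook into a DataFrame (hypothetical helper)."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"Dataset not found at {path}; download it first.")
    # pandas relies on openpyxl to read .xlsx files.
    return pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl")
```

For example, `load_transactions("data/OnlineRetail_2009-2010.xlsx")` would load the file listed under Project Structure. Reading a workbook of this size can take a while.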

Running the Project

Clone this repository:

git clone https://github.com/yourusername/kmeans-clustering.git
cd kmeans-clustering

Install dependencies:
```
pip install -r requirements.txt
```

Open the Jupyter notebook:

jupyter notebook online_retail_data_clustering.ipynb

Run all cells to see the analysis.

Project Structure

|-- data/
|   |-- OnlineRetail_2009-2010.xlsx
|-- notebooks/
|   |-- online_retail_data_clustering.ipynb
|-- requirements.txt
|-- README.md

Challenges

- Handling a large dataset with over 500,000 records.
- Dealing with data inconsistencies, such as cancellations and invalid values.
- Choosing the number of clusters to balance granularity against interpretability.

Future Work

- Extend the analysis to include geolocation data (e.g., customer countries).
- Automate cluster labeling using advanced NLP techniques.
- Deploy clustering results to a dashboard for real-time business insights.

Acknowledgments

- Inspired by Trent's tutorial on clustering techniques.
- Dataset from the UCI Machine Learning Repository.

License

This project is licensed under the MIT License.

For any questions or suggestions, feel free to contact me via GitHub or open an issue!
