This project applies K-Means clustering to segment customers based on their Annual Income and Spending Score. The goal is to identify different groups of shoppers, which businesses can use for targeted marketing strategies.
- Source: Kaggle - Customer Segmentation Dataset
- Columns Used:
  - `Annual Income (k$)`: Customer’s yearly income in thousands of dollars.
  - `Spending Score (1-100)`: Score assigned based on shopping behavior.
- Read the dataset using Pandas.
- Explore basic statistics and check for missing values.
- No missing values were found.
- Selected `Annual Income` and `Spending Score` for clustering.
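
The loading, checking, and feature-selection steps above can be sketched roughly as follows. The function names are hypothetical, and the column names are assumed to match the ones listed in this README; the notebook may organize this differently.

```python
import pandas as pd

# Column names as listed in this README (assumed to match the CSV header).
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def load_customers(path: str = "Mall_Customers.csv") -> pd.DataFrame:
    """Read the raw dataset into a DataFrame."""
    return pd.read_csv(path)


def missing_value_counts(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column."""
    return df.isnull().sum()


def select_features(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the two columns used for clustering."""
    return df[FEATURES].copy()
```

A typical session would call `df = load_customers()`, inspect `df.describe()` and `missing_value_counts(df)`, and then pass `select_features(df)` on to the clustering step.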
- Used the Elbow Method to determine the best number of clusters.
- Plotted inertia values to identify the optimal `k`.
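
As an illustration of the Elbow Method, the sketch below fits K-Means once per candidate `k` and records the inertia (within-cluster sum of squared distances). The helper name, the range of 1 to 10, `n_init=10`, and `random_state=42` are assumptions chosen for the example, not values taken from the notebook.

```python
from sklearn.cluster import KMeans


def inertia_by_k(X, k_values=range(1, 11), random_state=42):
    """Fit K-Means once per k and return a dict mapping k to inertia."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        inertias[k] = model.inertia_
    return inertias
```

Plotting the returned values against `k` gives the elbow curve; the "elbow" is the point after which adding clusters reduces inertia only marginally.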
- Applied K-Means with `k=5`.
- Assigned each customer a cluster label.
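
A minimal sketch of the fitting and labeling step, assuming scikit-learn's `KMeans` and a label column named `Cluster` (the column name, `n_init`, and `random_state` are illustrative choices, not confirmed by the notebook):

```python
import pandas as pd
from sklearn.cluster import KMeans

FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def assign_clusters(df: pd.DataFrame, k: int = 5, random_state: int = 42) -> pd.DataFrame:
    """Return a copy of df with an added integer 'Cluster' column."""
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    labeled = df.copy()
    # fit_predict returns one label in 0..k-1 per row; the numbering is arbitrary.
    labeled["Cluster"] = model.fit_predict(labeled[FEATURES])
    return labeled
```

The labeled frame could then be written out with `assign_clusters(df).to_csv("customer_segments.csv", index=False)`.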
- Created a scatter plot to illustrate the different customer segments.
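
One way such a plot might be produced with Matplotlib is sketched below. It assumes the DataFrame already carries a `Cluster` column; the function name, colormap, and figure size are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_segments(df: pd.DataFrame, out_path: str = "customer_segments.png"):
    """Scatter income against spending score, colored by cluster, and save the figure."""
    fig, ax = plt.subplots(figsize=(8, 6))
    scatter = ax.scatter(
        df["Annual Income (k$)"],
        df["Spending Score (1-100)"],
        c=df["Cluster"],
        cmap="viridis",
    )
    ax.set_xlabel("Annual Income (k$)")
    ax.set_ylabel("Spending Score (1-100)")
    ax.set_title("Customer Segments")
    # One legend entry per distinct cluster label.
    ax.legend(*scatter.legend_elements(), title="Cluster")
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    return fig
```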
- Cluster 1: Low income, low spending (budget-conscious shoppers).
- Cluster 2: High income, high spending (luxury shoppers).
- Cluster 3: Average income, average spending.
- Cluster 4: High income, low spending (careful spenders).
- Cluster 5: Low income, high spending (impulsive shoppers).
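
K-Means numbers its clusters arbitrarily, so descriptions like the ones above are usually read off per-cluster averages. The sketch below shows one way to build such a summary; the `Cluster` and `Customers` column names and the helper name are assumptions for illustration.

```python
import pandas as pd

FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def profile_clusters(df: pd.DataFrame, label_col: str = "Cluster") -> pd.DataFrame:
    """Return mean income, mean spending score, and size for each cluster."""
    profile = df.groupby(label_col)[FEATURES].mean()
    profile["Customers"] = df.groupby(label_col).size()
    return profile.round(1)
```

Comparing each row of the result with the overall averages indicates which cluster corresponds to, for example, the high-income, low-spending group.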
- `Mall_Customers.csv`: Original dataset.
- `customer_segments.csv`: Processed dataset with assigned clusters.
- `customer_segmentation.ipynb`: Jupyter Notebook with all code.
- `README.md`: Project overview.
- `annual_income_distribution.png`, `spending_score_distribution.png`, `income_vs_spending.png`: Distribution plots.
- `elbow_method.png`, `customer_segments.png`: Visualizations of clustering results.
- Try different clustering algorithms like DBSCAN or Hierarchical Clustering.
- Add age and gender to the clustering model for deeper insights.
- Build a dashboard to visualize customer segments interactively.
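
As a starting point for the first idea, the sketch below runs hierarchical (Ward) clustering and DBSCAN on standardized features using scikit-learn. This is not part of the current project; the helper name and the `eps` and `min_samples` values are placeholder assumptions that would need tuning on the real data.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler


def alternative_labels(X, n_clusters=5, eps=0.4, min_samples=5):
    """Cluster X with hierarchical clustering and DBSCAN; return both label arrays."""
    # Standardize so that income and spending score contribute on the same scale.
    X_scaled = StandardScaler().fit_transform(X)
    hierarchical = AgglomerativeClustering(
        n_clusters=n_clusters, linkage="ward"
    ).fit_predict(X_scaled)
    # DBSCAN chooses the number of clusters itself and labels noise points as -1.
    density = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
    return hierarchical, density
```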
Author: Andrew Jaya Satyo
LinkedIn: linkedin.com/in/andrew-jaya-satyo-1501992b4
Email: andrewjaya12345@gmail.com