Skip to content

K-means Model to categorize Mall customers into different clusters based on their spending habits

Notifications You must be signed in to change notification settings

philiptitus/Mall-Customers

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Mall Customers K-Means Clustering Model

This project implements K-Means clustering on the "Mall Customers" dataset from Kaggle. The goal is to segment customers based on their annual income and spending score.

Project Structure

  • model.ipynb: Jupyter Notebook containing the implementation of the K-Means clustering model.
  • mall.csv: Dataset used for clustering.
  • README.md: Project documentation.
  • requirements.txt: List of dependencies required to run the project.

Dataset

The dataset used in this project is the "Mall Customers" dataset from Kaggle. It contains information about customers, including their annual income and spending score.

Steps

  1. Load the Dataset: Load the dataset using pandas and display the first few rows.
  2. Data Preprocessing: Check for missing values and select relevant features for clustering.
  3. Standardize the Data: Standardize the features to ensure equal contribution to distance calculations.
  4. Implement K-Means Clustering: Initialize and fit the K-Means model, then predict the cluster for each data point.
  5. Visualize the Clusters: Create a scatter plot to visualize the clusters.
  6. Evaluate Clustering: Use the Elbow Method and Silhouette Method to determine the optimal number of clusters.
  7. Save the Model and Clustered Data: Save the K-Means model and the clustered data to files.

Evaluation

The performance of the K-Means clustering model is evaluated using the silhouette score. A higher silhouette score indicates better-defined clusters.

Improving the Model

To improve the performance of the K-Means clustering model, consider the following strategies:

  • Feature scaling and normalization
  • Dimensionality reduction (e.g., PCA)
  • Optimal number of clusters (Elbow Method, Silhouette Method)
  • Initialization (k-means++)
  • Multiple runs (n_init parameter)
  • Alternative clustering algorithms (e.g., GMM, DBSCAN)
  • Incorporate domain knowledge

Installation

To run this project, you need to have Python installed. You can install the required dependencies using the following command:

pip install -r requirements.txt

Usage

Open the model.ipynb file in Jupyter Notebook or JupyterLab to see the implementation and results of the K-Means clustering model.

License

This project is licensed under the MIT License.

Releases

No releases published

Packages

No packages published