Churn Prediction Model

Build a binary classification model to predict whether a customer will churn (i.e., discontinue their subscription) or not based on various features in the dataset. You are given a dataset containing customer information and whether they churned or not in the past. Your task is to use this dataset to train a machine learning model and evaluate its performance on a test set.

Getting Started

Installation

Clone the repository and install the required packages using pip:

git clone https://github.com/mohammadrezashariat/Churn-Prediction
cd Churn-Prediction
pip install -r requirements.txt

Usage

To run the prediction model, execute the following command in the project root directory:

python main.py

This will preprocess the data, train the model, and generate predictions for the test set. The model accuracy will be printed to the console.

Dataset

You can use the Telco Customer Churn dataset, which is available on Kaggle at https://www.kaggle.com/blastchar/telco-customer-churn.

Model

The prediction model is a logistic regression classifier, trained using scikit-learn.

Data Preprocessing

The dataset was preprocessed using the following steps:

Remove duplicate rows

There were no duplicate rows in the dataset, so no rows were removed.
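As an illustration of how such a check might look, the sketch below counts and drops exact duplicate rows with pandas. The toy frame and its values are made up for the example and do not come from the Telco data, which had no duplicates.

```python
import pandas as pd

# Toy frame standing in for the dataset; the values are invented for illustration.
df = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0002-B"],
    "tenure": [1, 34, 34],
    "Churn": ["No", "Yes", "Yes"],
})

# duplicated() flags every row that is identical to an earlier row.
n_duplicates = int(df.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row.
deduplicated = df.drop_duplicates().reset_index(drop=True)

print(f"{n_duplicates} duplicate row(s) removed, {len(deduplicated)} rows kept")
```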

Remove unnecessary columns

The "customerID" column was removed as it does not contribute to predicting customer churn.
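A minimal sketch of this step, assuming the data is held in a pandas DataFrame (the toy rows are invented for illustration):

```python
import pandas as pd

# Toy frame; only the column names mirror the Telco dataset.
df = pd.DataFrame({
    "customerID": ["0001-A", "0002-B"],
    "tenure": [1, 34],
    "Churn": ["No", "Yes"],
})

# The identifier is unique per customer, so it carries no predictive signal.
features = df.drop(columns=["customerID"])

print(list(features.columns))
```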

Handle missing values

The 'TotalCharges' column contained missing values, which were filled with the mean of 'TotalCharges'.
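One way mean imputation could be written with pandas is sketched below. It assumes the column is read as text and that missing entries appear as blank strings, which may not match how main.py loads the data; the sample values are invented.

```python
import pandas as pd

# Assumption: 'TotalCharges' arrives as strings, with blanks marking missing values.
df = pd.DataFrame({"TotalCharges": ["29.85", " ", "108.15"]})

# Coerce to numbers; anything that cannot be parsed becomes NaN.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
n_missing = int(df["TotalCharges"].isna().sum())

# Fill the gaps with the mean of the observed values (mean() skips NaN).
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].mean())

print(f"filled {n_missing} missing value(s)")
```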

Convert categorical variables to numerical

Categorical variables such as:

['gender', 'InternetService', 'PaymentMethod', 'Partner', 'Dependents', 'PhoneService', 'PaperlessBilling', 'MultipleLines', 'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies', 'Contract']

were converted to numerical values using one-hot encoding.
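One-hot encoding replaces each categorical column with one 0/1 indicator column per category. The sketch below uses pandas.get_dummies on two of the listed columns; whether the project uses get_dummies or scikit-learn's OneHotEncoder is not stated here, so treat the choice as an assumption.

```python
import pandas as pd

# Toy frame with two of the categorical columns and one numeric column.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "One year"],
    "PaperlessBilling": ["Yes", "No", "Yes"],
    "tenure": [1, 34, 12],
})

categorical_columns = ["Contract", "PaperlessBilling"]

# Each category becomes its own indicator column named <column>_<category>.
encoded = pd.get_dummies(df, columns=categorical_columns, dtype=int)

print(list(encoded.columns))
```

Columns not named in categorical_columns, such as "tenure", pass through unchanged.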

Scale numerical variables

The numerical variables "MonthlyCharges" and "TotalCharges" were scaled with a log transform, np.log1p, which computes log(1 + x).
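The transform can be sketched as follows, with invented sample values. np.log1p compresses large values and is defined at zero, which matters for customers with no charges yet.

```python
import numpy as np
import pandas as pd

# Toy values; only the column names mirror the dataset.
df = pd.DataFrame({
    "MonthlyCharges": [0.0, 29.85, 108.15],
    "TotalCharges": [0.0, 29.85, 3046.05],
})

numeric_columns = ["MonthlyCharges", "TotalCharges"]

# Apply log(1 + x) element-wise; a value of 0 maps to 0.
transformed = df.copy()
transformed[numeric_columns] = np.log1p(df[numeric_columns])

print(transformed.round(3))
```

np.expm1 inverts the transform if the original scale is needed later.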

Feature Selection/Engineering

The features used for training the model were selected based on their relevance in predicting customer churn. The selected features were:

["gender", "SeniorCitizen", "Partner", "Dependents", "tenure", "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity", "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV", "StreamingMovies", "Contract", "MonthlyCharges", "TotalCharges"]

No additional feature engineering was performed.

Choice of Algorithm

Logistic regression was chosen for this problem because it is a simple yet effective algorithm for binary classification tasks such as predicting customer churn.
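For readers unfamiliar with the scikit-learn workflow, here is a minimal sketch of fitting a logistic regression classifier. It runs on synthetic data rather than the Telco features, and the split ratio, random seed, and max_iter value are illustrative assumptions, not the settings used in main.py.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 samples, 3 features, label driven by the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

# Hold out a test set; stratify keeps the class balance similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# score() reports accuracy on the held-out data.
test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```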

Evaluation Metrics

The model was evaluated using the following metrics:

Accuracy: The proportion of correct predictions made by the model.

ROC Curve: A plot of the true positive rate (TPR) against the false positive rate (FPR) for different classification thresholds.

Confusion Matrix: A matrix showing the number of true positives, true negatives, false positives, and false negatives predicted by the model.
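The three metrics above can be computed with sklearn.metrics, as in the sketch below. The labels and predicted probabilities are hypothetical, and the 0.5 decision threshold is an assumption for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_curve

# Hypothetical ground truth (1 = churned) and predicted churn probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

# Turn probabilities into class labels with an assumed 0.5 threshold.
y_pred = (y_score >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# roc_curve sweeps the threshold and returns one (FPR, TPR) point per step.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

print(f"accuracy={accuracy:.3f} TN={tn} FP={fp} FN={fn} TP={tp}")
```

Plotting tpr against fpr gives the ROC curve. Accuracy and the confusion matrix depend on the chosen threshold, while the ROC curve summarizes all thresholds at once.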

