Members: 이진서, 임상우, 김재욱 (FastCampus Data Science School, 14th)
Incorporated in 2011, SOCAR has become the largest car-sharing service provider in South Korea. It operates about 12K vehicles and has over 5.8 million cumulative users, about 200K of whom use the service each month. Demand for the service is expected to keep growing for the foreseeable future, but there are growing concerns about the increasing number of insurance fraud cases. The purpose of this project is to build a machine learning model for fraud detection, with recall as the primary metric for model evaluation. The dataset was provided by SOCAR; some information in the attached presentation is hidden at the data provider's request for security reasons.
- EDA
- Pre-processing
- Modeling
- Validation
- Conclusion
Step 1. Baseline Set (with Raw Data)
Original Models
- Logistic Regression
- Support Vector Machine
- RandomForestClassifier
Extended Models
- EasyEnsembleClassifier
- BalancedRandomForestClassifier
- RUSBoostClassifier
Step 2. Preprocessing
- One-Hot-Encoding
- Outlier Removal
- Scaling: RobustScaler, MinMaxScaler
- Train/Test Split
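
To illustrate the difference between the two scalers listed above, here is a minimal, dependency-free sketch of the formulas they are based on: median/IQR scaling for RobustScaler and min/max scaling for MinMaxScaler. It is not the project's code. The helper names and the `repair_cost` values are hypothetical, and the quartile method is an assumption chosen to use linear interpolation.

```python
import statistics


def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR).

    Assumes the IQR is non-zero (the feature is not constant).
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    median = statistics.median(values)
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]


def min_max_scale(values):
    """Map the smallest value to 0 and the largest to 1.

    Assumes the feature has at least two distinct values.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature with one extreme value (not taken from the SOCAR data).
repair_cost = [1, 2, 3, 4, 100]

robust = robust_scale(repair_cost)
minmax = min_max_scale(repair_cost)

print("robust :", robust)
print("min-max:", minmax)
```

With min/max scaling, the single extreme value sets the range, so the four ordinary values are squeezed below 0.04. Median/IQR scaling keeps them spread between -1 and 0.5, which is one reason to consider both scalers alongside outlier removal.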
Step 3. Hyperparameter Tuning
- Grid Search (Recall Priority)
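
The reason for prioritizing recall can be shown with a small, dependency-free sketch. The labels and predictions below are made up for illustration (1 = fraud, 0 = normal) and are not results from this project. The point is only that accuracy can look good on imbalanced data while every fraud case is missed.

```python
def recall(y_true, y_pred):
    """Share of actual fraud cases (label 1) that were flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Share of all cases that were classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical data: 2 fraud cases among 100 accidents.
y_true = [1, 1] + [0] * 98

# Model A never flags anything.
always_normal = [0] * 100

# Model B flags both fraud cases plus 10 false alarms.
cautious = [1, 1] + [1] * 10 + [0] * 88

for name, y_pred in [("always_normal", always_normal), ("cautious", cautious)]:
    print(name, "accuracy:", accuracy(y_true, y_pred), "recall:", recall(y_true, y_pred))
```

Model A scores higher on accuracy but catches no fraud, while Model B catches all of it at the cost of some false alarms. Selecting hyperparameters by recall favors models of the second kind.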
Step 4. Imbalanced Data Tuning
- Resampling (Oversampling, Undersampling)
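
As a rough illustration of the two resampling directions, the sketch below balances class counts by random undersampling and random oversampling using only the standard library. It is a simplified stand-in, not the project's implementation. The function names, the seed, and the 5/95 label split are assumptions made for the example.

```python
import random
from collections import Counter


def indices_by_class(labels):
    """Group row indices by their class label."""
    groups = {}
    for i, y in enumerate(labels):
        groups.setdefault(y, []).append(i)
    return groups


def random_undersample(labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = indices_by_class(labels)
    n = min(len(idx) for idx in groups.values())
    picked = []
    for idx in groups.values():
        picked.extend(rng.sample(idx, n))  # sampling without replacement
    return sorted(picked)


def random_oversample(labels, seed=0):
    """Duplicate random rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    groups = indices_by_class(labels)
    n = max(len(idx) for idx in groups.values())
    picked = []
    for idx in groups.values():
        picked.extend(idx)
        picked.extend(rng.choices(idx, k=n - len(idx)))  # with replacement
    return sorted(picked)


# Hypothetical labels: 5 fraud cases (1) among 100 rows.
labels = [1] * 5 + [0] * 95

under_idx = random_undersample(labels)
over_idx = random_oversample(labels)

under_counts = Counter(labels[i] for i in under_idx)
over_counts = Counter(labels[i] for i in over_idx)
print("undersampled:", dict(under_counts))
print("oversampled :", dict(over_counts))
```

Both functions return row indices, so the same selection can be applied to the feature matrix. Resampling is normally applied to the training split only, so that the test set keeps the original class ratio.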
Step 5. Validation
- EasyEnsembleClassifier with StratifiedKFold
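
The idea behind stratified k-fold validation is that every fold keeps roughly the same fraud ratio as the full dataset. The sketch below shows that idea with the standard library only. It is a simplified illustration (no shuffling, hypothetical labels and function name), not a reimplementation of scikit-learn's StratifiedKFold and not the project's validation code.

```python
def stratified_folds(labels, n_splits=5):
    """Assign row indices to folds so each fold keeps the class proportions.

    Indices of each class are dealt round-robin across the folds.
    """
    folds = [[] for _ in range(n_splits)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            folds[pos % n_splits].append(i)
    return folds


# Hypothetical labels: 10 fraud cases (1) among 100 rows.
labels = [1] * 10 + [0] * 90

folds = stratified_folds(labels, n_splits=5)

for k, test_idx in enumerate(folds):
    held_out = set(test_idx)
    # Everything outside the held-out fold is used for training.
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    n_fraud = sum(labels[i] for i in test_idx)
    print(f"fold {k}: train={len(train_idx)} test={len(test_idx)} fraud_in_test={n_fraud}")
```

With plain (unstratified) splitting of such imbalanced data, a fold could end up with very few or no fraud cases, which would make its recall score unreliable or undefined.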
Step 6. Conclusion
- Limitations
- Interpretation
- Sweetviz
- Draw.io
- SOCAR Terms of Service: https://www.socar.kr/terms
- Hands-On Machine Learning with Scikit-Learn and TensorFlow / Aurélien Géron