This is my approach to the McKinsey Recommendation Challenge hosted by AnalyticsVidhya.com: https://datahack.analyticsvidhya.com/contest/mckinsey-analytics-online-hackathon-recommendation/
Rank: 39 out of 2683 registered participants.
The solution is provided as a Jupyter notebook.
Problem Statement
Your client is a fast-growing mobile platform for hosting coding challenges. They have a unique business model, where they crowdsource problems from various creators (authors). These authors create the problems and release them on the client's platform. The users then select the challenges they want to solve. The authors make money based on the level of difficulty of their problems and how many users take up their challenge.
The client, on the other hand, makes money when users find challenges of their interest and continue to stay on the platform. To date, the client has relied on its domain expertise, user interface and experience with user behaviour to suggest the problems a user might be interested in. You have now been appointed as the data scientist who needs to come up with the algorithm to keep the users engaged on the platform.
The client has provided you with the history of the last 10 challenges each user has solved, and you need to predict the next 3 challenges the user might be interested in solving. Apply your data science skills to help the client make a big mark in their user engagement/revenue.
DATASET DESCRIPTION
We have three data files:
train.zip: It contains two files:
train.csv: Contains, for each user, the sequence of 13 challenges they attempted
| Variable | Definition |
| --- | --- |
| user_sequence | Unique ID for the sequence |
| user_id | User ID |
| challenge_sequence | Challenge sequence number (1-13) |
| challenge | Challenge ID |
challenge_data.csv: Contains attributes related to each challenge
| Variable | Definition |
| --- | --- |
| challenge_ID | Challenge ID |
| programming_language | Programming language for the challenge |
| challenge_series_ID | Series for the given challenge |
| total_submissions | Total submissions by all users |
| publish_date | Publishing date for the challenge |
| author_ID | Author ID |
| author_gender | Author gender |
| author_org_ID | Organization ID for the author |
| category_id | Type of challenge |
test.zip:
Contains the first 10 challenges solved by a new set of users (not present in train). We need to predict the next 3 challenges in the sequence for these users.
| Variable | Definition |
| --- | --- |
| user_sequence | Unique ID for the sequence |
| user_id | User ID |
| challenge_sequence | Challenge sequence number (1-10) |
| challenge | Challenge ID |
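To get oriented with these files, here is a minimal loading sketch with pandas. It assumes the archives were extracted to a local data/ folder and that the file inside test.zip is named test.csv; adjust paths to your setup.

```python
import pandas as pd

# Assumption: train.zip and test.zip were extracted into a local data/ folder,
# and the test file is named test.csv.
train = pd.read_csv("data/train.csv")                     # user_sequence, user_id, challenge_sequence, challenge
challenge_data = pd.read_csv("data/challenge_data.csv")   # one row of attributes per challenge_ID
test = pd.read_csv("data/test.csv")                       # same columns as train, but sequences run 1-10

print(train.shape, challenge_data.shape, test.shape)
print(train["challenge_sequence"].max())  # expected: 13
print(test["challenge_sequence"].max())   # expected: 10
```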
sample_submission.csv:
It contains the format for submission. Only submissions in this format are acceptable. This should have the next 3 challenges for each user.
| Variable | Definition |
| --- | --- |
| user_sequence | Unique ID for the sequence (see Note) |
| challenge | Challenge ID |
Note: The format is given by "user_id_challenge_sequence". For example, for user ID 2 you must predict the next 3 challenges with sequence numbers 11, 12 and 13. The corresponding user_sequence values would be 2_11, 2_12 and 2_13.
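To make the submission format concrete, here is a minimal sketch of how the file could be assembled. The predict_top3 function is a hypothetical placeholder (not my actual model) standing in for whatever produces the ranked recommendations, and the data path is assumed.

```python
import pandas as pd

def predict_top3(user_id):
    """Hypothetical placeholder: return the 3 recommended challenge IDs for a user, best first."""
    return ["CH_A", "CH_B", "CH_C"]  # dummy values for illustration only

test = pd.read_csv("data/test.csv")  # assumes test.zip extracted to data/

rows = []
for user_id in test["user_id"].unique():
    for seq, challenge in zip([11, 12, 13], predict_top3(user_id)):
        # user_sequence is "user_id_challenge_sequence", e.g. 2_11, 2_12, 2_13
        rows.append({"user_sequence": f"{user_id}_{seq}", "challenge": challenge})

submission = pd.DataFrame(rows, columns=["user_sequence", "challenge"])
submission.to_csv("submission.csv", index=False)
```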
EVALUATION METRIC
The evaluation metric is Mean Average Precision (MAP) at K, with K = 3. MAP is a well-known metric used to evaluate ranked retrieval results. For example, say that for a given user we recommend 3 challenges and only the 1st and 3rd are correct. The result would then look like: 1, 0, 1.
In this case:
The precision at 1 will be: 1 * (1/1) = 1
The precision at 2 will be: 0 * (1/2) = 0
The precision at 3 will be: 1 * (2/3) = 0.67
Average Precision will be: (1 + 0 + 0.67)/3 = 0.556
The formula is:
$AP@3 = \frac{1}{3}\sum_{k=1}^{3} P(k) \cdot rel(k)$
where $P(k)$ is the precision at cut-off $k$ and $rel(k)$ is 1 if the k-th recommendation is correct, 0 otherwise.
Mean Average Precision is the mean of the Average Precision calculated across all users:
$MAP@3 = \frac{1}{n}\sum_{u=1}^{n} AP@3_u$
where n = number of users.
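For local validation, a minimal MAP@3 sketch (not the official evaluation script; the function names apk and mapk are mine) could look like this; it reproduces the worked example above.

```python
def apk(actual, predicted, k=3):
    """Average precision at k: 'actual' holds the true next challenges, 'predicted' the ranked recommendations."""
    predicted = predicted[:k]
    hits, score = 0, 0.0
    for i, p in enumerate(predicted):
        if p in actual and p not in predicted[:i]:
            hits += 1
            score += hits / (i + 1)  # precision at cut-off i+1, counted only at correct positions
    return score / k                 # divide by K = 3, as in the formula above

def mapk(actual_lists, predicted_lists, k=3):
    """Mean average precision: mean of apk over all users."""
    return sum(apk(a, p, k) for a, p in zip(actual_lists, predicted_lists)) / len(actual_lists)

# Worked example from above: 1st and 3rd recommendations correct -> (1 + 0 + 2/3) / 3 ≈ 0.556
print(round(apk({"A", "C", "D"}, ["A", "B", "C"]), 3))  # 0.556
```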
Please find the script for the same at this link.
Public and Private Split:
The public leaderboard is based on 30% of the users, while the private leaderboard is evaluated on the remaining 70% of users in the test dataset.