katehuangishere/Data-Analytics-Projects

Data-Analytics-Projects

project 1

project 2

  • Our focus should be on cluster 1, comprising individuals with both high Spending Scores and substantial income.
  • Among these customers, 54 percent are women. To appeal to this group, we should devise a marketing campaign centered around the popular items found in this cluster.
  • As for cluster 2, there is a promising chance to conduct a sales event targeting its customers, especially for popular items.

project 3

  • Abstract: Cleaning, Analysis and Visualization of Netflix Data
  • Link: https://www.kaggle.com/datasets/ariyoomotade/netflix-data-cleaning-analysis-and-visualization?select=netflix1.csv
  • Conclusion
    • The total count of Movies is higher than that of TV Shows.
    • Before 2020, the number of Movie and TV Show releases was growing.
    • 2017 had the most Movie releases (766), while 2020 had the most TV Show releases (436).
    • The top five countries by number of releases are United States (57.6%), India (18.8%), United Kingdom (11.3%), Pakistan (7.5%), and Canada (4.8%). These percentages sum to 100%, so they are shares within the top five.
    • The director with the most published works is Rajiv Chilaka, with 20 productions.
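
The conclusions above come down to a few counting steps. The sketch below shows one way they might be reproduced with the standard library only. The field names (`type`, `release_year`, `director`, `country`) are assumed to match the columns of netflix1.csv, and the sample rows are made up for illustration, not taken from the dataset. Computing country shares within the top N only is also an assumption, based on the percentages above summing to 100%.

```python
from collections import Counter

# Hypothetical sample rows; field names are assumed to match netflix1.csv.
SAMPLE_ROWS = [
    {"type": "Movie", "release_year": 2017, "director": "A", "country": "United States"},
    {"type": "Movie", "release_year": 2017, "director": "B", "country": "India"},
    {"type": "TV Show", "release_year": 2020, "director": "A", "country": "United States"},
    {"type": "Movie", "release_year": 2019, "director": "A", "country": "United States"},
]


def count_by_type(rows):
    """Count titles per type (Movie vs. TV Show)."""
    return Counter(r["type"] for r in rows)


def peak_year(rows, kind):
    """Return (year, count) for the year with the most releases of one type."""
    per_year = Counter(r["release_year"] for r in rows if r["type"] == kind)
    return per_year.most_common(1)[0]


def top_directors(rows, n=1):
    """Return the n directors with the most titles as (name, count) pairs."""
    return Counter(r["director"] for r in rows).most_common(n)


def top_country_shares(rows, n=5):
    """Percentage share of each top-n country, relative to the top-n total."""
    top = Counter(r["country"] for r in rows).most_common(n)
    total = sum(count for _, count in top)
    return {name: round(100 * count / total, 1) for name, count in top}


if __name__ == "__main__":
    print(count_by_type(SAMPLE_ROWS))
    print(peak_year(SAMPLE_ROWS, "Movie"))
    print(top_directors(SAMPLE_ROWS))
    print(top_country_shares(SAMPLE_ROWS, n=2))
```

On the real file, the rows could be loaded with `csv.DictReader` (converting `release_year` to `int`), and the same steps map directly onto pandas `value_counts` and `groupby` if pandas is preferred.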

project 4

  • Dataset: https://www.kaggle.com/datasets/jayrav13/olympic-track-field-results (Olympic Track & Field Results published on Kaggle)
  • Tasks:
    • Use Tableau to create a scatterplot with years on the X axis and performance on the Y axis, similar to the ones posted here: https://www.kaggle.com/code/drgilermo/ahead-of-their-time
    • Create a multiple display chart with years on the X axis that repeats the chart produced in 1 for each of the events I have selected. Because different events have different ranges on the Y axis:
      1. Include sports with times in the same range (for example, 100m, 200m, 400m, 800m)
      2. Recode the Y variable as % improvement since the sport was introduced.
    • Dashboards help us explore our data quickly. Create a dashboard in which the user chooses the event and the gender and the visual in 1 appears.
    • Create a connected scatterplot with the results from 2. Put years on the X axis and a variable of your choice (seconds, % improvement, or pace) on the Y axis. Include only the performance of the gold medalist, or the average performance of all medalists, for a given year.
    • Recreate the following NYT visual for an event of your choice: https://archive.nytimes.com/www.nytimes.com/interactive/2012/08/05/sports/olympics/the-100-meter-dash-one-race-every-medalist-ever.html
    • Use Python to find:
      • In which sports has performance increased or decreased the most?
      • In which sports is the difference between men and women the smallest/largest (in percentage terms)?
      • In which sports and years is the difference between the gold and the silver the greatest?
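
The three Python questions above, and the % improvement recoding in the multiple display task, could be approached roughly as follows. This is a minimal sketch for timed events only, where a lower value is better. The `Result` record, its field names, the medal codes `"G"` and `"S"`, and every value in `SAMPLE` are hypothetical and chosen for illustration; the Kaggle file would need its result strings parsed into seconds first.

```python
from collections import defaultdict, namedtuple

# Hypothetical record layout; results are assumed to be already parsed to seconds.
Result = namedtuple("Result", "event gender year medal seconds")

# Made-up values for illustration only, not real Olympic results.
SAMPLE = [
    Result("100M", "M", 1900, "G", 11.0), Result("100M", "M", 1900, "S", 11.2),
    Result("100M", "M", 2000, "G", 9.9), Result("100M", "M", 2000, "S", 10.0),
    Result("100M", "W", 2000, "G", 11.0), Result("100M", "W", 2000, "S", 11.1),
    Result("400M", "M", 1900, "G", 50.0), Result("400M", "M", 1900, "S", 52.0),
    Result("400M", "M", 2000, "G", 44.0), Result("400M", "M", 2000, "S", 44.5),
]


def gold_times(records):
    """Map (event, gender) -> {year: gold-medal time in seconds}."""
    out = defaultdict(dict)
    for r in records:
        if r.medal == "G":
            out[(r.event, r.gender)][r.year] = r.seconds
    return out


def improvement_pct(records):
    """Percent drop in the gold time from the first to the latest year."""
    result = {}
    for key, by_year in gold_times(records).items():
        first, latest = by_year[min(by_year)], by_year[max(by_year)]
        result[key] = 100 * (first - latest) / first
    return result


def gender_gap_pct(records, year):
    """Percent by which the women's gold time exceeds the men's in one year."""
    golds = gold_times(records)
    gaps = {}
    for (event, gender), by_year in golds.items():
        women = golds.get((event, "W"), {})
        if gender == "M" and year in by_year and year in women:
            gaps[event] = 100 * (women[year] - by_year[year]) / by_year[year]
    return gaps


def largest_gold_silver_gap(records):
    """Return ((event, gender, year), gap %) for the widest gold-silver margin."""
    by_race = defaultdict(dict)
    for r in records:
        by_race[(r.event, r.gender, r.year)][r.medal] = r.seconds
    gaps = {
        race: 100 * (m["S"] - m["G"]) / m["G"]
        for race, m in by_race.items()
        if "G" in m and "S" in m
    }
    return max(gaps.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    print(improvement_pct(SAMPLE))
    print(gender_gap_pct(SAMPLE, 2000))
    print(largest_gold_silver_gap(SAMPLE))
```

Expressing every comparison as a percentage of a baseline time keeps events with very different ranges (100m vs. 800m, for example) comparable. Distance and height events, where a higher value is better, would need the sign of the improvement flipped.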
