1103 restructure

cambiotraining · Mar 11, 2024 · 26498bf
1 parent 7f253e4

Showing 33 changed files with 281 additions and 58 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/_freeze/materials/changes/execute-results/html.json b/_freeze/materials/changes/execute-results/html.json
@@ -1,8 +1,8 @@
 {
-  "hash": "55b3cbdab74b1e54e7316d6c67a3f80b",
+  "hash": "dc8679ba2b03bdacb220cbe395048e20",
   "result": {
     "engine": "knitr",
-    "markdown": "---\ntitle: \"Looking for changes\"\n---\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n::: {.callout-tip}\n## Learning outcomes\n\n- Be able to visualise changes in data\n\n:::\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n### Libraries\n### Functions\n\n:::\n:::\n\n## Purpose and aim\n\nIn this section we're going to look at dealing with data that changes. This can be when you've got a data for different points in time, or perhaps some response to different concentrations.\n\nHere the change over time or in concentration may show some interesting properties.\n\n## Loading data\n\nWe'll be using a new data set for this section - it contains similar information as the `gapminder` data set we've used so far, but it has data for different years. There is data from 1960 to 2010.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngapminder1960_2010 <- read_csv(\"data/gapminder1960to2010_socioeconomic.csv\")\n```\n:::\n\n\n:::\n\n## Changes over time\n\nLet's say we're interested in life expectancy. We now have data on this variable for 50 different years, so it'd be nice to see how life expectancy changed over time.\n\nThere are 193 countries in this data set, so it's probably not a good idea to plot them all at once...\n\nLet's focus close to home and see how life expectancy changed in the United Kingdom. 
To do this, we first filter out all of the data of the United Kingdom, and then plot it.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngapminder1960_2010 %>% \n  filter(country == \"United Kingdom\") %>% \n  ggplot(aes(x = year,\n             y = life_expectancy,\n             group = country)) +\n  geom_line()\n```\n\n::: {.cell-output-display}\n![](changes_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n\n:::\n\nWe can see that life expectancy has increased markedly over the last 50 years. Notice that the y-axis is in a range of around 70 - 85! If we'd change that so that the y-axis started at zero, then our plot would look rather different.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe can set the y-axis range or limits with `ylim()`, specifying the first and last value that we want in the plot:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngapminder1960_2010 %>% \n  filter(country == \"United Kingdom\") %>% \n  ggplot(aes(x = year,\n             y = life_expectancy,\n             group = country)) +\n  geom_line() +\n  ylim(0, 90)\n```\n\n::: {.cell-output-display}\n![](changes_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\n:::\n\nThese two plots show the same data, but the clarity of the message is rather different.\n\nThese plots of course show only data for one country, so it doesn't give us much context. How impressive is the increase in life expectancy in the United Kingdom, compared to other countries? We know that, for example, the United States and China have had a lot of economic growth in the past 50 year, so let's compare the United Kingdom with them.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe adjust the filter that we used earlier, to include the United States and China. 
We also colour the data by country, so that we can distinguish the three countries.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngapminder1960_2010 %>% \n  filter(country %in% c(\"China\", \"United Kingdom\", \"United States\")) %>% \n  ggplot(aes(x = year,\n             y = life_expectancy,\n             colour = country,\n             group = country)) +\n  geom_line()\n```\n\n::: {.cell-output-display}\n![](changes_files/figure-html/unnamed-chunk-6-1.png){width=672}\n:::\n:::\n\n\n::: {.callout-note collapse=\"true\"}\n## Note on `%in%` syntax\n\nWe use `%in%` when we want to compare against a collection of values. Let's look at a very simple data set called `colours`, which contains 5 different colour values:\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ncolours\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 1\n  value \n  <chr> \n1 green \n2 yellow\n3 yellow\n4 red   \n5 purple\n```\n\n\n:::\n:::\n\n\nIf we wanted to filter out the yellow and purple values, we could do that like this:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfilter(colours, value %in% c(\"yellow\", \"purple\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 3 × 1\n  value \n  <chr> \n1 yellow\n2 yellow\n3 purple\n```\n\n\n:::\n:::\n\n\nWhat happens is that R goes through each item after `%in%` and checks if it can find it in the `value` column. 
So in this case it first checks `yellow`, followed by `purple`.\n\n:::\n\n:::\n\nFrom this plot we can see that the United Kingdom and United States show very similar increases in life expectancy, roughly increasing by 10 years.\n\nHowever, plotting this together with China's life expectancy, it shows that China has seen a much larger increase over the past 50 years, since its life expectancy was only just above 30 year in 1960!\n\n### Exercises\n\n::: {.callout-note icon=false}\n## Home country progress\n\n**Level:** {{< fa solid star >}} {{< fa regular star >}} {{< fa regular star >}}\n\nPlot the life expectancy for your home country against Poland, Chile and Mexico. How does the life expectancy in your home country compare to these countries?\n\n::: {.callout-tip collapse=\"true\"}\n## Answer\n\nFor example:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngapminder1960_2010 %>% \n  filter(country %in% c(\"Netherlands\", \"Poland\", \"Chile\", \"Mexico\")) %>% \n  ggplot(aes(x = year,\n             y = life_expectancy,\n             colour = country,\n             group = country)) +\n  geom_line()\n```\n\n::: {.cell-output-display}\n![](changes_files/figure-html/unnamed-chunk-10-1.png){width=672}\n:::\n:::\n\n\n:::\n:::\n:::\n\n## Summary\n\n::: {.callout-tip}\n#### Key points\n\n- Visualising changes over time is a powerful tool to detect trends\n- Decisions on axis limits can dramatically change the message\n:::\n",
+    "markdown": "---\ntitle: \"Looking for changes\"\n---\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n::: {.callout-tip}\n## Learning outcomes\n\n- Be able to visualise changes in data\n\n:::\n\n## Libraries and functions\n\n::: {.callout-note collapse=\"true\"}\n## Click to expand\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n### Libraries\n### Functions\n\n:::\n:::\n\n## Purpose and aim\n\nIn this section we're going to look at dealing with data that changes. These can be changes over time or, for example, changes across treatments / regions / concentrations etc.\n\n## Loading data\n\nWe'll be using a new data set for this section - it contains similar information as the `gapminder` data set we've used so far, but it has data for different years. There is data from 1960 to 2010.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngapminder1960_2010 <- read_csv(\"data/gapminder1960to2010_socioeconomic.csv\")\n```\n:::\n\n\n:::\n\n## Changes over time\n\nLet's say we're interested in life expectancy. We now have data on this variable for 50 different years, so it'd be nice to see how life expectancy changed over time.\n\nThere are 193 countries in this data set, so it's probably not a good idea to plot them all at once...\n\nLet's focus close to home and see how life expectancy changed in the United Kingdom over these years.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nTo do this, we first filter out all of the data of the United Kingdom, and then plot it.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngapminder1960_2010 %>% \n  filter(country == \"United Kingdom\") %>% \n  ggplot(aes(x = year,\n             y = life_expectancy,\n             group = country)) +\n  geom_line()\n```\n\n::: {.cell-output-display}\n![](changes_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n\n:::\n\nWe can see that life expectancy has increased markedly over the last 50 years. 
Notice that the y-axis is in a range of around 70 - 85! If we'd change that so that the y-axis started at zero, then our plot would look rather different.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe can set the y-axis range or limits with `ylim()`, specifying the first and last value that we want in the plot:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngapminder1960_2010 %>% \n  filter(country == \"United Kingdom\") %>% \n  ggplot(aes(x = year,\n             y = life_expectancy,\n             group = country)) +\n  geom_line() +\n  ylim(0, 90)\n```\n\n::: {.cell-output-display}\n![](changes_files/figure-html/unnamed-chunk-5-1.png){width=672}\n:::\n:::\n\n\n:::\n\nThese two plots show the same data, but the clarity of the message is rather different.\n\n:::{.callout-important}\n## Scale matters\n\nHow you scale and define your axes matters, as you might have derived from the plots above. Have a look at the graphs below, which are based on exactly the same data:\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n::: {.cell-output-display}\n![](changes_files/figure-html/unnamed-chunk-7-1.png){width=672}\n:::\n:::\n\n\nLet's assume that they were published in the campaign prospectus of the Republican and Democratic parties. Which one do you think ended up where?\n:::\n\nThese plots of course show only data for one country, so it doesn't give us much context. How impressive is the increase in life expectancy in the United Kingdom, compared to other countries? We know that, for example, the United States and China have had a lot of economic growth in the past 50 year, so let's compare the United Kingdom with them.\n\n::: {.panel-tabset group=\"language\"}\n## R\n\nWe adjust the filter that we used earlier, to include the United States and China. 
We also colour the data by country, so that we can distinguish the three countries.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngapminder1960_2010 %>% \n  filter(country %in% c(\"China\", \"United Kingdom\", \"United States\")) %>% \n  ggplot(aes(x = year,\n             y = life_expectancy,\n             colour = country,\n             group = country)) +\n  geom_line()\n```\n\n::: {.cell-output-display}\n![](changes_files/figure-html/unnamed-chunk-8-1.png){width=672}\n:::\n:::\n\n\n::: {.callout-note collapse=\"true\"}\n## Note on `%in%` syntax\n\nWe use `%in%` when we want to compare against a collection of values. Let's look at a very simple data set called `colours`, which contains 5 different colour values:\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\ncolours\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 1\n  value \n  <chr> \n1 green \n2 yellow\n3 yellow\n4 red   \n5 purple\n```\n\n\n:::\n:::\n\n\nIf we wanted to filter out the yellow and purple values, we could do that like this:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfilter(colours, value %in% c(\"yellow\", \"purple\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 3 × 1\n  value \n  <chr> \n1 yellow\n2 yellow\n3 purple\n```\n\n\n:::\n:::\n\n\nWhat happens is that R goes through each item after `%in%` and checks if it can find it in the `value` column. 
So in this case it first checks `yellow`, followed by `purple`.\n\n:::\n\n:::\n\nFrom this plot we can see that the United Kingdom and United States show very similar increases in life expectancy, roughly increasing by 10 years.\n\nHowever, plotting this together with China's life expectancy, it shows that China has seen a much larger increase over the past 50 years, since its life expectancy was only just above 30 year in 1960!\n\n### Exercises\n\n::: {.callout-note icon=false}\n## Home country progress\n\n**Level:** {{< fa solid star >}} {{< fa regular star >}} {{< fa regular star >}}\n\nPlot the life expectancy for your home country against Poland, Chile and Mexico. How does the life expectancy in your home country compare to these countries?\n\n::: {.callout-tip collapse=\"true\"}\n## Answer\n\nFor example:\n\n::: {.panel-tabset group=\"language\"}\n## R\n\n\n::: {.cell}\n\n```{.r .cell-code}\ngapminder1960_2010 %>% \n  filter(country %in% c(\"Netherlands\", \"Poland\", \"Chile\", \"Mexico\")) %>% \n  ggplot(aes(x = year,\n             y = life_expectancy,\n             colour = country,\n             group = country)) +\n  geom_line()\n```\n\n::: {.cell-output-display}\n![](changes_files/figure-html/unnamed-chunk-12-1.png){width=672}\n:::\n:::\n\n\n:::\n:::\n:::\n\n## Summary\n\n::: {.callout-tip}\n#### Key points\n\n- Visualising changes over time is a powerful tool to detect trends\n- Decisions on axis limits can dramatically change the message\n:::\n",
     "supporting": [
       "changes_files"
     ],

diff --git a/_freeze/materials/changes/figure-html/unnamed-chunk-12-1.png b/_freeze/materials/changes/figure-html/unnamed-chunk-12-1.png
diff --git a/_freeze/materials/changes/figure-html/unnamed-chunk-7-1.png b/_freeze/materials/changes/figure-html/unnamed-chunk-7-1.png
diff --git a/_freeze/materials/changes/figure-html/unnamed-chunk-8-1.png b/_freeze/materials/changes/figure-html/unnamed-chunk-8-1.png