|
| 1 | +--- |
| 2 | +title: "Grouping counts to gain a deeper understanding of the data" |
| 3 | +start: true |
| 4 | +teaching: 10 |
| 5 | +exercises: 10 |
| 6 | +questions: |
| 7 | +- "What does \"grouping counts\" mean?" |
| 8 | +- "How can I use it to gain a better understanding of the data?" |
| 9 | +objectives: |
| 10 | +- "Understand what \"grouping counts\" means" |
| 11 | +- "Learn how to group ALA data and interpret it" |
| 12 | +keypoints: |
| 13 | +- "Grouping data can provide valuable insights into what kind of data is available on the ALA" |
| 14 | +- "This grouping can also serve to better filter your queries" |
| 15 | +--- |
| 16 | + |
| 17 | +# Group counts by fields |
| 18 | + |
| 19 | +When looking into data such as species occurrences, the raw counts of records in the ALA can hide useful detail. For example, we saw in our previous query that the number of records for *Litoria peronii* since 2018 in NSW dropped from 61952 to 27969 when we specified that we only wanted records documented by FrogID. But what other data resources are we leaving out, and how many records is each responsible for? |
| 20 | + |
| 21 | +To find out, we will use the `group_by` option in `atlas_counts()`. Any field that can be used in `filters` can also be used in `group_by`. To group your counts, add `group_by="dataResourceName"` to your query, as well as `expand=False` (the `expand` argument will be explained in detail below): |
| 22 | + |
| 23 | +```python |
| 24 | +galah.atlas_counts( |
| 25 | + taxa="litoria peronii", |
| 26 | + filters=["year>=2018", |
| 27 | + "cl22=New South Wales"], |
| 28 | + group_by="dataResourceName", |
| 29 | + expand=False |
| 30 | +) |
| 31 | +``` |
| 32 | +```output |
| 33 | + dataResourceName count |
| 34 | +0 FrogID 39840 |
| 35 | +1 NSW BioNet Atlas 4882 |
| 36 | +2 iNaturalist Australia 2578 |
| 37 | +3 NatureMapr 249 |
| 38 | +4 Earth Guardians Weekly Feed 151 |
| 39 | +5 ALA species sightings and OzAtlas 16 |
| 40 | +6 Victorian Biodiversity Atlas 10 |
| 41 | +7 FrogWatch SA 6 |
| 42 | +8 Australian Museum provider for OZCAM 4 |
| 43 | +9 BowerBird 3 |
| 44 | +10 Melbourne Water Frog Census 2 |
| 45 | +11 SA Fauna 2 |
| 46 | +``` |
| 47 | + |
| 48 | +We can see that 12 data resources have provided the ALA with observations of *Litoria peronii*, and that FrogID provides by far the most observations, well ahead of NSW BioNet Atlas in second place. |
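
To put these counts in proportion, a short sketch like the one below turns them into percentage shares. It uses plain Python and assumes the counts are typed in by hand from the output above (a snapshot, since live ALA numbers change over time); the names `counts`, `total` and `share` are illustrative and are not part of `galah`.

```python
# Counts copied by hand from the grouped output above.
# This is a snapshot; a live query may return different numbers.
counts = {
    "FrogID": 39840,
    "NSW BioNet Atlas": 4882,
    "iNaturalist Australia": 2578,
    "NatureMapr": 249,
    "Earth Guardians Weekly Feed": 151,
    "ALA species sightings and OzAtlas": 16,
    "Victorian Biodiversity Atlas": 10,
    "FrogWatch SA": 6,
    "Australian Museum provider for OZCAM": 4,
    "BowerBird": 3,
    "Melbourne Water Frog Census": 2,
    "SA Fauna": 2,
}

# Total number of records across all data resources.
total = sum(counts.values())


def share(name):
    """Return the percentage of all records contributed by one data resource."""
    return 100 * counts[name] / total


# Print each data resource with its count and share, largest first.
for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<40}{n:>7}{share(name):>8.1f}%")
```

In this snapshot, a single data resource accounts for most of the records, which is worth keeping in mind when interpreting the totals.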
| 49 | + |
| 50 | +Now, in the query above, we specified that we want records since 2018. However, we can also see how many records came from each year by adding `year` to the `group_by` argument, which then takes a list of fields. |
| 51 | + |
| 52 | +```python |
| 53 | +galah.atlas_counts( |
| 54 | + taxa="litoria peronii", |
| 55 | + filters=["year>=2018", |
| 56 | + "cl22=New South Wales"], |
| 57 | + group_by=["dataResourceName","year"], |
| 58 | + expand=False |
| 59 | +) |
| 60 | +``` |
| 61 | +```output |
| 62 | + dataResourceName year count |
| 63 | +0 FrogID - 39840 |
| 64 | +1 NSW BioNet Atlas - 4882 |
| 65 | +2 iNaturalist Australia - 2578 |
| 66 | +3 NatureMapr - 249 |
| 67 | +4 Earth Guardians Weekly Feed - 151 |
| 68 | +5 ALA species sightings and OzAtlas - 16 |
| 69 | +6 Victorian Biodiversity Atlas - 10 |
| 70 | +7 FrogWatch SA - 6 |
| 71 | +8 Australian Museum provider for OZCAM - 4 |
| 72 | +9 BowerBird - 3 |
| 73 | +10 Melbourne Water Frog Census - 2 |
| 74 | +11 SA Fauna - 2 |
| 75 | +12 - 2018 5200 |
| 76 | +13 - 2019 5469 |
| 77 | +14 - 2020 13358 |
| 78 | +15 - 2021 14469 |
| 79 | +16 - 2022 7506 |
| 80 | +17 - 2023 817 |
| 81 | +18 - 2024 762 |
| 82 | +19 - 2025 162 |
| 83 | +``` |
| 84 | + |
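| 85 | +Now we not only have the data resources providing observations of *Litoria peronii*, we can also see how many observations there were per year. With `expand=False`, each field is counted separately: a `-` marks the field that a row is not grouped by. |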
| 86 | + |
| 87 | +But what if you wanted to know, for each year, how many records each data resource provided? |
| 88 | + |
| 89 | +This is where the `expand=True` option comes in. This option tells `galah-python` that you want to see the number of observations for each data resource in each year specified. |
| 90 | + |
| 91 | +#### Note: the `expand=True` option is the default, and it is only possible when you have more than one field in `group_by`; otherwise, you will get an error. |
| 92 | + |
| 93 | +```python |
| 94 | +galah.atlas_counts( |
| 95 | + taxa="litoria peronii", |
| 96 | + filters=["year>=2018", |
| 97 | + "cl22=New South Wales"], |
| 98 | + group_by=["dataResourceName","year"], |
| 99 | +) |
| 100 | +``` |
| 101 | +```output |
| 102 | + dataResourceName year count |
| 103 | +0 FrogID 2018 4154 |
| 104 | +1 FrogID 2019 4382 |
| 105 | +2 FrogID 2020 12248 |
| 106 | +3 FrogID 2021 12851 |
| 107 | +4 FrogID 2022 6205 |
| 108 | +5 NSW BioNet Atlas 2018 850 |
| 109 | +6 NSW BioNet Atlas 2019 872 |
| 110 | +7 NSW BioNet Atlas 2020 808 |
| 111 | +8 NSW BioNet Atlas 2021 1244 |
| 112 | +9 NSW BioNet Atlas 2022 840 |
| 113 | +10 NSW BioNet Atlas 2023 205 |
| 114 | +11 NSW BioNet Atlas 2024 63 |
| 115 | +12 iNaturalist Australia 2018 108 |
| 116 | +13 iNaturalist Australia 2019 113 |
| 117 | +14 iNaturalist Australia 2020 227 |
| 118 | +15 iNaturalist Australia 2021 321 |
| 119 | +16 iNaturalist Australia 2022 409 |
| 120 | +17 iNaturalist Australia 2023 576 |
| 121 | +18 iNaturalist Australia 2024 665 |
| 122 | +19 iNaturalist Australia 2025 159 |
| 123 | +20 NatureMapr 2018 37 |
| 124 | +21 NatureMapr 2019 48 |
| 125 | +22 NatureMapr 2020 47 |
| 126 | +23 NatureMapr 2021 24 |
| 127 | +24 NatureMapr 2022 27 |
| 128 | +25 NatureMapr 2023 33 |
| 129 | +26 NatureMapr 2024 30 |
| 130 | +27 NatureMapr 2025 3 |
| 131 | +28 Earth Guardians Weekly Feed 2018 30 |
| 132 | +29 Earth Guardians Weekly Feed 2019 43 |
| 133 | +30 Earth Guardians Weekly Feed 2020 24 |
| 134 | +31 Earth Guardians Weekly Feed 2021 27 |
| 135 | +32 Earth Guardians Weekly Feed 2022 22 |
| 136 | +33 Earth Guardians Weekly Feed 2023 1 |
| 137 | +34 Earth Guardians Weekly Feed 2024 4 |
| 138 | +35 ALA species sightings and OzAtlas 2018 7 |
| 139 | +36 ALA species sightings and OzAtlas 2019 5 |
| 140 | +37 ALA species sightings and OzAtlas 2020 1 |
| 141 | +38 ALA species sightings and OzAtlas 2022 3 |
| 142 | +39 Victorian Biodiversity Atlas 2018 5 |
| 143 | +40 Victorian Biodiversity Atlas 2019 5 |
| 144 | +41 FrogWatch SA 2019 1 |
| 145 | +42 FrogWatch SA 2020 3 |
| 146 | +43 FrogWatch SA 2023 2 |
| 147 | +44 Australian Museum provider for OZCAM 2018 4 |
| 148 | +45 BowerBird 2018 3 |
| 149 | +46 Melbourne Water Frog Census 2018 2 |
| 150 | +47 SA Fauna 2021 2 |
| 151 | +``` |
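
As a rough mental model of the difference between the two settings, the sketch below counts a handful of made-up records with `collections.Counter`. It is not how `galah-python` or the ALA computes counts; the `records` list is invented purely for illustration.

```python
from collections import Counter

# Hypothetical occurrence records as (dataResourceName, year) pairs.
# These values are invented for illustration and are not ALA data.
records = [
    ("FrogID", 2018), ("FrogID", 2018), ("FrogID", 2019),
    ("NatureMapr", 2018), ("NatureMapr", 2019), ("NatureMapr", 2019),
]

# Comparable to expand=False: each field is counted on its own.
by_resource = Counter(resource for resource, _ in records)
by_year = Counter(year for _, year in records)

# Comparable to expand=True: one count per (resource, year) combination.
by_resource_and_year = Counter(records)

print(by_resource)
print(by_year)
print(by_resource_and_year)
```

Summing the combined counts for one data resource gives back its separate count, which is a quick consistency check you can also apply to the two outputs above.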