Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 57: Polish get started vignette #59

Merged
merged 15 commits into from
May 30, 2024
Merged

Issue 57: Polish get started vignette #59

merged 15 commits into from
May 30, 2024

Conversation

athowes
Copy link
Collaborator

@athowes athowes commented May 24, 2024

Description

This PR closes #57.

  • Move loading packages to where the loading packages is mentioned
  • Alter the "Finally, in Section ??, we demonstrate that the fitted delay distribution accurately recovers the underlying truth." (I merged two sections)
  • Ideally the Table 2.1 caption should be altered to work like figure captions rather than this strange thing. I feel like this is possible but requires trying a few things out. Basically I'd like to be using (ref:...)
  • The section at the end still has "Figure 2.2 shows…" and an empty figure caption that need to be filled out
  • Attend to all comments of @parksw3 in Polish getting started vignette #57 (comment)
    • Define what we mean by primary vs secondary event
    • Add more citations: I don't think we need to do this
    • Explain Figure 1.4 a bit more carefully
    • Figure 2.1 change to density: unsure we should do this

Checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have included the target issue or issues in the PR title in the for Issue(s) issue-numbers: PR title
  • I have read the contribution guidelines.
  • I have tested my changes locally.
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required.
  • My code follows the established coding standards.
  • I have added a news item linked to this PR.
  • I have reviewed CI checks for this PR and addressed them as far as I am able.

@athowes
Copy link
Collaborator Author

athowes commented May 25, 2024

Figure 2.1: Just so I know I'm understanding this properly, do the orange bars represent full data without censoring? I wonder if it makes more sense to plot them with density plots (alongside the true density) since they're continuous data? We could also consider plotting something like the mean (vertical line) to show the bias.

Agree - I think we don't then plot correctly discretised dists which we probably should.

Regarding this point, even if the data can be any value (continuous) I think a histogram can still be preferred to a density as a density relies on some KDE parameters (and so ultimately is a model of the data rather than a "true" representation, while a histogram is more of a "true" representation).

For this reason not clear to me that it's better to move to density. I do perhaps see that you'd want the histogram bins to line up with "daily" if that's what you mean also?

@athowes athowes requested review from parksw3 and seabbs May 25, 2024 15:21
@athowes
Copy link
Collaborator Author

athowes commented May 27, 2024

I'm ready for this to be reviewed / merged.

Let me know if you'd like to push back on the histogram / density plot point.

@athowes
Copy link
Collaborator Author

athowes commented May 28, 2024

I found a bug in my code (I used obs_cens_trunc rather than obs_cens_trunc_samp) and have updated the barplot to be at a daily level (using geom_col):

image

I have not yet updated away from using the "full data" (towards using the "censored data" as suggested by Sam / above). Here the orange bars are showing a different bias as to before in the rendered version.

I could still be convinced (I should read your paper properly) but as an outsider it looks like the differences between the orange bars here and the theoretical density is just the sampling variability rather than some bias.

Anyway, this is mostly academic because I think the plot now does the job it was intended to do (show that the censored, truncated, sampled data will produce something bad if you use it).

@athowes
Copy link
Collaborator Author

athowes commented May 28, 2024

Here's my light investigation into the discussion about histograms being biased. To me, the final one with highest sample size looks pretty good / unbiased.

image

set.seed(123)

plot_hist <- function(n) {
  data.frame(value = rnorm(n, mean = 0, sd = 1)) |>
    ggplot(aes(x = value)) +
    geom_histogram(aes(y = ..density..), bins = 30, color = "black", fill = "grey90") +
    stat_function(fun = dnorm, args = list(mean = 0, sd = 1), color = "firebrick", size = 1) +
    labs(title = paste0("Sample size: ", n), y = "", x = "") +
    theme_minimal()
}

plots <- lapply(10^{0:5}, plot_hist)
patchwork::wrap_plots(plots)

@athowes athowes marked this pull request as ready for review May 28, 2024 14:02
@seabbs
Copy link
Contributor

seabbs commented May 28, 2024

Here's my light investigation into the discussion about histograms being biased. To me, the final one with highest sample size looks pretty good / unbiased.

Make it strictly positive and use bin sizes that are relatively big vs the mean + I suggest having another look at @parksw3 work on censoring.

I am not clear we want to close #57 without either a new issue addressing this plot or updating it to use actually observed data as discussed. If making a new issue can move above investigation to it.

Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest moving to plotting the retrospective observed data here and opening an issue for exploring how to plot the underlying continuous data in a way that everyone agrees is correct.

@athowes
Copy link
Collaborator Author

athowes commented May 28, 2024

The main reason I don't like using obs_censored is thinking about changing this text:

If we had access to the data obs, then it would be simple to estimate the delay distribution. However, with only access to obs_cens_trunc_samp, the delay distribution we observe is biased (Figure 2.1) and we must use a statistical model.

Let's say I swap to "if we had access to obs_censored, then it would be simple..". In this case why is the epidist package designed to adjust for censoring? This text would suggest that censoring is unimportant as a distortion of the underlying process. Perhaps here truncation is the bigger effect, but I'd prefer not to suggest censoring is irrelevant. For a sufficiently large interval (let's say we censor the observations here into weeks) then censoring would be the bigger effect.

@athowes
Copy link
Collaborator Author

athowes commented May 28, 2024

Here's an update using the lognormal rather than normal:

image

set.seed(123)

plot_hist <- function(n) {
  meanlog <- 1.8
  sdlog <- 0.5
  
  data.frame(value = rlnorm(n, meanlog = meanlog, sdlog = sdlog)) |>
    ggplot(aes(x = value)) +
    geom_histogram(aes(y = ..density..), bins = 50, color = "black", fill = "grey90") +
    stat_function(fun = dlnorm, args = list(meanlog = meanlog, sdlog = sdlog), color = "firebrick", size = 1) +
    labs(title = paste0("Sample size: ", n), y = "", x = "") +
    theme_minimal()
}

plots <- lapply(10^{0:5}, plot_hist)
patchwork::wrap_plots(plots)

@athowes
Copy link
Collaborator Author

athowes commented May 28, 2024

Here is that plot with all the proposed possibilities:

image

@athowes
Copy link
Collaborator Author

athowes commented May 28, 2024

I've moved this to using the retrospective data.

seabbs
seabbs previously approved these changes May 29, 2024
Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good I think. One question about intent as I am slightly not following and another about package dependencies.

@seabbs seabbs force-pushed the polish-get-started branch from a3b4d53 to 0f5570b Compare May 29, 2024 14:35
@athowes athowes enabled auto-merge May 29, 2024 20:37
Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Copy link
Contributor

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@athowes athowes added this pull request to the merge queue May 30, 2024
Merged via the queue into main with commit 980991b May 30, 2024
3 of 8 checks passed
@athowes athowes deleted the polish-get-started branch May 30, 2024 11:24
seabbs added a commit that referenced this pull request Jan 10, 2025
* Move package load, and put data.table comment into ^[]

* Remove mention of compare section (merged into model)

* Reduce number of code lines a little

* Use ref for the table as hoped!

* Add primary and secondary sentence

* Add text about Figure 2.2

* Add clarification on Figure 1.4

* Improvements to Figure 2.1

* Using gt and dplyr here

* Update to use the retrospective data

* Improve writing about histograms, and fix colour typo

* Downplay censoring less

* Rewrite ref:obs-est caption

* Add dplyr to Suggests

* Update vignettes/epidist.Rmd

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Former-commit-id: 84d65de3087d6cf1d7a82f7b58af2a259e013a0e [formerly 02dd4f2ff5270cb5ec11364382499ee6be1a1e8b]
Former-commit-id: e487a3dde30d8c0c28b920687aef5cfbdfd4715c
seabbs added a commit that referenced this pull request Jan 21, 2025
* Move package load, and put data.table comment into ^[]

* Remove mention of compare section (merged into model)

* Reduce number of code lines a little

* Use ref for the table as hoped!

* Add primary and secondary sentence

* Add text about Figure 2.2

* Add clarification on Figure 1.4

* Improvements to Figure 2.1

* Using gt and dplyr here

* Update to use the retrospective data

* Improve writing about histograms, and fix colour typo

* Downplay censoring less

* Rewrite ref:obs-est caption

* Add dplyr to Suggests

* Update vignettes/epidist.Rmd

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Former-commit-id: 980991b
Former-commit-id: def0c7a7d77cc490c9dacd694702162e5f2e27ce
seabbs added a commit that referenced this pull request Jan 21, 2025
* Move package load, and put data.table comment into ^[]

* Remove mention of compare section (merged into model)

* Reduce number of code lines a little

* Use ref for the table as hoped!

* Add primary and secondary sentence

* Add text about Figure 2.2

* Add clarification on Figure 1.4

* Improvements to Figure 2.1

* Using gt and dplyr here

* Update to use the retrospective data

* Improve writing about histograms, and fix colour typo

* Downplay censoring less

* Rewrite ref:obs-est caption

* Add dplyr to Suggests

* Update vignettes/epidist.Rmd

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Former-commit-id: 980991b
Former-commit-id: def0c7a7d77cc490c9dacd694702162e5f2e27ce
seabbs added a commit that referenced this pull request Jan 21, 2025
* Move package load, and put data.table comment into ^[]

* Remove mention of compare section (merged into model)

* Reduce number of code lines a little

* Use ref for the table as hoped!

* Add primary and secondary sentence

* Add text about Figure 2.2

* Add clarification on Figure 1.4

* Improvements to Figure 2.1

* Using gt and dplyr here

* Update to use the retrospective data

* Improve writing about histograms, and fix colour typo

* Downplay censoring less

* Rewrite ref:obs-est caption

* Add dplyr to Suggests

* Update vignettes/epidist.Rmd

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Former-commit-id: 980991b
Former-commit-id: def0c7a7d77cc490c9dacd694702162e5f2e27ce
seabbs added a commit that referenced this pull request Jan 21, 2025
* Move package load, and put data.table comment into ^[]

* Remove mention of compare section (merged into model)

* Reduce number of code lines a little

* Use ref for the table as hoped!

* Add primary and secondary sentence

* Add text about Figure 2.2

* Add clarification on Figure 1.4

* Improvements to Figure 2.1

* Using gt and dplyr here

* Update to use the retrospective data

* Improve writing about histograms, and fix colour typo

* Downplay censoring less

* Rewrite ref:obs-est caption

* Add dplyr to Suggests

* Update vignettes/epidist.Rmd

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Former-commit-id: 84d65de3087d6cf1d7a82f7b58af2a259e013a0e [formerly 02dd4f2ff5270cb5ec11364382499ee6be1a1e8b]
Former-commit-id: e487a3dde30d8c0c28b920687aef5cfbdfd4715c
seabbs added a commit that referenced this pull request Jan 21, 2025
* Move package load, and put data.table comment into ^[]

* Remove mention of compare section (merged into model)

* Reduce number of code lines a little

* Use ref for the table as hoped!

* Add primary and secondary sentence

* Add text about Figure 2.2

* Add clarification on Figure 1.4

* Improvements to Figure 2.1

* Using gt and dplyr here

* Update to use the retrospective data

* Improve writing about histograms, and fix colour typo

* Downplay censoring less

* Rewrite ref:obs-est caption

* Add dplyr to Suggests

* Update vignettes/epidist.Rmd

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Former-commit-id: 84d65de3087d6cf1d7a82f7b58af2a259e013a0e [formerly 02dd4f2ff5270cb5ec11364382499ee6be1a1e8b]
Former-commit-id: e487a3dde30d8c0c28b920687aef5cfbdfd4715c
Former-commit-id: d0b3080
seabbs added a commit that referenced this pull request Jan 21, 2025
* Move package load, and put data.table comment into ^[]

* Remove mention of compare section (merged into model)

* Reduce number of code lines a little

* Use ref for the table as hoped!

* Add primary and secondary sentence

* Add text about Figure 2.2

* Add clarification on Figure 1.4

* Improvements to Figure 2.1

* Using gt and dplyr here

* Update to use the retrospective data

* Improve writing about histograms, and fix colour typo

* Downplay censoring less

* Rewrite ref:obs-est caption

* Add dplyr to Suggests

* Update vignettes/epidist.Rmd

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Former-commit-id: 980991b
Former-commit-id: def0c7a7d77cc490c9dacd694702162e5f2e27ce
Former-commit-id: ada96fedcc73b5f5e78259dbd55b3d16434c604b [formerly 92ea6a2]
Former-commit-id: 116b162375f492fcd69c3cda9a187defee3a4eac
seabbs added a commit that referenced this pull request Jan 21, 2025
* Move package load, and put data.table comment into ^[]

* Remove mention of compare section (merged into model)

* Reduce number of code lines a little

* Use ref for the table as hoped!

* Add primary and secondary sentence

* Add text about Figure 2.2

* Add clarification on Figure 1.4

* Improvements to Figure 2.1

* Using gt and dplyr here

* Update to use the retrospective data

* Improve writing about histograms, and fix colour typo

* Downplay censoring less

* Rewrite ref:obs-est caption

* Add dplyr to Suggests

* Update vignettes/epidist.Rmd

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>

---------

Co-authored-by: Sam Abbott <s.e.abbott12@gmail.com>
Former-commit-id: 980991b
Former-commit-id: def0c7a7d77cc490c9dacd694702162e5f2e27ce
Former-commit-id: ada96fedcc73b5f5e78259dbd55b3d16434c604b [formerly 92ea6a2]
Former-commit-id: 116b162375f492fcd69c3cda9a187defee3a4eac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Polish getting started vignette
2 participants