Issue 35: Draft of methods #37
base: main
Conversation
Codecov report: all modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff            @@
##             main      #37      +/- ##
=========================================
+ Coverage   91.48%   99.44%   +7.95%
=========================================
  Files           5        6       +1
  Lines          94      180      +86
=========================================
+ Hits           86      179      +93
+ Misses          8        1       -7
```

View full report in Codecov by Sentry.
Why is this permutation in here? Is it because we plan to support it, or otherwise?
Partial review; more coming shortly.
> %\VignetteEncoding{UTF-8}
> ---
>
> # `baselinenowcast` mathematical model
Pedantic point, but we don't really need the package name in the docs of the package, do we?
vignettes/model_definition.Rmd (Outdated)
> # `baselinenowcast` mathematical model
>
> The following describes the estimate of the delay distribution, the generation of the point nowcast, and the estimate of the observation error, for a partially observed or complete reporting triangle. The method assumes that the units of the delays $d$ and the units of reference time $t$ are the same, e.g. weekly data with weekly releases or daily data with daily releases. This method is based on the method described in [Wolffram et al. 2023](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011394), developed by the Karlsruhe Institute of Technology.
As we spoke about at the conference, it more assumes that the reporting triangle is contiguous, doesn't it? So you could make it support daily and weekly etc. without changing the method, just by adding preprocessing of the reporting triangle?
As discussed f2f, I will reword this to explain that the method is time-unit agnostic, so it is flexible enough to handle any combination of reference and report date, but the data will need to be pre-processed such that the units of both are the same.
vignettes/model_definition.Rmd (Outdated)

> # `baselinenowcast` mathematical model
>
> The following describes the estimate of the delay distribution, the generation of the point nowcast, and the estimate of the observation error, for a partially observed or complete reporting triangle. The method assumes that the units of the delays $d$ and the units of reference time $t$ are the same, e.g. weekly data with weekly releases or daily data with daily releases. This method is based on the method described in [Wolffram et al. 2023](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011394), developed by the Karlsruhe Institute of Technology.
Suggested change:

```diff
-The following describes the estimate of the delay distribution, the generation of the point nowcast, and the estimate of the observation error, for a partially observed or complete reporting triangle. The method assumes that the units of the delays $d$ and the units of reference time $t$ are the same, e.g. weekly data with weekly releases or daily data with daily releases. This method is based on the method described in [Wolffram et al. 2023](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011394), developed by the Karlsruhe Institute of Technology.
+The following describes the estimate of the delay distribution, the generation of the point nowcast, and the estimate of the observation error, for a partially observed or complete reporting triangle. The method assumes that the units of the delays $d$ and the units of reference time $t$ are the same, e.g. weekly data with weekly releases or daily data with daily releases. This method is based on the method described in [Wolffram et al. 2023](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011394).
```
> ### Notation
Before getting going with it, I really like the idea of a high-level overview of the general approach in a single paragraph. Perhaps with some refs to similar approaches, and pointing out what it is not similar to. Perhaps this needs to go elsewhere rather than here, though?
I think we could intro this with a schematic using the reporting triangle and highlighting the "blocks" being computed in the matrix? I feel like there might have been a nice slide on this in one of the talks, or I can make one myself.
> ### Point estimate of the delay distribution
>
> We use the entire reporting triangle to compute an empirical estimate of the delay distribution, $\pi(d)$, or the probability that a case at reference time $t$ appears in the dataset at time $t + d$. We will refer to the realized empirical estimate of the delay distribution from a reporting triangle as $\pi(d)$.
>
> The delay distribution $\pi(d)$ can be estimated directly from the completed reporting matrix $X$:
I am a big fan of always describing it in English as well as equations.
> $$
> \hat{\pi}_d = \frac{\sum_{t=1}^{t^*} X_{t,d}}{\sum_{d'=0}^{D} \sum_{t=1}^{t^*} X_{t,d'}}
> $$
>
> In the special case when the time the estimate is made, $t'$, is beyond the data release time $t^{*}$, such that $t' \ge t^* + D$, $\hat{\pi}_d$ can be computed directly by summing over all reference time points $t$ at each delay $d$.
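For concreteness, here is a minimal sketch of this empirical estimate. This is illustrative only: the package itself is written in R, this uses NumPy instead, and the matrix values are hypothetical.

```python
import numpy as np

# Hypothetical completed reporting matrix X: rows are reference times t,
# columns are delays d = 0..D; X[t, d] is the count of cases with
# reference time t that were reported at delay d.
X = np.array([
    [10, 5, 2],
    [20, 8, 4],
    [30, 12, 6],
])

# Empirical delay distribution: column sums over all reference times,
# normalised by the grand total, so pi_hat[d] is the share of cases
# reported at delay d.
pi_hat = X.sum(axis=0) / X.sum()
print(pi_hat)
```

This is only valid when every row is fully reported, which is exactly the special case described in the text.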
Can you reword? I think you just mean that $t'$ is greater than the release time plus the maximum delay, as the maths says, i.e. it is fully reported.
> In the special case when the time the estimate is made, $t'$, is beyond the data release time $t^{*}$, such that $t' \ge t^* + D$, $\hat{\pi}_d$ can be computed directly by summing over all reference time points $t$ at each delay $d$.
>
> In the case where there are partial observations, in order to properly weight the denominator with the missing delays, we have to first impute the cases $\hat{x}_{t,d}$ for all instances where $t+d > t^*$. This amounts to computing the point nowcast from the partial reporting triangle.
In my head this justification should be in the next section, as that is what it is about?
> To do so, we start by defining $\theta_d$, which is the factor by which the cases at delay $d$ compare to the total cases through delay $d-1$, obtained from the $N$ preceding rows of the triangle. In practice, $N \ge D$, with any $N > D$ representing the number of completed observations used to inform the estimate
>
> $$
> \hat{\theta}_d(t^*) = \frac{\sum_{i=1}^{N} x_{t^*-i+d, d}}{\sum_{d'=0}^{d-1} \sum_{i=1}^{N} x_{t^*-i+d,d'}}
> $$
Notationally, all the indexes here feel a bit complex for what it is. I don't have any great thoughts on simplification, though.
Yep, this was where I was struggling a bit last week; it's way more complex in math notation than it is just looking at the code and the blocks of the matrix being summed over...
I was wondering if you could maybe redefine the sum?
> $$
> \hat{\theta}_d(t^*) = \frac{\sum_{i=1}^{N} x_{t^*-i+d, d}}{\sum_{d'=0}^{d-1} \sum_{i=1}^{N} x_{t^*-i+d,d'}}
> $$
>
> *Note:* this amounts to taking the sum of the elements in column $d$ up until time $t^*-d$ and dividing by the sum over all the elements to the left of column $d$ up until time $t^*-d$, referred to as `block_top` and `block_top_left`, respectively, in the code.
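The `block_top` / `block_top_left` description might be easier to follow as code than as index notation. A hedged NumPy sketch (the names come from the note above; the triangle values are hypothetical, and this is not the package's actual R implementation):

```python
import numpy as np

# Hypothetical reporting triangle (NaN = not yet observed); rows are
# reference times, columns are delays d = 0..2.
tri = np.array([
    [10.0, 5.0, 2.0],
    [20.0, 8.0, np.nan],
    [30.0, np.nan, np.nan],
])

d = 1  # estimate the scaling factor for delay column d
# Rows where column d is already fully observed.
complete = ~np.isnan(tri[:, d])

# block_top: sum of the observed entries in column d.
block_top = tri[complete, d].sum()
# block_top_left: sum of all entries to the left of column d, same rows.
block_top_left = tri[complete, :d].sum()

theta_d = block_top / block_top_left

# Impute the missing entry in column d for the newest row by scaling
# that row's observed total through delay d-1.
imputed = theta_d * np.nansum(tri[2, :d])
print(theta_d, imputed)
```

Iterating this over columns left to right fills in the triangle, which is the point nowcast the text describes.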
Suggested change:

```diff
-*Note:* this amounts to taking the sum of the elements in column $d$ up until time $t^*-d$ and dividing by the sum over all the elements to the left of column $d$ up until time $t^*-d$, referred to as `block_top` and `block_top_left`, respectively, in the code.
+This amounts to taking the sum of the elements in column $d$ up until time $t^*-d$ and dividing by the sum over all the elements to the left of column $d$ up until time $t^*-d$, referred to as `block_top` and `block_top_left`, respectively, in the code.
```

Then we can just copy and paste into the paper.
I think this is a nice explanation, but perhaps there is a slightly more human explanation?
Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
As discussed f2f, I think we want to nix this, so I will edit accordingly.
Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
> We add a small number to the mean to avoid an ill-defined negative binomial. We note that to perform all of these computations, data snapshots from at least $N + M$ past observations, or rows of the reporting triangle, are needed. This estimate of the uncertainty accounts for the empirical uncertainty in the point estimate of the delay distribution over time.
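To illustrate why the small number is added to the mean: in a mean/dispersion parameterisation of the negative binomial, a mean of exactly zero makes the distribution degenerate. A hedged sketch (NumPy only; `mu`, `phi`, and the epsilon value are all hypothetical, not the package's actual defaults):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical point-nowcast means, one per imputed cell; the first is
# exactly zero, so a small epsilon (the "small number" in the text) is
# added to keep the parameterisation well defined.
mu = np.array([0.0, 3.2, 10.5]) + 0.1  # epsilon = 0.1 is illustrative
phi = 5.0  # hypothetical dispersion parameter

# Convert (mean, dispersion) to NumPy's (n, p) parameterisation:
# n = phi, p = phi / (phi + mu).
p = phi / (phi + mu)
draws = rng.negative_binomial(phi, p)
print(draws)  # one negative-binomial draw per imputed cell
```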
> #### Uncertainty estimate via computing retrospective nowcasts from a single delay distribution, $\pi(d)$
Discussed face to face: we are dropping this.
> #### Uncertainty estimate via iteratively re-estimating the delay distribution and computing retrospective nowcasts
>
> The first method uses the retrospective incomplete reporting triangle to re-estimate a delay distribution using the $N$ preceding rows of the reporting triangle before $s^*$, and uses it to recompute a retrospective nowcast, for $M$ realizations of the retrospective reporting triangle (so $M$ different $s^*$ values).
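The loop structure being described might be clearer as a sketch. This is a stand-in, not the package's implementation: `truncate_triangle` is a hypothetical helper, the matrix values are made up, and the inner re-estimation step is replaced by a simple discrepancy so the example stays self-contained.

```python
import numpy as np

def truncate_triangle(tri, s_star):
    """Hypothetical helper: re-create the reporting triangle as it would
    have looked at time s_star, masking entries with t + d > s_star."""
    out = tri.astype(float).copy()
    T, D = out.shape
    for t in range(T):
        for d in range(D):
            if t + d > s_star:
                out[t, d] = np.nan
    return out

# Hypothetical completed reporting matrix (rows: reference time, cols: delay).
tri = np.arange(1.0, 21.0).reshape(5, 4)

M = 3  # number of retrospective snapshots (s* values)
errors = []
for m in range(1, M + 1):
    s_star = (tri.shape[0] - 1) - m  # an earlier "as of" time
    retro = truncate_triangle(tri, s_star)
    # Here the method would re-estimate the delay distribution from the
    # N rows preceding s_star and recompute a retrospective point
    # nowcast; as a stand-in, record the total mass that was masked.
    errors.append(np.nansum(tri) - np.nansum(retro))

print(errors)  # one retrospective discrepancy per snapshot
```

The key point the paragraph makes is that the delay distribution is re-estimated inside the loop, once per $s^*$, rather than fixed once up front.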
As discussed face to face, we are going to generalise this so it can use any set of historic or retrospective point nowcasts, and then provide wrapper tools for more general cases.
Description
This PR closes #35.
This is a write-up of the methods being implemented to:
Note @seabbs: this could benefit from #34 to preview the new article, but I am stuck on that 404 error (which I think is from the different website name than what is expected, but I'm not sure, since the website is working fine in main).
Checklist