
Issue 35: Draft of methods #37

Draft: wants to merge 11 commits into main

Conversation

kaitejohnson
Collaborator

@kaitejohnson kaitejohnson commented Feb 12, 2025

Description

This PR closes #35.

This is a write-up of the methods being implemented to:

  • estimate a delay distribution
    • from a complete reporting matrix
    • using imputation of the point nowcast
  • estimate a point nowcast from an incomplete reporting triangle and delay distribution
  • estimate observation error in a nowcast from a complete or partially observed reporting triangle
    • iteratively and retrospectively re-estimate the delay distribution and re-compute nowcasts, in order to simulate the out-of-sample predictive error that would have been made in the past

Note @seabbs: this could benefit from #34 to preview the new article, but I am stuck on that 404 error (which I think comes from the website name differing from what is expected, though I'm not sure, since the website is working fine on main).

Checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have included the target issue or issues in the PR title in the form Issue(s) issue-numbers: PR title
  • I have read the contribution guidelines.
  • I have tested my changes locally.
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required.
  • My code follows the established coding standards.
  • I have added a news item linked to this PR.
  • I have reviewed CI checks for this PR and addressed them as far as I am able.


codecov bot commented Feb 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.44%. Comparing base (5d23bd5) to head (1ae52c0).
Report is 37 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #37      +/-   ##
==========================================
+ Coverage   91.48%   99.44%   +7.95%     
==========================================
  Files           5        6       +1     
  Lines          94      180      +86     
==========================================
+ Hits           86      179      +93     
+ Misses          8        1       -7     


@kaitejohnson kaitejohnson marked this pull request as ready for review February 17, 2025 11:55
@kaitejohnson kaitejohnson requested a review from seabbs February 17, 2025 11:56
@seabbs
Collaborator

seabbs commented Feb 20, 2025

method 2: iteratively and retrospectively re-compute nowcasts using the same single point estimate of the delay distribution

Why is this permutation in here? Is it because we plan to support it or otherwise?

Collaborator

@seabbs seabbs left a comment

Partial review, more coming shortly

%\VignetteEncoding{UTF-8}
---

# `baselinenowcast` mathematical model
Collaborator

Pedantic point, but we don't really need the package name in the docs of the package?


The following describes the estimate of the delay distribution, the generation of the point nowcast, and the estimate of the observation error, for a partially observed or complete reporting triangle. The method assumes that the units of the delays, $d$, and the units of reference time $t$ are the same, e.g. weekly data and weekly releases or daily data with daily releases. This method is based on the method described by ([Wolffram et al. 2023](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011394)) developed by the Karlsruhe Institute of Technology.
Collaborator

As we spoke about at the conference, it more assumes that the reporting triangle is contiguous, doesn't it? So you could make it support daily and weekly etc. without changing the method, just by adding in preprocessing of the reporting triangle?

Collaborator Author

As discussed f2f, I will reword this to explain that the method is time-unit agnostic, and so it is flexible enough to handle any combination of reference and reporting date, but the data will need to be pre-processed such that the units of both are the same...


Collaborator

Suggested change
The following describes the estimate of the delay distribution, the generation of the point nowcast, and the estimate of the observation error, for a partially observed or complete reporting triangle. The method assumes that the units of the delays, $d$, and the units of reference time $t$ are the same, e.g. weekly data and weekly releases or daily data with daily releases. This method is based on the method described by ([Wolffram et al. 2023](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011394)) developed by the Karlsruhe Institute of Technology.
The following describes the estimate of the delay distribution, the generation of the point nowcast, and the estimate of the observation error, for a partially observed or complete reporting triangle. The method assumes that the units of the delays, $d$, and the units of reference time $t$ are the same, e.g. weekly data and weekly releases or daily data with daily releases. This method is based on the method described by ([Wolffram et al. 2023](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011394)).



### Notation
Collaborator

Before getting going with it, I really like the idea of a high-level overview of the general approach in a single paragraph. Perhaps with some refs to similar approaches and pointing out what it is not similar to. Perhaps this needs to go elsewhere though, vs here?

Collaborator Author

I think we could intro this with a schematic using the reporting triangle and highlighting the "blocks" being computed in the matrix? I feel like there might have been a nice slide on this in one of the talks, or I can make one myself.

### Point estimate of the delay distribution

We use the entire reporting triangle to compute an empirical estimate of the delay distribution, $\pi(d)$, or the probability that a case at reference time $t$ appears in the dataset at time $t + d$. We will refer to the realized empirical estimate of the delay distribution from a reporting triangle as $\pi(d)$.
The delay distribution, $\pi(d)$, can be estimated directly from the completed reporting matrix $X$:
Collaborator

I am a big fan of always describing it in English as well as equations.

$$
\hat{\pi}_d = \frac{\sum_{t=1}^{t^*} X_{t,d}}{\sum_{d'=0}^{D} \sum_{t=1}^{t^*} X_{t,d'}}
$$

In the special case when the time the estimate is made, $t'$, is beyond the data release time $t^{*}$, such that $t' \ge t^* + D$, $\hat{\pi}_d$ can be computed directly by summing over all reference time points $t$ at each delay $d$.
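For this fully reported case, a minimal base R sketch of the calculation (illustrative only; the matrix `X` is a made-up example, not the package's interface):

```r
# Hypothetical complete reporting matrix: rows are reference times 1..t*,
# columns are delays 0..D, and every cell is fully observed.
X <- matrix(
  c(10, 5, 2,
     8, 4, 1,
    12, 6, 3),
  nrow = 3, byrow = TRUE
)

# Empirical delay distribution: sum each delay column over all reference
# times, then divide by the grand total (the double sum in the denominator).
pi_hat <- colSums(X) / sum(X)
pi_hat  # one probability per delay d = 0, ..., D; sums to 1
```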
Collaborator

Can you reword? I think you just mean that t prime is greater than the release time + the maximum delay, as the maths says, i.e. it is fully reported.

In the case where there are partial observations, in order to properly weight the denominator with the missing delays, we have to first impute the cases $\hat{x}_{t,d}$ for all instances where $t+d > t^*$. This amounts to computing the point nowcast from the partial reporting triangle.
Collaborator

In my head this justification should be in the next section as that is what it is about?

To do so, we start by defining $\theta_d$, which is the factor by which the cases on delay $d$ compare to the total cases through delay $d-1$, obtained from the $N$ preceding rows of the triangle. In practice, $N \ge D$, with any $N > D$ representing the number of completed observations used to inform the estimate:

$$
\hat{\theta}_d(t^*) = \frac{\sum_{i=1}^{N} x_{t^*-i+d, d}}{\sum_{d'=0}^{d-1} \sum_{i=1}^{N} x_{t^*-i+d, d'}}
$$
Collaborator

Notationally, all the indexes here feel a bit complex for what it is. I don't have any great thoughts on simplification though.

Collaborator Author

Yep, this was where I was struggling a bit last week; it's way more complex in math notation than in just looking at the code and the blocks of the matrix being summed over...

Collaborator

I was wondering if you could maybe redefine the sum?


*Note:* this amounts to taking the sum of the elements in column $d$ up until time $t^*-d$ and dividing by the sum over all the elements to the left of column $d$ up until time $t^*-d$, referred to as `block_top` and `block_top_left`, respectively, in the code.
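To make the block language concrete, here is a small base R sketch of those two sums (the triangle `rt` and the indexing are illustrative assumptions, not the package's implementation; the object names just mirror `block_top` and `block_top_left` from the text):

```r
# Hypothetical reporting triangle: rows are reference times, columns are
# delays 0..D; cells with reference time + delay > t* are unobserved (NA).
rt <- matrix(
  c(10,  5,  2,
     8,  4,  1,
    12,  6, NA,
     9, NA, NA),
  nrow = 4, byrow = TRUE
)

d     <- 2              # delay (column) for which theta_d is computed
col   <- d + 1          # matrix column index (delay 0 lives in column 1)
n_obs <- nrow(rt) - d   # rows observed through delay d, i.e. up to time t* - d

block_top      <- sum(rt[seq_len(n_obs), col])               # column d, rows up to t* - d
block_top_left <- sum(rt[seq_len(n_obs), seq_len(col - 1)])  # columns to the left, same rows
theta_hat_d    <- block_top / block_top_left
theta_hat_d
```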
Collaborator

Suggested change
*Note:* this amounts to taking the sum of the elements in column $d$ up until time $t^*-d$ and dividing by the sum over all the elements to the left of column $d$ up until time $t^*-d$, referred to as `block_top` and `block_top_left`, respectively, in the code.
This amounts to taking the sum of the elements in column $d$ up until time $t^*-d$ and dividing by the sum over all the elements to the left of column $d$ up until time $t^*-d$, referred to as `block_top` and `block_top_left`, respectively, in the code.

Then we can just copy and paste into the paper

Collaborator

I think this is a nice explanation but perhaps there is a slightly more human explanation?

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
@kaitejohnson
Collaborator Author

method 2: iteratively and retrospectively re-compute nowcasts using the same single point estimate of the delay distribution

Why is this permutation in here? Is it because we plan to support it or otherwise?

As discussed f2f, I think we want to nix this so I will edit accordingly.


github-actions bot commented Feb 20, 2025

Thank you for your contribution kaitejohnson 🚀! Your website is ready for download 👉 here 👈!
(The artifact expires on 2025-02-25T15:12:11Z. You can re-generate it by re-running the workflow here.)

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>

We add a small number to the mean to avoid an ill-defined negative binomial. We note that to perform all these computations, data snapshots from at least $N + M$ past observations, or rows of the reporting triangle, are needed. This estimate of the uncertainty accounts for the empirical uncertainty in the point estimate of the delay distribution over time.
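As a sketch of the guard mentioned above (the offset, means, and dispersion value here are placeholders, not the package's defaults):

```r
# Hypothetical point-nowcast means and an estimated dispersion parameter.
mu_hat   <- c(0, 3.2, 7.5)  # a mean of exactly zero is the problematic case
disp_hat <- 5

# Add a small number to the mean so every negative binomial is well defined,
# then draw observation noise around the point nowcast.
eps   <- 0.1
draws <- rnbinom(length(mu_hat), size = disp_hat, mu = mu_hat + eps)
draws
```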

#### Uncertainty estimate via computing retrospective nowcasts from a single delay distribution, $\pi(d)$
Collaborator

Discussed face to face: we are dropping this.


#### Uncertainty estimate via iteratively re-estimating the delay distribution and computing retrospective nowcasts

The first method uses the retrospective incomplete reporting triangle to re-estimate a delay distribution from the $N$ preceding rows of the reporting triangle before $s^*$, and then uses it to recompute a retrospective nowcast, for $M$ realizations of the retrospective reporting triangle (so $M$ different $s^*$ values).
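A rough base R sketch of just the snapshot-construction part of this loop (the matrix and the helper are hypothetical, not the package's functions; re-estimating the delay and recomputing the nowcast for each snapshot would then follow as described above):

```r
# Hypothetical fully observed reporting matrix: rows are reference times,
# columns are delays 0..2.
X <- matrix(
  c(10, 5, 2,
     8, 4, 1,
    12, 6, 3,
     9, 5, 2),
  nrow = 4, byrow = TRUE
)

# Rebuild the triangle as it would have looked at an earlier snapshot time s*:
# keep only rows up to s* and blank out cells not yet reported (t + d > s*).
make_retro_triangle <- function(mat, s_star) {
  rt <- mat[seq_len(s_star), , drop = FALSE]
  for (t in seq_len(s_star)) {
    rt[t, (t + seq_len(ncol(rt)) - 1) > s_star] <- NA
  }
  rt
}

# The M = 2 most recent retrospective snapshots, with t* = nrow(X); each one
# would get its own re-estimated delay distribution and retrospective nowcast.
retro_triangles <- lapply(nrow(X) - 1:2, function(s) make_retro_triangle(X, s))
```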
Collaborator

As discussed face to face, we are going to generalise this so it can use any set of historic or retrospective point nowcasts, and then provide wrapper tools for more general cases.

@kaitejohnson kaitejohnson marked this pull request as draft February 21, 2025 11:21
Development

Successfully merging this pull request may close these issues.

Methods write-up
2 participants