
Intuitive explanation for difference between true distribution and observed samples #208

Open
adamkucharski opened this issue Feb 21, 2025 · 1 comment

Comments

@adamkucharski

Thanks for the nice work on this package. Following discussion on epiparameter, I wondered if there is an intuitive way of communicating why the difference between the true distribution and observed samples looks like it does, e.g. for wider student or field epi audiences. The simulations presented make a clear case that there is a difference, but I've found it can often be helpful to have a non-technical conceptual explanation as well (if feasible).

For example, the conversion between prevalence and incidence in inc2prev can be conceptually summarised as scaling the curve by the sum of the probability mass function for test positivity and shifting back in time by the mean duration of positivity (i.e. how much positivity is there and how long does it last).
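
A minimal sketch of that scale-and-shift intuition, for illustration only. It is not the inc2prev implementation. The positivity curve `p_pos`, the linear `incidence_fn`, and both helper functions are hypothetical. The sketch assumes prevalence is the discrete convolution of incidence with the probability of testing positive `s` days after infection. For a linearly changing incidence the summary is exact; for curved incidence it is only an approximation.

```python
import numpy as np

# Hypothetical probability of testing positive s days after infection
# (illustrative values only, not taken from inc2prev).
p_pos = np.array([0.1, 0.5, 0.8, 0.9, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05])


def incidence_fn(t):
    """Hypothetical incidence curve, chosen to be linear in time."""
    return 100.0 + 2.0 * t


def exact_prevalence(times, p_pos):
    """prevalence[t] = sum_s incidence[t - s] * p_pos[s] on a daily grid starting at 0."""
    incidence = incidence_fn(times)
    return np.convolve(incidence, p_pos)[: len(times)]


def scaled_shifted_prevalence(times, p_pos):
    """Scale incidence by total positivity and shift it by the mean duration of positivity."""
    total = p_pos.sum()  # how much positivity there is
    mean_duration = (np.arange(len(p_pos)) * p_pos).sum() / total  # how long it lasts
    return total * incidence_fn(times - mean_duration)


times = np.arange(60.0)
exact = exact_prevalence(times, p_pos)
approx = scaled_shifted_prevalence(times, p_pos)

# Skip the first days, where the convolution has not yet seen a full positivity window.
burn_in = len(p_pos) - 1
max_abs_diff = np.max(np.abs(exact[burn_in:] - approx[burn_in:]))
```

Read in the other direction, the same relation gives incidence as prevalence divided by the total positivity and shifted back in time by the mean duration.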

Based on Equation 3.1, it feels like it could be explained in terms of density and averages. But perhaps the way the integrals come out, it's not possible to simply summarise without either a simulation or full derivation (in which case, this issue is not relevant).

@SamuelBrand1
Collaborator

Hi @adamkucharski.

It's not really expressed in the build-up to Eq 3.1 (yet!) but I think the easiest mathematical story to tell here is that the estimation challenge for copies of some delay distribution $D$ with double censored data can be converted into a single censored data problem, but for $\tilde{D} = U + D$.

The tl;dr version is that you fix a time point (e.g. the beginning of the primary interval), and the overall delay from that point is the combination of the time until the primary event $U$ and then the delay $D$ (Eq 3.1 is set up slightly differently and we should unify the explanations at some point).

Some common approaches that approximate the double censoring problem as a single censoring problem (e.g. using the primary interval midpoint) are equivalent to approximating the random variable $U$ by a single value. This introduces bias into variance estimation and, potentially, mean duration estimation, whatever approach you take (e.g. Bayesian posterior, MLE, etc.).
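
A small simulation sketch of this point, under stated assumptions rather than the package's own code. It assumes daily primary intervals, $U$ uniform on $[0, 1)$, and a gamma-distributed $D$ with arbitrary illustrative parameters. The function name and the returned keys are hypothetical. Replacing $U$ by the midpoint 0.5 amounts to working with $\tilde{D} - 0.5$, which under these assumptions keeps the mean of $D$ but carries the extra variance of $U$ (1/12 for a uniform on a unit interval).

```python
import numpy as np


def simulate_double_censoring(n=200_000, shape=2.0, scale=1.5, seed=1):
    """Compare moments of the true delay D with those of U + D - 0.5.

    Assumptions (not from the thread): U ~ Uniform(0, 1) within a daily
    primary interval, D ~ Gamma(shape, scale), and U independent of D.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=n)    # time of primary event within its interval
    d = rng.gamma(shape, scale, size=n)  # true delay D
    d_tilde = u + d                      # delay from the start of the primary interval
    midpoint_delay = d_tilde - 0.5       # what fixing U at the midpoint leaves you with
    secondary_day = np.floor(d_tilde)    # single censored observation of D tilde
    return {
        "mean_true": d.mean(),
        "var_true": d.var(),
        "mean_midpoint": midpoint_delay.mean(),
        "var_midpoint": midpoint_delay.var(),
        "mean_secondary_day": secondary_day.mean(),
    }


result = simulate_double_censoring()
extra_variance = result["var_midpoint"] - result["var_true"]
print(f"extra variance from ignoring U: {extra_variance:.3f} (1/12 = {1 / 12:.3f})")
```

With a uniform $U$ the mean is unaffected and only the variance is inflated. If primary events are not uniform within the interval (for example during rapid epidemic growth), the mean of $U$ moves away from the midpoint and the mean delay would be biased too.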
