Evaluation, Reproducibility, Benchmarks Meeting 32
Date: 26th February, 2025
The MICCAI deadline is coming up, so we might have fewer attendees today.
- Carole
- Annika
- Lena
- Olivier
- Nick
- "False Promises" project
- Already have some comments from Olivier
- Make the contributions so far more visible
- The initiatives could be phrased better; we don't really work on developing new metrics
- Include links/summaries of papers we've published
- Closer to top of page
- Carole, Nick, and Annika will sit together to flesh out new wording and send it to the group
Original presentation
- Confidence intervals (CIs) are always required by regulatory bodies, and are generally good practice in science
- Not often used at MICCAI (reference)
- There are several methods to use
- Parametric vs nonparametric, etc.
- Paper describes the five most common methods in use
- Characteristics of a good method
- Coverage (can only be tested via simulation, since the true value is unknown on real data)
- First conclusion: There is no parametric distribution that appears to be a good fit
- Second conclusion: The mean is not a robust summary statistic
- Looking at CIs for the median instead is interesting: SciPy's default method, the BCa bootstrap, performs poorly
- Should use the percentile bootstrap instead (see the sketch after this list)
- Concludes with a flowchart of recommendations based on the presence/absence of outliers and the sample size
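
A minimal sketch (not from the paper or the presentation) of the two points above: testing a CI method's coverage by simulation, and choosing the percentile bootstrap in SciPy instead of the default BCa method. The lognormal toy population, sample size, and trial count are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def empirical_coverage(method, n_samples=30, n_trials=200):
    """Fraction of simulated trials whose 95% bootstrap CI for the
    median contains the true population median."""
    true_median = 1.0  # median of a lognormal with mean=0 is exp(0) = 1
    hits = 0
    for _ in range(n_trials):
        sample = rng.lognormal(mean=0.0, sigma=1.0, size=n_samples)
        res = stats.bootstrap(
            (sample,),                # data is passed as a sequence of samples
            np.median,
            confidence_level=0.95,
            method=method,            # "percentile" vs SciPy's default "BCa"
            n_resamples=999,
            random_state=rng,
        )
        ci = res.confidence_interval
        hits += ci.low <= true_median <= ci.high
    return hits / n_trials

# A good method's empirical coverage should be close to the nominal 0.95.
for method in ("BCa", "percentile"):
    print(f"{method}: coverage = {empirical_coverage(method):.3f}")
```

The exact numbers depend on the assumed population and sample size; the point is the coverage criterion itself, which can only be checked in simulation because the true median must be known.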
Feedback
- Is the decision between mean and median specifically about outliers, or are other factors, like skewness, applicable here?
- Should maybe defer the guidelines for deciding mean vs median to prior publications
- Is MICCAI an appropriate venue for proposing guidelines?
- Yes, we think so, but we can make clear that they are provisional