
Improved how anomalous floating-point calculations are handled in the particle filter resamplers. #238

Merged
Merged 1 commit into main on Feb 27, 2025

Conversation

JeffreyCovington
Contributor

Fixed the handling of two floating-point issues that can occur in the particle filter during likelihood calculations. The behavior in these cases now matches the mathematical formulation of the particle filter. These issues are particularly relevant to multi-node simulations.

  1. Not a number (NaN) floating-point values in the data (sometimes used by ADRIOs to indicate missing or invalid data) are now omitted from the joint likelihood calculation when some, but not all, nodes have missing data. This also prevents NaN log-weight values from being passed directly into the resampler.
  2. Additional fail-safes were added to the ResamplingByNode resampler to catch and properly handle underflow-related issues. This brings the underflow handling of the ResamplingByNode resampler in line with the default resampler.
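The two behaviors above can be sketched as follows. This is a hypothetical illustration, not epymorph's actual code: the function names `joint_log_likelihood` and `normalize_weights` are made up for the example, which assumes per-node likelihoods are combined in log space.

```python
import numpy as np

def joint_log_likelihood(node_log_liks):
    """Combine per-node log-likelihoods, omitting nodes with NaN
    (missing/invalid) data. If every node is missing, return 0.0
    (log 1), i.e. a neutral weight, so no NaN reaches the resampler."""
    node_log_liks = np.asarray(node_log_liks, dtype=np.float64)
    valid = ~np.isnan(node_log_liks)
    if not valid.any():
        return 0.0  # no observed data anywhere: neutral weight
    return float(node_log_liks[valid].sum())

def normalize_weights(log_weights):
    """Turn log-weights into normalized weights with an underflow
    fail-safe: if all weights underflow to zero (or are non-finite),
    fall back to uniform weights rather than dividing by zero."""
    log_weights = np.asarray(log_weights, dtype=np.float64)
    # subtract the max log-weight for numerical stability
    shifted = log_weights - np.max(log_weights)
    w = np.exp(shifted)
    total = w.sum()
    if total == 0.0 or not np.isfinite(total):
        return np.full_like(w, 1.0 / len(w))  # underflow fail-safe
    return w / total
```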

… filter resamplers. Added additional fail-safes to the `ResamplingByNode` resampler to avoid underflow issues.
@JavadocMD
Contributor

No complaints with the changes as written. I'm considering the implications for the ADRIO interface. Relying on case counts to be np.nan when missing implies the observational data is typed np.float64, since np.int64 doesn't support nan. This isn't hugely problematic, but feels messy -- it seems tidier for ADRIOs to return the data type that best matches their data -- e.g., integers for integer data, as many of the CDC sources are in truth.

This is why I'm leaning towards ADRIOs returning masked arrays for missing/invalid values -- masks work the same way across all types.

That said, I think we can actually meet in the middle if working with nans makes better sense for the PF computation. This might involve doing some early processing on the observation data input, say circa particle_filter.py#L173:

```python
data = np.ma.asarray(cases, dtype=np.float64).filled(np.nan)
```

In some quick testing: cases can be a scalar, a Python list, a regular numpy array, or a masked numpy array; in each case, data comes out as a regular numpy array of floats, possibly containing nans.
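A quick sketch of that coercion, wrapped in a helper for illustration (the name `coerce_cases` is made up; the one-liner itself is the suggestion above):

```python
import numpy as np

def coerce_cases(cases):
    """Coerce any of the accepted input forms (scalar, list, ndarray,
    masked array) to a plain float64 ndarray, with masked/missing
    entries represented as NaN."""
    return np.ma.asarray(cases, dtype=np.float64).filled(np.nan)

# A masked array's masked entries become NaN:
masked = np.ma.masked_array([1, 2, 3], mask=[0, 1, 0])
print(coerce_cases(masked))  # [1. nan 3.]

# A plain list (or scalar, or regular ndarray) passes through as floats:
print(coerce_cases([4, 5]))  # [4. 5.]
```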

What do you think of that approach?

@JeffreyCovington
Contributor Author

I consider this PR to be a bug fix: the particle filter should have had this behavior (handling NaN and multi-node underflow) to begin with, given that the examples using the PF were aimed at the CDC ADRIOs.

In general, working with np.nan values is not ideal for the reasons you mentioned. I envision that the PF will eventually be refactored to use a masked-array approach internally (whether np.ma or something custom) and will support both np.int64 and np.float64 data.

Once the ADRIO interface is refactored, your suggested fix will work well until the PF refactor can be implemented.

@JeffreyCovington JeffreyCovington merged commit 9b147cc into main Feb 27, 2025
1 check passed
@JeffreyCovington JeffreyCovington deleted the pf-floating-point-errors branch February 27, 2025 01:07

Successfully merging this pull request may close these issues.

Improve how the particle filter handles observations in multi-node simulations