Following late edits the night before Y was presenting the Petcare China map to clients, the changes were overwritten and discarded.
2020-08-18 (all times UTC)
- 00:17 Final changes made to map triple data by Martin.
- 00:23 JJ checks and confirms the content.
- 08:16 Timestamp of last edit from Y.
- 09:03 Martin posts that the map has lost the changes.
- 09:21 JJ emails Y.
- 09:43 Investigation finished and backup restored by Martin.
- 13:35 Y notifies us that the map was fine during presentation.
Following an incident of similar nature during early development, James had implemented a scheme that would generate, store, and verify uuids for every edit to prevent data loss due to concurrent updates.
When Martin joined the company he reviewed the code in question, observed that it did block concurrent updates in basic testing, and determined that the current storage solution was passable until implementing proper version history was scheduled.
After investigation, it was still not clear the exact circumstances that lead to to the data being overwritten. However, a diff of the changes showed that Y had relocated two painpoints as he'd intended, but that edit happened in the morning, and reverted the changes from the night before. This could happen if the version of the map data was from the previous day, but the update should have been rejected as the uuid would not match. From inspecting the backups, the uuid has been changing, which probably discounts a lack of entropy issue with generating the random ids? The logging in the system is not comprehensive enough to track the detail of updates after the fact.
Having a repeatably process for partly automated updates to the data, and local backups of each stage, meant recovery and diagnosis was relatively easy. However this depended on data some of which only resides on Martin's laptop.
A more robust means of updating and keeping a history should perhaps move up the priority list as we do not have confidence this could not happen again, in a more harmful manner.
Checking a map is a manual process at the moment, and we have no system for catching regressions.