DRAFT: Add option to keep oldest triples when dropping duplicates #138
We typically create our models over two passes: one to create a model from Excel data, and a second to add labels triples. Since we currently drop the oldest triple where duplicates are found, the labels triples always "win". However, we usually treat our Excel data as the source of truth, so it is these older triples that should be retained.
So, add an option to select the behaviour when dropping duplicates. The default is the existing `keep-newest` behaviour, as this is appropriate when building the Excel model itself, but allow `keep-oldest` instead at the user's discretion.
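
To illustrate the difference between the two behaviours, here is a minimal sketch. It is not the code in this PR: the function name `drop_duplicates`, the `keep` parameter, and the choice to treat triples sharing a subject and predicate as duplicates are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: (subject, predicate, object) as plain strings.
Triple = Tuple[str, str, str]


def drop_duplicates(triples: Iterable[Triple], keep: str = "newest") -> List[Triple]:
    """Drop duplicate triples, keeping either the newest or the oldest one.

    Assumption: two triples are duplicates when they share the same subject
    and predicate. Triples are assumed to arrive oldest first.
    """
    if keep not in ("newest", "oldest"):
        raise ValueError(f"keep must be 'newest' or 'oldest', got {keep!r}")

    kept: Dict[Tuple[str, str], Triple] = {}
    for s, p, o in triples:
        key = (s, p)
        if keep == "oldest" and key in kept:
            # An older triple already holds this key, so skip the newer one.
            continue
        # For "newest", later triples overwrite earlier ones.
        kept[key] = (s, p, o)
    return list(kept.values())


# First pass: triples built from Excel data (treated as the source of truth).
excel_pass = [("item:1", "label", "Pump A")]
# Second pass: labels triples added afterwards.
labels_pass = [("item:1", "label", "Generated label")]

combined = excel_pass + labels_pass
newest = drop_duplicates(combined)                 # default: labels triple wins
oldest = drop_duplicates(combined, keep="oldest")  # Excel triple is retained
```

With the default, the labels triple replaces the Excel one; with `keep="oldest"`, the Excel triple survives the second pass.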