DRAFT: Add option to keep oldest triples when dropping duplicates #138

Merged
merged 4 commits into from
Mar 6, 2024

alightwing
Contributor

We typically build our models in two passes: the first creates a model from Excel data, and the second adds label triples. Because we currently drop the oldest triple when duplicates are found, the label triples always "win". However, we usually treat the Excel data as the source of truth, so it is these older triples that should be retained.

So, add an option to select which triple is kept when dropping duplicates. The default is the existing keep-newest behaviour, as this is appropriate when building the Excel model itself, but keep-oldest can be selected instead at the user's discretion.
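
To illustrate the difference between the two behaviours, here is a minimal sketch rather than the code in this PR. The function name `drop_duplicates`, the `keep` parameter and its values are hypothetical, and it assumes that triples are `(subject, predicate, object)` tuples, that two triples count as duplicates when they share subject and predicate, and that input order is insertion order (oldest first).

```python
from typing import Dict, Iterable, List, Literal, Tuple

# Assumed representation: (subject, predicate, object)
Triple = Tuple[str, str, str]


def drop_duplicates(
    triples: Iterable[Triple],
    keep: Literal["newest", "oldest"] = "newest",
) -> List[Triple]:
    """Keep one triple per (subject, predicate) key.

    Hypothetical helper: assumes the input is ordered oldest first.
    """
    if keep not in ("newest", "oldest"):
        raise ValueError(f"keep must be 'newest' or 'oldest', got {keep!r}")

    kept: Dict[Tuple[str, str], Triple] = {}
    for triple in triples:
        key = triple[:2]
        if keep == "newest":
            # Remove any earlier entry so the later triple replaces it
            # and takes its position at the end of the dict.
            kept.pop(key, None)
            kept[key] = triple
        else:
            # Only the first triple seen for a key is stored.
            kept.setdefault(key, triple)
    return list(kept.values())


# First pass: triples from Excel. Second pass: label triples.
excel = [("a", "label", "Excel A"), ("b", "label", "Excel B")]
labels = [("a", "label", "Labels A")]

newest = drop_duplicates(excel + labels)                  # labels "win"
oldest = drop_duplicates(excel + labels, keep="oldest")   # Excel "wins"
```

In this sketch "oldest" and "newest" are defined purely by position in the input, so swapping the order in which the two passes are concatenated also swaps which source is retained.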

Member

@bz2 bz2 left a comment

Thanks! This seems fine? I think the least clear part is how safely we can assume that triple store order reflects newest/oldest - it may not match intuition if the order of the input arguments does not align with the order of the processing steps.

@alightwing alightwing force-pushed the drop-duplicates-options branch from 5ec9daf to cafa0f2 Compare March 6, 2024 11:30
@alightwing alightwing merged commit 65e92cf into VisualMeaning:master Mar 6, 2024
2 checks passed
@alightwing alightwing deleted the drop-duplicates-options branch March 6, 2024 11:31
2 participants