BA: missing person gender #817
Comments
As explained in #815, there is now a procedure in place to add missing gender. But before doing that, I noticed that some forenames and surnames are mixed up in BA. I only tested forenames ending in '-ić' and corrected those; I'm sure there are more that I haven't noticed. As a side effect of this fix, we now have duplicate persons - one where the order was already correct, and one where it was corrected - and even though they now have the correct names, they still have different person IDs (e.g. BoškoŠiljeković vs. ŠiljekovićBoško). Hopefully this will be fixed in the future.
|
Thanks for this as well. The different IDs are due to you exchanging the surname and forename, but not changing the ID? Would that not be simple to resolve once we know which is the correct forename and which the correct surname? We did not do any work on the Bosnian data ourselves but obtained them from our upstream source, so we know very little about what issues the data might have. |
Yes, exactly.
Well, it is simple in that it is clear what needs to be done - but you need to go through all the files and replace the IDs, and with some testing that nothing is messed up it might take a while. That is more than I would gladly invest, esp. as these mistakes cropped up by chance; who knows how many others are lurking in there... However, I will re-open this issue, maybe somebody will find the time. In case that is @nljubesi, let me know beforehand, as the source data has now been fixed and you need to get that copy. |
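For whoever picks this up, the "go through all the files and replace" step could look roughly like the sketch below. It is only an illustration: the function names, the `*.xml` glob, and the direction of the example mapping are my assumptions, not something taken from the BA corpus or its tooling. It treats the files as plain text and replaces an old ID only where it stands as a whole token, so a longer ID that merely starts with the same characters is left alone.

```python
import re
from pathlib import Path


def remap_ids(text: str, id_map: dict[str, str]) -> str:
    """Replace every old person ID in `text` with its new ID (whole tokens only)."""
    if not id_map:
        return text
    # Try longer IDs first so an ID that is a prefix of another cannot shadow it.
    alternatives = "|".join(
        re.escape(old) for old in sorted(id_map, key=len, reverse=True)
    )
    # The lookarounds reject matches embedded in a longer name-like token.
    pattern = re.compile(rf"(?<![\w.-])(?:{alternatives})(?![\w-])")
    return pattern.sub(lambda m: id_map[m.group(0)], text)


def remap_files(root: Path, id_map: dict[str, str], glob: str = "*.xml") -> list[Path]:
    """Apply remap_ids to every matching file under `root`; return the files changed."""
    changed = []
    for path in sorted(root.rglob(glob)):
        original = path.read_text(encoding="utf-8")
        updated = remap_ids(original, id_map)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


# Hypothetical mapping; which of the two forms is the correct ID must be decided first.
EXAMPLE_MAP = {"ŠiljekovićBoško": "BoškoŠiljeković"}
```

Running `remap_files` on a copy of the corpus and diffing the result would be the "some testing" part. The returned list of changed files gives a quick sanity check on how far each ID reaches.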
Sex has been added to BA. The wrong forename/surname distinction is now discussed in #852. |
The current BA corpus has persons without the <sex> element, even though it is easy to determine based on the person's forename.
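To get a list of the affected persons, something like the following sketch might help. It assumes, and this is my assumption rather than something stated in this thread, that persons are encoded as TEI `<person>` elements carrying an `xml:id`, in the standard TEI namespace. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: standard TEI, plus the built-in XML namespace for xml:id.
TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def persons_missing_sex(xml_text: str) -> list[str]:
    """Return the xml:id of every <person> that has no direct <sex> child."""
    root = ET.fromstring(xml_text)
    missing = []
    for person in root.iter(f"{TEI}person"):
        if person.find(f"{TEI}sex") is None:
            missing.append(person.get(XML_ID, "(no xml:id)"))
    return missing
```

This only reports which persons lack the element. Deciding the value from the forename is a separate step and is not attempted here.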