Wikipedia links in Person.csv #488

Open
whalekeykeeper opened this issue Dec 1, 2021 · 1 comment

whalekeykeeper commented Dec 1, 2021

The source_1 and source_2 columns of Person.csv sometimes contain a Wikipedia link.

In the Wikidatalookup tool repository, authenticity_person.py contains functions that use these Wikipedia links to retrieve the Q-identifier of the corresponding Wikidata entity through the MediaWiki API. Information such as name, gender, birth year, death year, and place of birth is then queried from the Wikidata SPARQL endpoint and stored.
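To make the lookup chain concrete, here is a minimal sketch of the two steps (Wikipedia link to Q-identifier, then Q-identifier to a SPARQL query). It is not the code in authenticity_person.py; the function names, the `PERSON_QUERY` template, and the User-Agent string are assumptions made for illustration. It relies on the `wikibase_item` page property exposed by `prop=pageprops` and on the Wikidata properties P21, P569, P570 and P19.

```python
import json
import re
from urllib.parse import unquote, urlencode, urlsplit
from urllib.request import Request, urlopen

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# P21 = sex or gender, P569 = date of birth,
# P570 = date of death, P19 = place of birth.
# __QID__ is a placeholder filled in by person_sparql().
PERSON_QUERY = """
SELECT ?personLabel ?genderLabel ?birth ?death ?birthplaceLabel WHERE {
  BIND(wd:__QID__ AS ?person)
  OPTIONAL { ?person wdt:P21 ?gender. }
  OPTIONAL { ?person wdt:P569 ?birth. }
  OPTIONAL { ?person wdt:P570 ?death. }
  OPTIONAL { ?person wdt:P19 ?birthplace. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def parse_wikipedia_url(url):
    """Split a Wikipedia article URL into (api_endpoint, page_title)."""
    parts = urlsplit(url)
    if not parts.netloc.endswith(".wikipedia.org") or not parts.path.startswith("/wiki/"):
        raise ValueError(f"not a Wikipedia article URL: {url}")
    # Decode percent-escapes and turn underscores back into spaces.
    title = unquote(parts.path[len("/wiki/"):]).replace("_", " ")
    return f"https://{parts.netloc}/w/api.php", title


def qid_query_params(title):
    """MediaWiki API parameters asking for the page's Wikidata item."""
    return {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,
        "titles": title,
        "format": "json",
        "formatversion": 2,
    }


def extract_qid(response):
    """Return the Q-identifier from an API response, or None if absent."""
    for page in response.get("query", {}).get("pages", []):
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            return qid
    return None


def person_sparql(qid):
    """Build the SPARQL query for one person; rejects malformed identifiers."""
    if not re.fullmatch(r"Q\d+", qid):
        raise ValueError(f"not a Q-identifier: {qid}")
    return PERSON_QUERY.replace("__QID__", qid)


def fetch_json(url, params):
    """GET a JSON document (network access; not called at import time)."""
    request = Request(
        f"{url}?{urlencode(params)}",
        headers={"User-Agent": "readact-link-check-sketch/0.1"},  # hypothetical
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

A link would then be resolved with `endpoint, title = parse_wikipedia_url(link)` followed by `extract_qid(fetch_json(endpoint, qid_query_params(title)))`, and the person data fetched with `fetch_json(WIKIDATA_SPARQL_ENDPOINT, {"query": person_sparql(qid), "format": "json"})`.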

While running this procedure, we found that:

  1. Some Wikipedia links do not point to the page we need. The page itself exists, but the link is considered invalid because it does not serve the purpose for which it was stored.
    For example, for AG0141, https://zh.wikipedia.org/wiki/%E9%BB%8E%E6%BE%8D points to a disambiguation page.

  2. Some Wikipedia links are not valid.
    For example, for AG0328, https://en.wikipedia.org/wiki/Sergei_yesenin returns a page stating "Wikipedia does not have an article with this exact name".

A list of invalid Wikipedia links is being compiled.
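One possible way to compile such a list automatically is sketched below. It assumes each link has already been looked up through the MediaWiki API with `prop=pageprops`, so that a page record carries a `missing` or `invalid` key when no article exists and a `disambiguation` page property when the page is a disambiguation page. The function names, status labels, and the page records in the usage example are hypothetical; the records are hand-written to mirror the two cases reported above, not fetched from the API.

```python
def classify_page(page):
    """Classify one page record from a MediaWiki `prop=pageprops` response.

    Only key presence is checked, so the function works whether the API
    reports flags as `True` (formatversion=2) or as empty strings.
    """
    if "missing" in page or "invalid" in page:
        return "missing"            # case 2: no article under this title
    props = page.get("pageprops", {})
    if "disambiguation" in props:
        return "disambiguation"     # case 1: page exists but is not the person
    if "wikibase_item" not in props:
        return "no_wikidata_item"   # page exists but has no linked Q-identifier
    return "ok"


def compile_invalid_links(records):
    """Return (person_id, url, status) for every link that is not 'ok'.

    `records` is an iterable of (person_id, url, page_record) tuples.
    """
    invalid = []
    for person_id, url, page in records:
        status = classify_page(page)
        if status != "ok":
            invalid.append((person_id, url, status))
    return invalid


# Hand-written mock records (assumptions, not real API output).
mock_records = [
    ("AG0141", "https://zh.wikipedia.org/wiki/%E9%BB%8E%E6%BE%8D",
     {"title": "\u9ece\u6f8d", "pageprops": {"disambiguation": ""}}),
    ("AG0328", "https://en.wikipedia.org/wiki/Sergei_yesenin",
     {"title": "Sergei yesenin", "missing": True}),
    ("XX0000", "https://en.wikipedia.org/wiki/Placeholder",   # hypothetical row
     {"title": "Placeholder", "pageprops": {"wikibase_item": "Q1"}}),
]

for row in compile_invalid_links(mock_records):
    print(row)
```

Checking key presence rather than values keeps the classifier independent of the API's `formatversion`. Links that are only implicitly invalid (a valid article about the wrong person, for instance) would not be caught by this check and would still need manual review.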

@whalekeykeeper whalekeykeeper self-assigned this Dec 1, 2021
whalekeykeeper commented

There are also implicitly invalid Wikipedia links.
See issue 23 in the Wikidatalookup repository.

@whalekeykeeper whalekeykeeper removed their assignment Mar 31, 2022
@duncdrum duncdrum added this to the 2.x milestone Oct 10, 2022
@duncdrum duncdrum moved this to In Progress in ReadAct Development Oct 24, 2022