Wikidata
We get Wikidata from the Wikidata API via wbgetentities. Our props are all the Wikidata properties that we listen for (via _WIKIPROPS) when we call get_wikidata() for an entity. Some of these properties have a value that we can use immediately. Others have claims ("Q values" or "items") that must be resolved with another API call (get_claims()).
We call them "claims" because that is how properties are presented by the Wikidata API (entities[<item>][claims][<property>]). We define a selection of properties to capture because the list of all properties is enormous, and growing!
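To make the entities[<item>][claims][<property>] path concrete, here is a minimal sketch that walks a hand-built, trimmed dict shaped like a wbgetentities response and keeps only the properties we listen for. The names extract_props, simple_value, SAMPLE and WIKIPROPS are hypothetical, and this is not wptools' own code. Item values come back as their "Q" id (still to be resolved), time values as their time string, and plain strings unchanged.

```python
from typing import Any, Dict, List

# Hand-built, trimmed sample shaped like a wbgetentities response
# (hypothetical data; real responses carry many more fields).
SAMPLE = {
    "entities": {
        "Q311715": {
            "claims": {
                "P136": [{"mainsnak": {"snaktype": "value", "datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "numeric-id": 8341, "id": "Q8341"}}}}],
                "P569": [{"mainsnak": {"snaktype": "value", "datavalue": {
                    "type": "time",
                    "value": {"time": "+1919-10-11T00:00:00Z"}}}}],
                "P856": [{"mainsnak": {"snaktype": "value", "datavalue": {
                    "type": "string", "value": "http://www.artblakey.com"}}}],
                "P19": [{"mainsnak": {"snaktype": "value", "datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "numeric-id": 1342, "id": "Q1342"}}}}],
            }
        }
    }
}

# Hypothetical subset of property ids -> labels that we listen for.
WIKIPROPS = {"P136": "genre", "P569": "birth", "P856": "website"}


def simple_value(datavalue: Dict[str, Any]) -> Any:
    """Reduce a datavalue to a plain value (an assumption of this sketch)."""
    value = datavalue["value"]
    if datavalue["type"] == "wikibase-entityid":
        return value["id"]  # e.g. "Q8341", a claim still to be resolved
    if datavalue["type"] == "time":
        return value["time"]
    return value  # strings (URLs, IMDB ids, file names) pass through


def extract_props(entity: Dict[str, Any], wikiprops: Dict[str, str]) -> Dict[str, List[Any]]:
    """Collect values for the listened-for properties only."""
    props: Dict[str, List[Any]] = {}
    for pid, statements in entity.get("claims", {}).items():
        if pid not in wikiprops:
            continue  # skip the many properties we do not listen for
        for statement in statements:
            snak = statement["mainsnak"]
            if snak.get("snaktype") != "value":
                continue  # "somevalue"/"novalue" snaks carry no datavalue
            props.setdefault(pid, []).append(simple_value(snak["datavalue"]))
    return props


props = extract_props(SAMPLE["entities"]["Q311715"], WIKIPROPS)
print(props)
```

P19 is present in the sample but absent from WIKIPROPS, so it is silently dropped, which is the same trade-off described above.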
Let's look at an example:
>>> art = wptools.page('Art Blakey').get_wikidata()
www.wikidata.org (wikidata) Art_Blakey
www.wikidata.org (claims) Q8341|Q30|Q5|Q9048913
en.wikipedia.org (imageinfo) File:Art Blakey08.JPG
Art_Blakey (en)
{
cache: <dict(3)> {claims, imageinfo, wikidata}
claims: <dict(4)> {Q30, Q5, Q8341, Q9048913}
description: American jazz drummer and bandleader
images: <list(1)>
label: Art Blakey
lang: en
modified: <dict(1)> {wikidata}
props: <dict(9)> {P136, P18, P27, P31, P345, P569, P570, P856, P91...
title: Art_Blakey
what: human
wikibase: Q311715
wikidata: <dict(9)> {IMDB, birth, category, citizenship, death, ge...
wikidata_url: https://www.wikidata.org/wiki/Q311715
}
Here are the properties and values we found for Art Blakey with get_wikidata():
>>> art.props
{u'P136': [u'Q8341'],
u'P18': [u'Art Blakey08.JPG'],
u'P27': [u'Q30'],
u'P31': [u'Q5'],
u'P345': [u'nm0086845'],
u'P569': [u'+1919-10-11T00:00:00Z'],
u'P570': [u'+1990-10-16T00:00:00Z'],
u'P856': [u'http://www.artblakey.com'],
u'P910': [u'Q9048913']}
And here are the properties and labels we listened for:
>>> [{x: art._WIKIPROPS[x]} for x in sorted(art._WIKIPROPS) if x in art.props]
[{'P136': 'genre'},
{'P18': 'image'},
{'P27': 'citizenship'},
{'P31': 'instance'},
{'P345': 'IMDB'},
{'P569': 'birth'},
{'P570': 'death'},
{'P856': 'website'},
{'P910': 'category'}]
Some property values are useful right away:
P856: website = http://www.artblakey.com
So that value goes into wikidata with the meaningful label we defined in _WIKIPROPS:
>>> art.wikidata['website']
u'http://www.artblakey.com'
But other property values are "claims" that need to be resolved:
P136: genre = Q8341
Property values that start with "Q" get put into claims:
>>> art.claims
{u'Q30': 'citizenship',
u'Q5': 'instance',
u'Q8341': 'genre',
u'Q9048913': 'category'}
That says that art has the Wikidata item Q8341 (jazz) as his genre.
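The split between values that go straight into wikidata and "Q values" that go into claims can be sketched as follows. This assumes a value counts as a claim when it is a bare item id matching ^Q\d+$ (stricter than "starts with Q", so a string like "Queen" would not be mistaken for an item); split_props is a hypothetical name, and the input is copied from the art.props and _WIKIPROPS output above.

```python
import re
from typing import Dict, List, Tuple

# Assumption: a value is a claim if it is a bare item id such as "Q8341".
ITEM_ID = re.compile(r"^Q\d+$")


def split_props(
    props: Dict[str, List[str]], wikiprops: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (wikidata, claims) in the shapes shown above."""
    wikidata: Dict[str, str] = {}
    claims: Dict[str, str] = {}
    for pid, values in props.items():
        label = wikiprops[pid]
        value = values[0]  # this sketch keeps only the first value
        if ITEM_ID.match(value):
            claims[value] = label  # resolve later with another API call
        else:
            wikidata[label] = value  # usable right away
    return wikidata, claims


# Values copied from art.props above; labels from the _WIKIPROPS output.
props = {
    "P136": ["Q8341"], "P18": ["Art Blakey08.JPG"], "P27": ["Q30"],
    "P31": ["Q5"], "P345": ["nm0086845"], "P569": ["+1919-10-11T00:00:00Z"],
    "P570": ["+1990-10-16T00:00:00Z"], "P856": ["http://www.artblakey.com"],
    "P910": ["Q9048913"],
}
wikiprops = {
    "P136": "genre", "P18": "image", "P27": "citizenship", "P31": "instance",
    "P345": "IMDB", "P569": "birth", "P570": "death", "P856": "website",
    "P910": "category",
}
wikidata, claims = split_props(props, wikiprops)
print(claims)
```

Keying claims by Q id, as the art.claims output does, means two properties pointing at the same item would collide; the sketch simply accepts that.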
When there are unresolved claims, the tool calls get_claims() (from above):
www.wikidata.org (claims) Q8341|Q30|Q5|Q9048913
You can find the claims query in the cache attribute:
>>> art.cache['claims']['query']
u'https://www.wikidata.org/w/api.php?action=wbgetentities&formatversion=2&ids=Q8341|Q30|Q5|Q9048913&languages=en&props=labels&redirects=yes&sites=&titles='
We reuse the action=wbgetentities query with no titles, passing the "Q values" (items) in the ids parameter and asking only for their labels (props=labels).
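A query like the cached one above can be rebuilt from a list of item ids. This is a minimal sketch that mirrors the parameters visible in that URL; claims_query is a hypothetical helper, not part of wptools. If you send the request yourself, you will likely also want format=json, since the API's default output format is the HTML-wrapped jsonfm.

```python
from urllib.parse import parse_qs, urlencode, urlparse

API = "https://www.wikidata.org/w/api.php"


def claims_query(qids, lang="en"):
    """Build a wbgetentities URL that asks only for the labels of qids."""
    params = {
        "action": "wbgetentities",
        "formatversion": 2,
        "ids": "|".join(qids),  # several items, pipe-separated
        "languages": lang,
        "props": "labels",  # labels are all we need to resolve a claim
        "redirects": "yes",
        "sites": "",  # empty: we look items up by id, not by site/title
        "titles": "",
    }
    return API + "?" + urlencode(params)


url = claims_query(["Q8341", "Q30", "Q5", "Q9048913"])
# urlencode escapes "|" as %7C; decoding shows the ids are intact.
params = parse_qs(urlparse(url).query)
print(params["ids"])
```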
We then update the wikidata attribute with the resolved value for each claim:
>>> art.wikidata
{'IMDB': u'nm0086845',
'birth': u'+1919-10-11T00:00:00Z',
'category': None,
'citizenship': u'United States of America',
'death': u'+1990-10-16T00:00:00Z',
'genre': u'jazz',
'image': u'Art Blakey08.JPG',
'instance': u'human',
'website': u'http://www.artblakey.com'}
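Resolving claims amounts to looking up each item's label in the labels response and storing it under our property label. A minimal sketch, assuming the response shape entities[<item>][labels][<lang>][value]; resolve_claims and LABELS_RESPONSE are hypothetical, and one sample item is deliberately given no English label to show the fallback to None.

```python
# Trimmed, hand-built labels response in the shape wbgetentities returns
# for props=labels (hypothetical sample data).
LABELS_RESPONSE = {
    "entities": {
        "Q8341": {"id": "Q8341", "labels": {"en": {"language": "en", "value": "jazz"}}},
        "Q30": {"id": "Q30", "labels": {"en": {"language": "en", "value": "United States of America"}}},
        "Q5": {"id": "Q5", "labels": {"en": {"language": "en", "value": "human"}}},
        "Q9048913": {"id": "Q9048913", "labels": {}},  # no label on purpose
    }
}


def resolve_claims(wikidata, claims, response, lang="en"):
    """Store each claim's label under our property label; None if missing."""
    resolved = dict(wikidata)  # copy so the caller's dict is not mutated
    entities = response.get("entities", {})
    for qid, label in claims.items():
        entry = entities.get(qid, {}).get("labels", {}).get(lang)
        resolved[label] = entry["value"] if entry else None
    return resolved


wikidata = {"website": "http://www.artblakey.com", "IMDB": "nm0086845"}
claims = {"Q30": "citizenship", "Q5": "instance", "Q8341": "genre", "Q9048913": "category"}
result = resolve_claims(wikidata, claims, LABELS_RESPONSE)
print(result)
```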
Our get_wikidata() query returned many more properties than we captured:
>>> len(art.wikidata)
9
>>> import json
>>> j = json.loads(art.cache['wikidata']['response'])
>>> len(j['entities']['Q311715']['claims'])
46
That is expected because we do not listen for all possible properties, as mentioned above.
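To see which properties a response carried but we did not listen for, compare the claim keys against _WIKIPROPS. A minimal sketch: the response string below is a trimmed, hypothetical stand-in for art.cache['wikidata']['response'], and WIKIPROPS is a hypothetical subset. Ids are sorted numerically so that, for example, P19 comes before P136.

```python
import json

# Hypothetical stand-in for art.cache['wikidata']['response']
# (statement lists trimmed to empty for brevity).
response = json.dumps({"entities": {"Q311715": {"claims": {
    "P136": [], "P19": [], "P21": [], "P345": [], "P856": []}}}})

WIKIPROPS = {"P136": "genre", "P345": "IMDB", "P856": "website"}


def pid_number(pid):
    """Sort key: numeric part of a property id such as "P136"."""
    return int(pid[1:])


claims = json.loads(response)["entities"]["Q311715"]["claims"]
listened = sorted((pid for pid in claims if pid in WIKIPROPS), key=pid_number)
ignored = sorted((pid for pid in claims if pid not in WIKIPROPS), key=pid_number)
print(len(claims), listened, ignored)
```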
You can listen for additional Wikidata properties by extending _WIKIPROPS, for example by passing props to wptools.page():
>>> art = wptools.page('Art Blakey', props={'P19': 'birthplace'})
>>> art.get_wikidata()
www.wikidata.org (wikidata) Art_Blakey
www.wikidata.org (claims) Q8341|Q30|Q5|Q1342|Q9048913
en.wikipedia.org (imageinfo) File:Art Blakey08.JPG
Art_Blakey (en)
{
claims: <dict(5)> {Q1342, Q30, Q5, Q8341, Q9048913}
props: <dict(10)> {P136, P18, P19, P27, P31, P345, P569, P570, P85...
wikidata: <dict(10)> {IMDB, birth, birthplace, category, citizensh...
wikidata_url: https://www.wikidata.org/wiki/Q311715
...
}
>>> art.wikidata['birthplace']
u'Pittsburgh'
Now we know that Art Blakey's birthplace ("Pittsburgh") is Wikidata item Q1342. We only needed to ask for property P19 ("place of birth"), and we assigned that property a convenient label, birthplace.
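The props argument can be pictured as a merge of the default property table with caller-supplied entries. This sketch assumes extras are simply added to (and may override) the defaults; listen_for and DEFAULT_WIKIPROPS are hypothetical names, not wptools API. Copying the defaults before updating keeps one page's extra properties from leaking into every other page.

```python
# Hypothetical default table; the real _WIKIPROPS is larger.
DEFAULT_WIKIPROPS = {"P136": "genre", "P18": "image", "P569": "birth"}


def listen_for(extra=None, defaults=DEFAULT_WIKIPROPS):
    """Return a new property table: defaults plus caller-supplied extras."""
    merged = dict(defaults)  # copy so the shared defaults stay untouched
    merged.update(extra or {})  # extras win on conflicting ids
    return merged


props = listen_for({"P19": "birthplace"})
print(props)
```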
- https://www.mediawiki.org/wiki/Wikibase/DataModel#Overview_of_the_data_model
- https://www.wikidata.org/wiki/Help:Wikidata_datamodel
- https://www.mediawiki.org/wiki/Wikibase/API#wbgetentities
- https://www.mediawiki.org/wiki/API:Presenting_Wikidata_knowledge
Properties
- https://www.wikidata.org/wiki/Wikidata:List_of_properties/all
- https://www.wikidata.org/wiki/Wikidata:List_of_properties/Generic
- https://www.wikidata.org/wiki/Wikidata:List_of_properties/Person
- https://www.wikidata.org/wiki/Wikidata:WikiProject_Taxonomy#Properties_each_item_.28that_deals_with_a_taxon.29_should_have