Steve edited this page Aug 14, 2017 · 21 revisions

We get Wikidata from the Wikidata API via action=wbgetentities. Our props are the Wikidata properties that we listen for (via _WIKIPROPS) when we call get_wikidata() for an entity. Some of these properties have a value that we can use immediately. Others have values that are themselves Wikidata items ("Q values"), which must be resolved to labels with another API call (get_claims()).

We call them "claims" because that is how properties are presented by the Wikidata API (entities[<item>][claims][<property>]). We define a selection of properties to capture because the list of all properties is enormous, and growing!
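To make the entities[<item>][claims][<property>] path concrete, here is a minimal sketch that walks a hand-written, heavily trimmed sample shaped like a wbgetentities response. The helper name claim_values and the sample dict are illustrative assumptions, not part of wptools, and the sketch is written for Python 3.

```python
# Trimmed, hand-written sample shaped like a wbgetentities response.
# Only two properties are shown; a real response carries many more fields.
sample = {
    "entities": {
        "Q311715": {
            "claims": {
                "P856": [{"mainsnak": {"datavalue": {
                    "type": "string",
                    "value": "http://www.artblakey.com"}}}],
                "P136": [{"mainsnak": {"datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "id": "Q8341"}}}}],
            }
        }
    }
}


def claim_values(response, item, prop):
    """Return the raw values recorded for one property of one item (hypothetical helper)."""
    statements = response["entities"][item]["claims"].get(prop, [])
    values = []
    for statement in statements:
        datavalue = statement["mainsnak"].get("datavalue")
        if datavalue is None:
            continue  # snaks without a datavalue have nothing to extract
        value = datavalue["value"]
        # Item-valued claims wrap the Q id in a dict; plain string values do not.
        if isinstance(value, dict) and "id" in value:
            value = value["id"]
        values.append(value)
    return values


print(claim_values(sample, "Q311715", "P856"))  # ['http://www.artblakey.com']
print(claim_values(sample, "Q311715", "P136"))  # ['Q8341']
```

Other value types (dates, for example) have their own shapes and are not handled by this sketch.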

Let's look at an example:

>>> art = wptools.page('Art Blakey').get_wikidata()
www.wikidata.org (wikidata) Art_Blakey
www.wikidata.org (claims) Q8341|Q30|Q5|Q9048913
en.wikipedia.org (imageinfo) File:Art Blakey08.JPG
Art_Blakey (en)
{
  cache: <dict(3)> {claims, imageinfo, wikidata}
  claims: <dict(4)> {Q30, Q5, Q8341, Q9048913}
  description: American jazz drummer and bandleader
  images: <list(1)>
  label: Art Blakey
  lang: en
  modified: <dict(1)> {wikidata}
  props: <dict(9)> {P136, P18, P27, P31, P345, P569, P570, P856, P91...
  title: Art_Blakey
  what: human
  wikibase: Q311715
  wikidata: <dict(9)> {IMDB, birth, category, citizenship, death, ge...
  wikidata_url: https://www.wikidata.org/wiki/Q311715
}

Here are the properties and values we found for Art Blakey with get_wikidata():

>>> art.props
{u'P136': [u'Q8341'],
 u'P18': [u'Art Blakey08.JPG'],
 u'P27': [u'Q30'],
 u'P31': [u'Q5'],
 u'P345': [u'nm0086845'],
 u'P569': [u'+1919-10-11T00:00:00Z'],
 u'P570': [u'+1990-10-16T00:00:00Z'],
 u'P856': [u'http://www.artblakey.com'],
 u'P910': [u'Q9048913']}

And here are the properties and labels we listened for:

>>> sorted([{x:art._WIKIPROPS[x]} for x in art._WIKIPROPS if x in art.props])
[{'P136': 'genre'},
 {'P18': 'image'},
 {'P27': 'citizenship'},
 {'P31': 'instance'},
 {'P345': 'IMDB'},
 {'P569': 'birth'},
 {'P570': 'death'},
 {'P856': 'website'},
 {'P910': 'category'}]

Some property values are useful right away:

P856: website = http://www.artblakey.com

So that gets put in wikidata with the meaningful label we defined in _WIKIPROPS:

>>> art.wikidata['website']
u'http://www.artblakey.com'

But other property values are "claims" that need to be resolved:

P136: genre = Q8341

Property values that start with "Q" get put into claims:

>>> art.claims
{u'Q30': 'citizenship',
 u'Q5': 'instance',
 u'Q8341': 'genre',
 u'Q9048913': 'category'}

This says that Art Blakey's genre is the Wikidata item Q8341 (jazz).
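The split between immediately useful values and claims can be sketched as follows. This is only an illustration of the "starts with Q" rule described above, not the wptools implementation; split_props and its arguments are hypothetical names, and the code is written for Python 3.

```python
def split_props(props, labels):
    """Separate usable values from item ids that still need resolving.

    props  maps a property id to its list of values (like art.props).
    labels maps a property id to a readable label (like _WIKIPROPS).
    """
    wikidata = {}  # label -> value that can be used as-is
    claims = {}    # Q id -> label, to be resolved by a later request
    for pid, values in props.items():
        label = labels[pid]
        for value in values:
            if value.startswith("Q"):
                claims[value] = label
            else:
                wikidata[label] = value
    return wikidata, claims


props = {
    "P136": ["Q8341"],
    "P856": ["http://www.artblakey.com"],
    "P27": ["Q30"],
}
labels = {"P136": "genre", "P856": "website", "P27": "citizenship"}

wikidata, claims = split_props(props, labels)
print(wikidata)  # {'website': 'http://www.artblakey.com'}
print(claims)    # {'Q8341': 'genre', 'Q30': 'citizenship'}
```

A plain string value that happens to begin with "Q" would be misclassified by this simple test; a stricter check (for example, "Q" followed only by digits) would be safer.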

When there are unresolved claims, get_wikidata() calls get_claims(). That is the second request in the output above:

www.wikidata.org (claims) Q8341|Q30|Q5|Q9048913

You can find the claims query in the cache attribute:

>>> art.cache['claims']['query']
u'https://www.wikidata.org/w/api.php?action=wbgetentities&formatversion=2&ids=Q8341|Q30|Q5|Q9048913&languages=en&props=labels&redirects=yes&sites=&titles='

We reuse the action=wbgetentities query, this time with an empty titles parameter and the "Q values" (items) joined with "|" in the ids parameter.
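For illustration, a query string of that shape can be assembled with the standard library. The function name claims_query is an assumption made for this sketch (it is not the wptools API), and the code targets Python 3, unlike the Python 2 sessions shown on this page.

```python
from urllib.parse import urlencode


def claims_query(qids, lang="en"):
    """Build a wbgetentities URL that asks only for the labels of the given items."""
    params = [
        ("action", "wbgetentities"),
        ("formatversion", "2"),
        ("ids", "|".join(qids)),  # items are pipe-separated
        ("languages", lang),
        ("props", "labels"),
        ("redirects", "yes"),
        ("sites", ""),
        ("titles", ""),           # no title: we look up by id instead
    ]
    # safe="|" keeps the pipe readable instead of encoding it as %7C
    return "https://www.wikidata.org/w/api.php?" + urlencode(params, safe="|")


print(claims_query(["Q8341", "Q30", "Q5", "Q9048913"]))
```

Leaving the pipe unencoded only serves to match the cached query shown above; encoding it as %7C is equally valid.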

We then update the wikidata attribute with the fully determined value for each claim:

>>> art.wikidata
{'IMDB': u'nm0086845',
 'birth': u'+1919-10-11T00:00:00Z',
 'category': None,
 'citizenship': u'United States of America',
 'death': u'+1990-10-16T00:00:00Z',
 'genre': u'jazz',
 'image': u'Art Blakey08.JPG',
 'instance': u'human',
 'website': u'http://www.artblakey.com'}
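The resolution step can be sketched like this, assuming a labels-only response shaped like the one the claims query requests. The function resolve_claims and the sample response are hypothetical. In particular, the empty labels entry for Q9048913 is made up to exercise the None path; this page does not say why category resolved to None.

```python
def resolve_claims(claims, response, lang="en"):
    """Map each claim's label to the item's label in the given language."""
    resolved = {}
    for qid, label in claims.items():
        entity = response["entities"].get(qid, {})
        name = entity.get("labels", {}).get(lang, {}).get("value")
        resolved[label] = name  # stays None when no label is available
    return resolved


claims = {"Q8341": "genre", "Q30": "citizenship", "Q9048913": "category"}

# Hand-written sample; not a captured API response.
response = {"entities": {
    "Q8341": {"labels": {"en": {"language": "en", "value": "jazz"}}},
    "Q30": {"labels": {"en": {"language": "en",
                              "value": "United States of America"}}},
    "Q9048913": {"labels": {}},
}}

print(resolve_claims(claims, response))
# {'genre': 'jazz', 'citizenship': 'United States of America', 'category': None}
```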

Our get_wikidata() query returned many more properties than the nine we listened for:

>>> len(art.wikidata)
9

>>> import json
>>> j = json.loads(art.cache['wikidata']['response'])
>>> len(j['entities']['Q311715']['claims'])
46

That is expected because we do not listen for all possible properties, as mentioned above.
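To see which returned properties were ignored, one could compare the property ids in the response with the ones being listened for. A minimal sketch, with a hypothetical helper name and illustrative sample ids rather than the real 46 from the response:

```python
def unlistened(claim_ids, listened):
    """Return property ids present in the response but not listened for."""
    missing = set(claim_ids) - set(listened)
    # Sort by the numeric part so P19 comes before P106
    return sorted(missing, key=lambda pid: int(pid[1:]))


claim_ids = ["P19", "P31", "P106", "P569"]     # illustrative sample only
listened = {"P31": "instance", "P569": "birth"}

print(unlistened(claim_ids, listened))  # ['P19', 'P106']
```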

You can listen for additional Wikidata properties by extending _WIKIPROPS:

>>> art = wptools.page('Art Blakey', props={'P19': 'birthplace'})

>>> art.get_wikidata()
www.wikidata.org (wikidata) Art_Blakey
www.wikidata.org (claims) Q8341|Q30|Q5|Q1342|Q9048913
en.wikipedia.org (imageinfo) File:Art Blakey08.JPG
Art_Blakey (en)
{
  claims: <dict(5)> {Q1342, Q30, Q5, Q8341, Q9048913}
  props: <dict(10)> {P136, P18, P19, P27, P31, P345, P569, P570, P85...
  wikidata: <dict(10)> {IMDB, birth, birthplace, category, citizensh...
  wikidata_url: https://www.wikidata.org/wiki/Q311715
  ...
}

>>> art.wikidata['birthplace']
u'Pittsburgh'

Now we know that Art Blakey's birthplace is Pittsburgh (Wikidata item Q1342). We only needed to ask for property P19 ("place of birth"), and we assigned that property a convenient label, birthplace.
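Conceptually, extending the listened-for properties amounts to merging extra id-to-label pairs into the default map. The sketch below illustrates that idea only; DEFAULT_LABELS stands in for _WIKIPROPS and with_extra_props is a hypothetical helper, not how wptools necessarily does it.

```python
# Stand-in for the default property-to-label map (two entries for brevity).
DEFAULT_LABELS = {"P31": "instance", "P569": "birth"}


def with_extra_props(defaults, extra=None):
    """Return a new label map with caller-supplied properties merged in."""
    merged = dict(defaults)     # copy, so the defaults are not mutated
    merged.update(extra or {})
    return merged


labels = with_extra_props(DEFAULT_LABELS, {"P19": "birthplace"})
print(labels)  # {'P31': 'instance', 'P569': 'birth', 'P19': 'birthplace'}
```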

Further reading

Properties
