Steve edited this page Nov 2, 2016 · 21 revisions

We get Wikidata data from the Wikidata API via action=wbgetentities. Our props are the Wikidata properties that we listen for (via _WIKIPROPS) when we call get_wikidata() for an entity. Some of these properties have a value that we can use immediately. Others have claims ("Q values" or "items") that must be resolved to labels with another API call (get_claims()).

We call them "claims" because that is how properties are presented by the Wikidata API (entities[<item>][claims][<property>]). We define a selection of properties because the list of all properties is enormous, and growing!
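To make the entities[<item>][claims][<property>] path concrete, here is a minimal sketch that walks a hand-written fragment shaped like a wbgetentities response. The listen_for mapping and the extract_props() helper are illustrative names, not part of wptools, and the fragment is trimmed to a handful of fields, so treat it as a sketch of the idea rather than the tool's actual code.

```python
# Hypothetical stand-in for wptools' _WIKIPROPS: property ID -> our label.
listen_for = {"P136": "genre", "P345": "IMDB", "P569": "birth"}

# Hand-trimmed fragment shaped like a wbgetentities response
# (a real response carries many more fields per statement).
response = {
    "entities": {
        "Q311715": {
            "claims": {
                "P136": [{"mainsnak": {"datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "id": "Q8341"}}}}],
                "P345": [{"mainsnak": {"datavalue": {
                    "type": "string", "value": "nm0086845"}}}],
                "P569": [{"mainsnak": {"datavalue": {
                    "type": "time",
                    "value": {"time": "+1919-10-11T00:00:00Z"}}}}],
                # Present in the response, but we do not listen for it.
                "P19": [{"mainsnak": {"datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "id": "Q1342"}}}}],
            }
        }
    }
}


def extract_props(response, item, listen_for):
    """Collect raw values for the properties we listen for."""
    props = {}
    claims = response["entities"][item]["claims"]
    for prop, statements in claims.items():
        if prop not in listen_for:
            continue  # ignore properties we did not ask about
        values = []
        for statement in statements:
            datavalue = statement["mainsnak"].get("datavalue")
            if datavalue is None:  # snak without a concrete value
                continue
            value = datavalue["value"]
            if isinstance(value, dict):
                # Items carry an "id"; times carry a "time" string.
                value = value.get("id") or value.get("time")
            values.append(value)
        props[prop] = values
    return props


props = extract_props(response, "Q311715", listen_for)
print(props)
```

P19 is in the fragment but not in listen_for, so it never reaches props. That is the whole point of keeping a selection of properties.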

Let's look at an example:

>>> art = wptools.page('Art Blakey').get_wikidata()
www.wikidata.org (wikidata) Art_Blakey
www.wikidata.org (wikidata) Q8341|Q30|Q5|Q9048913
en.wikipedia.org (imageinfo) File:Art Blakey08.JPG
Art_Blakey (en)
{
  cache: <dict(3)> {claims, imageinfo, wikidata}
  claims: <dict(4)> {Q30, Q5, Q8341, Q9048913}
  description: American jazz drummer and bandleader
  images: <list(1)>
  label: Art Blakey
  lang: en
  modified: 2016-10-26T02:12:30Z
  props: <dict(9)> {P136, P18, P27, P31, P345, P569, P570, P856, P91...
  title: Art_Blakey
  wikibase: Q311715
  wikidata: <dict(9)> {IMDB, birth, category, citizenship, death, ge...
  wikidata_url: https://www.wikidata.org/wiki/Q311715
}

Here are the properties and values we found for Art Blakey with get_wikidata():

>>> art.props
{u'P136': [u'Q8341'],
 u'P18': [u'Art Blakey08.JPG'],
 u'P27': [u'Q30'],
 u'P31': [u'Q5'],
 u'P345': [u'nm0086845'],
 u'P569': [u'+1919-10-11T00:00:00Z'],
 u'P570': [u'+1990-10-16T00:00:00Z'],
 u'P856': [u'http://www.artblakey.com'],
 u'P910': [u'Q9048913']}

And here are the properties and labels we listened for:

>>> sorted([{x:art._WIKIPROPS[x]} for x in art._WIKIPROPS if x in art.props])
[{'P136': 'genre'},
 {'P18': 'image'},
 {'P27': 'citizenship'},
 {'P31': 'instance'},
 {'P345': 'IMDB'},
 {'P569': 'birth'},
 {'P570': 'death'},
 {'P856': 'website'},
 {'P910': 'category'}]

Some property values are useful right away:

P856: website = http://www.artblakey.com

So that gets put in wikidata with the meaningful label we defined in _WIKIPROPS:

>>> art.wikidata['website']
u'http://www.artblakey.com'

But other property values are "claims" that need to be resolved:

P136: genre = Q8341

Property values that start with "Q" are items, and they get put into claims, keyed by item and mapped to the label of the property they came from. For example, this says that Art has the item Q8341 (jazz) for his genre:

>>> art.claims
{u'Q30': 'citizenship',
 u'Q5': 'instance',
 u'Q8341': 'genre',
 u'Q9048913': 'category'}
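The split between "useful right away" and "needs resolving" can be sketched in a few lines. This is only an illustration of the rule described here (values that start with "Q" go to claims), using values copied from art.props; listen_for and split_props() are hypothetical names, and the sketch keeps only the first value of each property.

```python
# Hypothetical stand-in for wptools' _WIKIPROPS, trimmed to four entries.
listen_for = {
    "P136": "genre",
    "P27": "citizenship",
    "P345": "IMDB",
    "P856": "website",
}

# Raw property values, copied from art.props.
props = {
    "P136": ["Q8341"],
    "P27": ["Q30"],
    "P345": ["nm0086845"],
    "P856": ["http://www.artblakey.com"],
}


def split_props(props, listen_for):
    """Separate directly usable values from items that need a label."""
    wikidata = {}  # label -> value we can use immediately
    claims = {}    # item ID -> label, to be resolved later
    for prop, values in props.items():
        label = listen_for[prop]
        value = values[0]  # this sketch keeps only the first value
        if value.startswith("Q"):
            claims[value] = label
        else:
            wikidata[label] = value
    return wikidata, claims


wikidata, claims = split_props(props, listen_for)
print(wikidata)
print(claims)
```

Keying claims by item ID means the follow-up query can ask for exactly those IDs and map each answer back to its label.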

When there are unresolved claims, the tool calls get_claims(). That is the second request in the output above:

www.wikidata.org (wikidata) Q8341|Q30|Q5|Q9048913

You can find the claims query in the cache attribute. We reuse the action=wbgetentities query, this time with an empty titles parameter and the "Q values" (items) passed in the ids parameter:

>>> art.cache['claims']['query']
u'https://www.wikidata.org/w/api.php?action=wbgetentities&formatversion=2&ids=Q8341|Q30|Q5|Q9048913&languages=en&props=labels&redirects=yes&sites=&titles='
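If you want to build a similar query yourself, here is a minimal sketch using only the standard library (written for Python 3, unlike the Python 2 session shown on this page). The claims_query() helper is a hypothetical name and not how wptools assembles its requests internally; the parameters are simply the ones visible in the cached query.

```python
from urllib.parse import urlencode


def claims_query(items, lang="en"):
    """Build a wbgetentities URL that asks only for item labels."""
    params = {
        "action": "wbgetentities",
        "formatversion": "2",
        "ids": "|".join(items),  # the "Q values" to resolve
        "languages": lang,
        "props": "labels",       # we only need the labels
        "redirects": "yes",
        "sites": "",
        "titles": "",            # no title this time, only ids
    }
    # safe="|" keeps the pipe separators readable instead of %7C.
    return "https://www.wikidata.org/w/api.php?" + urlencode(params, safe="|")


url = claims_query(["Q8341", "Q30", "Q5", "Q9048913"])
print(url)
```

Asking for props=labels in a single language keeps the response small, since all we want back is one string per item.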

We update wikidata with the resolved label for each claim (a claim that did not resolve, such as category here, shows up as None):

>>> art.wikidata
{'IMDB': u'nm0086845',
 'birth': u'+1919-10-11T00:00:00Z',
 'category': None,
 'citizenship': u'United States of America',
 'death': u'+1990-10-16T00:00:00Z',
 'genre': u'jazz',
 'image': u'Art Blakey08.JPG',
 'instance': u'human',
 'website': u'http://www.artblakey.com'}
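Here is a rough sketch of that update step, assuming a labels response shaped like entities[<item>][labels][<lang>][value]. The fragment is hand-written, resolve_claims() is a hypothetical helper rather than the wptools implementation, and giving Q9048913 no English label is an assumption made only to mirror the None shown for category.

```python
# Unresolved claims, as in art.claims: item ID -> our label.
claims = {
    "Q30": "citizenship",
    "Q5": "instance",
    "Q8341": "genre",
    "Q9048913": "category",
}

# Hand-trimmed fragment shaped like a props=labels response.
# ASSUMPTION: Q9048913 has no English label here, to mirror the None above.
response = {
    "entities": {
        "Q30": {"labels": {"en": {"language": "en",
                                  "value": "United States of America"}}},
        "Q5": {"labels": {"en": {"language": "en", "value": "human"}}},
        "Q8341": {"labels": {"en": {"language": "en", "value": "jazz"}}},
        "Q9048913": {"labels": {}},
    }
}


def resolve_claims(claims, response, lang="en"):
    """Map each of our labels to the item's label, or None if missing."""
    resolved = {}
    for item, label in claims.items():
        labels = response["entities"].get(item, {}).get("labels", {})
        resolved[label] = labels.get(lang, {}).get("value")
    return resolved


# Start from a value that needed no resolving, then add the resolved claims.
wikidata = {"website": "http://www.artblakey.com"}
wikidata.update(resolve_claims(claims, response))
print(wikidata)
```

Falling back to None keeps the key in wikidata, so you can tell "we listened for this but got no label" apart from "we never asked".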

Our get_wikidata() query returned many properties we did not resolve:

>>> len(art.wikidata)
9

>>> import json
>>> j = json.loads(art.cache['wikidata']['response'])
>>> len(j['entities']['Q311715']['claims'])
46

That is expected because we do not listen for all possible properties (as mentioned above).

You can listen for additional Wikidata properties by extending _WIKIPROPS:

>>> art = wptools.page('Art Blakey')

>>> art._WIKIPROPS['P19'] = 'birthplace'

>>> art.get_wikidata()
www.wikidata.org (wikidata) Art_Blakey
www.wikidata.org (wikidata) Q8341|Q30|Q5|Q1342|Q9048913
en.wikipedia.org (imageinfo) File:Art Blakey08.JPG
Art_Blakey (en)
{
  claims: <dict(5)> {Q1342, Q30, Q5, Q8341, Q9048913}
  props: <dict(10)> {P136, P18, P19, P27, P31, P345, P569, P570, P85...
  wikidata: <dict(10)> {IMDB, birth, birthplace, category, citizensh...
  wikidata_url: https://www.wikidata.org/wiki/Q311715
  ...
}

>>> art.wikidata['birthplace']
u'Pittsburgh'

Now we know that Art Blakey's birthplace ("Pittsburgh") is Wikidata item Q1342, but we only needed to ask for property P19 ("place of birth").

Further reading

Properties
