Albin Larsson: Blog

Culture, Climate, and Code

Writing Structured Data on Commons with Python

15th September 2020

Pywikibot does not yet have built-in support for writing Structured Data to Wikimedia Commons so to do so currently one needs to do it by posting JSON data to the Wikimedia Commons Wikibase API, this blog post will walk you through how to make the requests needed and how to structure the JSON to get it all working.

The minimal example presented here will check if the given file has a statement claiming that it depicts a hat and if not write such a statement.

First of you will need to have Pywikibot installed and all god to go, the following imports and code should run without error.

import json

import pywikibot

site = pywikibot.Site('commons', 'commons')
site.login()
site.get_tokens('csrf') # preload csrf token

Next up let’s turn a pagename/filename into a MID, think of a MID as Wikidata’s QID but for Wikimedia Commons. The MID happens to correspond to Mediawiki’s “pageid”.

page = pywikibot.Page(site, title='Konst och Nyhetsmagasin för medborgare af alla klasser 1818, illustration nr 44.jpg', ns=6)

media_identifier = 'M{}'.format(page.pageid)

Next up, we need to fetch all existing structured data so that we can check what statements already exist. Here is the first example where we need to use Pywikibot’s internal API wrapper “_simple_request” to call the Wikibase API, you could do the same with a regular HTTP library such as requests.

request = site._simple_request(action='wbgetentities', ids=media_identifier)
raw = request.submit()
existing_data = None
if raw.get('entities').get(media_identifier).get('pageid'):
  existing_data = raw.get('entities').get(media_identifier)

Next, let us check if depicts (P180) got a statement with the value Q80151 (hat), if so exit the program.

depicts = existing_data.get('statements').get('P180')
# Q80151 (hat)
if any(statement['mainsnak']['datavalue']['value']['id'] == 'Q80151' for statement in depicts):
  print('There already exists a statement claiming that this media depicts a hat.')
  exit()

Now we need to create the JSON defining such a claim, it’s verbose, to say the least. You can add more claims by appending more objects to the “claims” array. To get an idea of what these JSON structures can look like you can add structured data using the Wikimedia Commons GUI and then look at the resulting JSON by appending “.json” to the media’s URI. It might be particularly interesting to try out qualifiers and references.

statement_json = {'claims': [{
  'mainsnak': {
    'snaktype':'value',
    'property': 'P180',
    'datavalue': {
      'type' : 'wikibase-entityid',
      'value': {
        'numeric-id': '80151',
        'id' : 'Q80151',
      },
    },
  },
  'type': 'statement',
  'rank': 'normal',
}]}

Now, all we need to do is to send this data to the Wikibase API together with some additional information such as a CSRF token the media identifier, etc.

csrf_token = site.tokens['csrf']
payload = {
  'action' : 'wbeditentity',
  'format' : u'json',
  'id' : media_identifier,
  'data' : json.dumps(statement_json, separators=(',', ':')),
  'token' : csrf_token,
  'summary' : 'adding depicts statement',
  'bot' : True, # in case you're using a bot account (which you should)
}

request = site._simple_request(**payload)
try:
  request.submit()
except pywikibot.data.api.APIError as e:
  print('Got an error from the API, the following request were made:')
  print(request)
  print('Error: {}'.format(e))

That should be it, you can now use this example to create your own wrapper around this functionality to make it usable in batch operations.

In case you want to write SDC with the mwoauth/mwapi libraries instead of Pywikibot you can look at this Flask application built for the Roundtripping project to get a hint.

Related posts