Python Compare Wikipedia Pages

By | December 27, 2020

Wikimedia provides an API that lets you compare Wikipedia pages and, in some cases, modify pages and information across Wikimedia projects. The main page for all Wikimedia API information is here:

In this post I am most interested in the Wikipedia compare API, and in showing how to use it to view the differences between versions of a Wikipedia page. The documentation for this part of the API is here:

Compare Wikipedia Pages

Following the example on the documentation page, it is fairly easy to use the API to return the diff between two different pages, in this case Gold and Silver.

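import requests

# A session reuses the HTTP connection across requests
session = requests.Session()

api_url = "https://en.wikipedia.org/w/api.php"
PARAMS = {
    'action': "compare",   # the compare module of the API
    'format': "json",
    'fromtitle': 'Gold',   # compare the latest revision of "Gold"...
    'totitle': 'Silver'    # ...with the latest revision of "Silver"
}
response = session.get(url=api_url, params=PARAMS, timeout=30)
response.raise_for_status()  # stop early on an HTTP error
response_json = response.json()

# Print each field of the 'compare' object (page IDs, revision IDs, titles, diff)
for key, value in response_json['compare'].items():
    print(key, ' : ', value)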

This code prints every key and value in the 'compare' object of the JSON response. The diff of the two pages itself is stored under the '*' key of that 'compare' dictionary.
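The '*' value is an HTML fragment made of diff table rows rather than plain text. To pull out just the changed lines, a small parser from Python's standard library is enough. This is a sketch, not a definitive implementation: it assumes changed cells carry the CSS classes diff-addedline and diff-deletedline, as in MediaWiki's usual diff markup, so inspect the HTML you actually receive before relying on it. DiffLineParser and changed_lines are hypothetical names, and the sample string stands in for response_json['compare']['*'].

```python
from html.parser import HTMLParser


class DiffLineParser(HTMLParser):
    """Collect the text of added and deleted cells from MediaWiki diff HTML.

    Assumption: changed cells are <td> elements whose class list contains
    'diff-addedline' or 'diff-deletedline'.
    """

    def __init__(self):
        super().__init__()
        self.added = []
        self.deleted = []
        self._current = None  # list the current cell's text goes into, or None
        self._buffer = []     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "diff-addedline" in classes:
            self._current = self.added
        elif "diff-deletedline" in classes:
            self._current = self.deleted
        else:
            self._current = None  # context, marker or line-number cell
        self._buffer = []

    def handle_endtag(self, tag):
        # Closing a changed cell: store its text, including <ins>/<del> content
        if tag == "td" and self._current is not None:
            self._current.append("".join(self._buffer).strip())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def changed_lines(diff_html):
    """Return (added, deleted) lists of plain-text lines from diff HTML."""
    parser = DiffLineParser()
    parser.feed(diff_html)
    parser.close()
    return parser.added, parser.deleted


# Stand-in for response_json['compare']['*']
sample = (
    '<tr><td class="diff-marker"></td>'
    '<td class="diff-deletedline"><div>Gold is a <del>yellow</del> metal.</div></td>'
    '<td class="diff-marker"></td>'
    '<td class="diff-addedline"><div>Silver is a <ins>white</ins> metal.</div></td></tr>'
)
added, deleted = changed_lines(sample)
print(added)    # ['Silver is a white metal.']
print(deleted)  # ['Gold is a yellow metal.']
```

Using the standard-library HTMLParser keeps the sketch free of extra dependencies; a library such as BeautifulSoup would work equally well if you already use it.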

Comparing Page Versions

The API also makes it possible to compare different revisions of the same page (or of different pages). Usage is similar to comparing separate Wikipedia pages, except that we specify individual revision IDs, which are unique across all pages, rather than page titles.

An example of this comparison, viewed on the Wikipedia website, is here.
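If you want to open a revision comparison in a browser yourself, a link can be built from the two revision IDs. This is a minimal sketch, assuming Wikipedia's standard index.php diff view, where oldid is the older revision and diff is the newer one; the diff_url helper is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the standard MediaWiki index.php diff view on English Wikipedia
WIKI_INDEX = "https://en.wikipedia.org/w/index.php"


def diff_url(from_rev, to_rev, base=WIKI_INDEX):
    """Build a browser URL showing the diff between two revision IDs.

    Hypothetical helper: 'oldid' is the older revision, 'diff' the newer one.
    """
    return base + "?" + urlencode({"diff": to_rev, "oldid": from_rev})


print(diff_url(989715533, 989941568))
# https://en.wikipedia.org/w/index.php?diff=989941568&oldid=989715533
```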

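import requests

# A session reuses the HTTP connection across requests
session = requests.Session()
api_url = "https://en.wikipedia.org/w/api.php"

PARAMS = {
    'action': "compare",
    'format': "json",
    'fromrev': 989715533,  # revision ID of the older version
    'torev': 989941568     # revision ID of the newer version
}
response = session.get(url=api_url, params=PARAMS, timeout=30)
response.raise_for_status()  # stop early on an HTTP error
response_json = response.json()

# Print each field of the 'compare' object, including the diff under '*'
for key, value in response_json['compare'].items():
    print(key, ' : ', value)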
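Since the two examples differ only in their parameters, they can share a little code. Below is a sketch with two hypothetical helpers: compare_params picks fromtitle/totitle for strings and fromrev/torev for integers (the API also accepts a title on one side and a revision ID on the other), and extract_diff returns the '*' value, or raises if the response contains the API's 'error' object instead of a 'compare' result. Both run offline, so no network call is needed to try them.

```python
def compare_params(from_page, to_page):
    """Build action=compare parameters from titles (str) or revision IDs (int).

    Hypothetical helper: strings become fromtitle/totitle, integers become
    fromrev/torev, matching the two examples in this post.
    """
    params = {"action": "compare", "format": "json"}
    for side, page in (("from", from_page), ("to", to_page)):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(page, bool) or not isinstance(page, (int, str)):
            raise TypeError(f"{side} page must be a title (str) or revision ID (int)")
        key = "rev" if isinstance(page, int) else "title"
        params[side + key] = page
    return params


def extract_diff(response_json):
    """Return the diff HTML from a compare response, raising on API errors."""
    if "error" in response_json:
        error = response_json["error"]
        raise RuntimeError(f"{error.get('code')}: {error.get('info')}")
    return response_json["compare"]["*"]


print(compare_params("Gold", "Silver"))
# {'action': 'compare', 'format': 'json', 'fromtitle': 'Gold', 'totitle': 'Silver'}
print(compare_params(989715533, 989941568))
# {'action': 'compare', 'format': 'json', 'fromrev': 989715533, 'torev': 989941568}
```

With the session and api_url from the examples above, extract_diff(session.get(url=api_url, params=compare_params('Gold', 'Silver'), timeout=30).json()) should return the same diff as the first example.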