Finding famous MPs based on their Wikipedia Page Views
As part of the Graphing Brexit series of blog posts, I wanted to work out which Members of the UK Parliament were the most important, and after a bit of Googling I realised that views of their Wikipedia pages would do the trick.
I initially found my way to tools.wmflabs.org, which is great for exploring the popularity of an individual MP, but not so good if you want to extract the data for 600 of them.
I then came to learn that Wikimedia have a REST API and, hidden at the bottom of a blog post from 2015, a Python library called mwviews. Yay!
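For reference, the REST API serves per-article pageviews from a URL of roughly the following shape. This is a minimal sketch that only builds the URL and makes no request. The helper name `pageviews_url` and its defaults are assumptions for illustration and are not part of any library.

```python
from urllib.parse import quote

# Base path of the Wikimedia per-article pageviews endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, title, start, end,
                  access="all-access", agent="all-agents", granularity="daily"):
    """Build a per-article pageviews URL (hypothetical helper for illustration)."""
    # Article titles use underscores instead of spaces and must be URL-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, article, granularity, start, end])


url = pageviews_url("en.wikipedia", "Theresa May", "20190324", "20190331")
print(url)
```

Fetching that URL with any HTTP client should return JSON with one item per day, which is the kind of request the library saves us from writing by hand.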
The library is really easy to use as well. Installation is via PyPI:
pip install mwviews
And then if we want to find the page views from the last week for our current Prime Minister, Theresa May, we can write the following code:
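from mwviews.api import PageviewsClient

# The argument is a user agent string that identifies the caller to Wikimedia
p = PageviewsClient("mark-needham")
# Dates are given as YYYYMMDD strings
views = p.article_views("en.wikipedia", ["Theresa May"], start="20190324", end="20190331")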
And now let’s iterate over views to find the number of pageviews per day:
for day in views:
    print(day, views[day])
2019-03-24 00:00:00 {'Theresa_May': 23461}
2019-03-25 00:00:00 {'Theresa_May': 22272}
2019-03-26 00:00:00 {'Theresa_May': 18661}
2019-03-27 00:00:00 {'Theresa_May': 42541}
2019-03-28 00:00:00 {'Theresa_May': 34310}
2019-03-29 00:00:00 {'Theresa_May': 34514}
2019-03-30 00:00:00 {'Theresa_May': 20604}
2019-03-31 00:00:00 {'Theresa_May': 18137}
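Since the result is a dictionary keyed by day, it is easy to ask simple questions of it. As a small sketch, the snippet below finds the busiest day for one article. It runs on a hard-coded copy of the output above rather than a live API call, and `busiest_day` is a hypothetical helper name.

```python
from datetime import datetime

# Sample data copied from the output above, in the same shape that
# article_views returned: {day: {article_title: view_count}}.
sample_views = {
    datetime(2019, 3, 24): {"Theresa_May": 23461},
    datetime(2019, 3, 25): {"Theresa_May": 22272},
    datetime(2019, 3, 26): {"Theresa_May": 18661},
    datetime(2019, 3, 27): {"Theresa_May": 42541},
    datetime(2019, 3, 28): {"Theresa_May": 34310},
    datetime(2019, 3, 29): {"Theresa_May": 34514},
    datetime(2019, 3, 30): {"Theresa_May": 20604},
    datetime(2019, 3, 31): {"Theresa_May": 18137},
}


def busiest_day(views, article):
    """Return (day, count) for the day with the most views of `article`."""
    # Treat missing or None counts as zero so they can never win.
    return max(
        ((day, counts.get(article) or 0) for day, counts in views.items()),
        key=lambda pair: pair[1],
    )


day, count = busiest_day(sample_views, "Theresa_May")
print(f"{day:%Y-%m-%d} {count:,}")  # 2019-03-27 42,541
```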
We can extend our example to compute pageviews for multiple people by adding their names to the list, and we’ll also extend the date range back to the EU referendum of 2016:
people = [
    "Boris Johnson", "Theresa May", "Jacob Rees-Mogg", "Jeremy Corbyn"
]
views = p.article_views("en.wikipedia", people, start="20160624", end="20190331")
That’s a lot of days so, rather than printing out each day on its own, let’s sum up the pageviews:
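votes = {person: 0 for person in people}
for key in views.keys():
    for person_key in views[key].keys():
        # Titles come back with underscores, so map them back to the names in people
        person = person_key.replace("_", " ")
        # Skip days with no recorded count
        if views[key][person_key]:
            votes[person] += views[key][person_key]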
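The same aggregation can be sketched a little more compactly with `collections.Counter`. This is an alternative under the same assumptions as the loop above (counts may be missing, titles use underscores). The helper name `total_views` and the sample numbers are made up for illustration and are not real pageview data.

```python
from collections import Counter
from datetime import datetime


def total_views(views):
    """Sum per-day counts into one total per person (hypothetical helper)."""
    totals = Counter()
    for counts in views.values():
        for title, count in counts.items():
            # A None count is treated as zero, mirroring the guard in the loop above.
            totals[title.replace("_", " ")] += count or 0
    return totals


# Made-up numbers, only to show the shape of the input and output.
toy_views = {
    datetime(2019, 1, 1): {"Person_A": 10, "Person_B": None},
    datetime(2019, 1, 2): {"Person_A": 5, "Person_B": 7},
}

totals = total_views(toy_views)
print(dict(totals))  # {'Person A': 15, 'Person B': 7}
```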
And who’s the most famous of them all?
max_width = max([len(key) for key in votes.keys()])
for person in votes:
    print(f"{person:<{max_width}} {votes[person]:,}")
Boris Johnson 5,727,213
Theresa May 12,844,215
Jacob Rees-Mogg 3,631,652
Jeremy Corbyn 5,965,669
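To make the winner easier to spot, the totals can be printed in descending order. The sketch below hard-codes the totals from the output above so that it runs on its own, and `rank_by_views` is a hypothetical helper name.

```python
# Totals copied from the output above.
votes = {
    "Boris Johnson": 5727213,
    "Theresa May": 12844215,
    "Jacob Rees-Mogg": 3631652,
    "Jeremy Corbyn": 5965669,
}


def rank_by_views(totals):
    """Return (name, total) pairs sorted from most to fewest views."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_views(votes)
max_width = max(len(name) for name, _ in ranking)
for name, total in ranking:
    print(f"{name:<{max_width}} {total:,}")
```

On these numbers Theresa May comes out on top, followed by Jeremy Corbyn, Boris Johnson and Jacob Rees-Mogg.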
About the author
I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.