Plot the Publications in Your Thesis

October 20, 2015

How recent are the references in your thesis or dissertation?

Have you only cited papers from the last few years? Or have you gone back to find dozens of pre-war publications?

I thought it would be interesting to find out, so I wrote a Python script to plot the publications in your thesis.

In my case the output came out looking like this:

[Image: Plot the Publications in Your Thesis]

Have a Go Yourself

If you would like to plot the publications in your thesis, please let me know – I’d be interested to see how it looks!

In principle this script should work on pretty much any LaTeX document with references (or at least those that generate a .bbl file – see below). My initial thought was that this would be for long documents such as theses or dissertations, but it could also work for papers.

The first step is to:

Download the Python Script

To run the RefCount.py script you will need these packages (re and collections ship with Python itself; pandas and matplotlib are installed separately):

  • re
  • pandas
  • collections
  • matplotlib

I normally use Anaconda to help me manage my Python packages.
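
A quick way to check that everything is in place before running the script is to ask Python whether each module can be found. This is only a minimal sketch and not part of RefCount.py; the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# The four modules listed above.
REQUIRED = ["re", "pandas", "collections", "matplotlib"]


def missing_packages(names):
    """Return the names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Please install:", ", ".join(missing))
    else:
        print("All required packages were found.")
```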

Search Through Your .bbl File

The script “RefCount.py” works by searching through the .bbl file generated when you build your LaTeX document.

RefCount.py will need to be in the same folder as the .bbl file, so you could save RefCount.py into your LaTeX build folder.

I would recommend making a copy of your .bbl file and putting it together with RefCount.py in a separate folder, e.g. “RefCount”.
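
To give an idea of what "same folder" means in practice, here is a rough sketch of how a script might pick up the .bbl file sitting next to it. I am assuming there is exactly one .bbl file in the folder; the function name `read_bbl` is hypothetical and RefCount.py itself may do this differently.

```python
from pathlib import Path


def read_bbl(folder="."):
    """Return the text of the single .bbl file found in `folder`."""
    candidates = sorted(Path(folder).glob("*.bbl"))
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"Expected exactly one .bbl file in {folder!r}, found {len(candidates)}"
        )
    # errors="replace" avoids a crash on stray bytes in the file.
    return candidates[0].read_text(encoding="utf-8", errors="replace")
```

Keeping a copy of the .bbl file in its own “RefCount” folder makes the "exactly one file" assumption easy to satisfy.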

Regular Expressions – some assembly required


The actual searching through the .bbl file is done using a regular expression, which describes a pattern of characters to look for in the text.

In RefCount.py the regular expression is looking for years, such as 1985, 2010, 1908, etc., within the .bbl file. Of course these can appear in different forms, such as at the end of a sentence, e.g. ‘1985.’, or inside brackets, e.g. ‘(1985)’.

For RefCount.py to work you will have to set the regular expression so that it can find the dates in your .bbl file.
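
To see why the expression needs tailoring, compare a pattern that accepts any four digits with one that insists on surrounding parentheses. This is just an illustration: the reference below is made up, and neither line is taken from RefCount.py.

```python
import re

# Made-up .bbl-style entry, for illustration only.
entry = "Smith, J. (1985). A made-up title. Made-up Journal, 12, 1021-1034."

# Any run of four digits: picks up the page numbers as well as the year.
any_four_digits = re.findall(r"[0-9]{4}", entry)

# Four digits wrapped in parentheses: picks up only the year.
year_in_parens = re.findall(r"\(([0-9]{4})\)", entry)

print(any_four_digits)  # ['1985', '1021', '1034']
print(year_in_parens)   # ['1985']
```

The more closely the pattern matches the way years are written in your bibliography style, the fewer page numbers and volume numbers end up in the plot.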

Some Sample Regular Expressions

Regular expressions can be a bit tricky, so here are a couple of expressions that I’ve found to work on some common .bbl files:

4-digit numbers in parentheses e.g. ‘(1985)’.

"\(([0-9]{4})\)"

4-digit numbers (with or without a close-parenthesis), followed by a full stop at the end of a line.

"([0-9]{4})\)*\.$"

Hopefully these will be sufficient for most situations, but if you do need to build your own then pythex.org is a useful site for testing out regular expressions – just copy in the text from your .bbl file as “Your test string:”.
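
Another option is to try the candidate expressions directly in Python and compare the number of matches with the number of references you expect. The sketch below makes a few assumptions: the whole file is searched in one go (so `re.MULTILINE` is needed for `$` to match at the end of every line), the end-of-line pattern is written with `$` directly after the full stop, and the name `try_patterns` and the sample entries are made up for illustration.

```python
import re

CANDIDATES = {
    "year in parentheses": r"\(([0-9]{4})\)",
    "year, full stop, end of line": r"([0-9]{4})\)*\.$",
}


def try_patterns(text, patterns=CANDIDATES):
    """Return {label: list of years found} for each candidate pattern."""
    # re.MULTILINE makes `$` match at the end of every line,
    # not just at the end of the whole text.
    return {
        label: re.findall(pattern, text, flags=re.MULTILINE)
        for label, pattern in patterns.items()
    }


# Made-up entries in two different bibliography styles.
sample = (
    "\\bibitem{smith} Smith, J. (1985). A made-up title.\n"
    "\\bibitem{jones} A. Jones. Another made-up title. Made-up Journal, 2010.\n"
)

for label, years in try_patterns(sample).items():
    print(f"{label}: {len(years)} match(es) {years}")
```

Whichever pattern finds roughly one year per reference is probably the one to use.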

If you’re still having problems extracting the date from your .bbl file – leave a comment below and I’ll see if I can help you out.

Some Example Graphs

Here are some examples of the kind of output you can expect from RefCount.py.

[Images: three example plots produced by RefCount.py]
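
For a sense of how a list of extracted years could become a graph like these, here is a rough sketch using collections, pandas and matplotlib. It is not the actual RefCount.py code: the function names, the 1800 to 2100 cut-off for plausible years and the output filename are all assumptions made for this example.

```python
from collections import Counter


def count_by_year(years, first=1800, last=2100):
    """Count references per year, dropping numbers outside [first, last]."""
    counts = Counter(int(y) for y in years)
    return {year: n for year, n in sorted(counts.items()) if first <= year <= last}


def plot_counts(counts, outfile="refcount.png"):
    """Save a bar chart of references per year to `outfile`."""
    # Imported here so count_by_year works even without these packages.
    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")  # draw straight to a file, no window needed
    import matplotlib.pyplot as plt

    series = pd.Series(counts).sort_index()
    ax = series.plot(kind="bar")
    ax.set_xlabel("Year of publication")
    ax.set_ylabel("Number of references")
    plt.tight_layout()
    plt.savefig(outfile)
    plt.close()


# "0042" stands in for a stray four-digit number that is not a year.
counts = count_by_year(["1985", "2010", "1985", "0042"])
print(counts)  # {1985: 2, 2010: 1}
```

Calling `plot_counts(counts)` would then write the chart to refcount.png in the current folder.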