Visualising a volcanic sulphur dioxide cloud
Penny C. Dinh & Liz Hourahine
It’s all well and good conducting in depth research and running advanced simulations. However, from the perspective of a scientifically naive member of the public, it can be difficult to envision the relevance of abstract material. Although of course content should be the most valuable part of any scientific endeavour, it can be useful to visualise models in an appealing way to maximise general interest.
We were asked to make a video visualisation of a simulation of sulphur dioxide output from the Nabro volcano eruption in Eritrea, East Africa in 2011. The data consisted of 1000 NetCDF files, simulated for every half an hour over a month long period.
Here’s a quick summary of what we did and the tools we used to do them:
The first step was to get this data into a usable format. We did this using the python library Iris, through which we created multidimensional cubes of data. Each cube contained latitude, longitude, and time, with metadata about flight levels, among other attributes. Each file was loaded into a cube list of 7 iris cubes, each containing data from different ranges of flight level. This cubelist was then merged into one cube with flight levels as a new dimension. Data was masked so that 0 values would not be displayed on the plot. We also converted the unit of the data from grams/metre3 to microgram/metre3 as the latter is more suitable in a scientific context.
In order to plot these new 4 dimensional cubes (latitude, longitude, time, and height) in 2 dimensional space, we needed to reduce the number of dimensions to only longitude and latitude. As the time attribute was the same for all 7 layers, we could easily eliminate this dimension. However, the flight level dimension differs between the 7 layers, and so we collapsed along this dimension by using iris.analysis.SUM, which takes the sum of all height layers for each coordinate. The resulting cube consisted of 2 dimensions: longitude and latitude.
Matplotlib and Cartopy
Using Matplotlib to define a plot and axes, we then used Cartopy to import GeoTiles. GeoTiles are map image tiles based on GeoAxes, with map coordinates corresponding to the relevant points on the image.
When researching how to plot the data along these axes, we came across two methods, pcolormesh and contourf. The main differences between these two methods are presented below:
To plot the data, we use the matplotlib pcolormesh tool within iris.plot, which colours the map basing on different data levels. This was more appropriate than contourf for this level of data, as hard boundaries looked fairly ‘blocky’ with the level of horizontal resolution. In addition, hard contour lines drawn in contourf would not allow us to present the flow of gas as naturally as with pcolormesh.
Colorbar and colour map
Data was plotted across the GeoAxes and overlaid on the map tiles, using the built-in matplotlib colour map ‘magma’. Something else we found during this process is that there’s a lot more to choosing colour maps than would first seem.
There are different types of colour scale for displaying different types of data (https://visual.ly/blog/subtleties-of-color-different-types-of-data-require-different-color-schemes/). We chose a perceptually uniform colour map, meaning that a logical progression from the low to high values was shown by increasing brightness, making the representation clear. As data points became of higher value the colour displayed would become brighter and more visually interesting.
Rescaling the colour bar
In terms of distribution our data was very skewed, most of our data points being low, with few high values. In order to illustrate the variation of our data more clearly we found that a scale of logarithm to base 10 would be more suitable than a linear scale, as it would allow us to see more variation at the lower end of the data. We did not transform the data itself but rather the way data was displayed onto the colour map.
We wrote a function that would allow us to produce a plot for each time step. However, there were 1000 time steps, and so iterating over 1000 NetCDF files would have taken around 8 hours. We decided to use Dask, a parallel processing tool. Dask mapped the function onto the files given to it, across multiple workers on numerous cores. This reduced the processing time to around 15 minutes.
Making the video
We zipped the 1000 plots together into a video using ffmpeg. Ffmpeg is a media tool that can be used for manipulating videos and stills; in this context for rendering 1000 png files into a video. It’s written for use within a Bash/Shell command environment. However, in order to use it within a library we had to use it with pure python. The python module Subprocess was used to spawn new processes and connect to input/output/error pipes; we used this module to call on ffmpeg commands.
This is the final product:
The code used to make this video has been wrapped up into a python library, which you can find here (repo link). This can create much more appealing visualisations than numbers on a page, and could potentially create more interest in the research it’s used for. Feel free to take a look!