Introducing forest

By Stephen Haddad


Our aim

From October 2017 to April 2018, Graeme Anderson and I have been on secondment to Science Partnerships, working with the Informatics Lab to create a new visualisation tool. Its initial focus is the output from high-resolution, convection-permitting forecast simulations over SE Asia, shown together with relevant observations. The models are being run for the Weather and Climate Science for Services Partnership (WCSSP) South-east Asia, a partnership between the Met Office, the National Met Services of Indonesia (BMKG), Malaysia (MMD) and the Philippines (PAGASA), and UK universities. The visualisation tool is being developed as part of a work package focused on the pull-through from scientific research to operational meteorology. The new tool will help operational meteorologists better understand the output of our forecast models, which draw on the latest research into the tropical atmosphere, and so better identify the location, likelihood and severity of high-impact weather events in the region.

The problem with visualisation tools

The first reaction to being introduced to a new visualisation tool by the Met Office might well be: why another tool? If there are n scientists at the Met Office, there are probably already 2n tools, websites, apps and libraries for often quite similar visualisation tasks! So there needs to be a good reason for this development effort to be a worthwhile investment. The key aims of the tool we wanted to develop were unifying data sources, portability, extensibility and efficiency.

The first goal, unifying data sources, came out of an initial discussion with meteorologists in the Global Guidance Unit, who had been asked to assess the performance of the regional SE Asia models. They found it cumbersome to compare model data with observations such as satellite imagery or precipitation measurements, because each was presented in a separate tool in a different format. This proliferation of tools made comparison difficult and time-consuming. It quickly became apparent that meteorologists needed a tool that reduced the cognitive overhead of comparing simulation and observational data, so they could focus their mental effort on diagnosing model errors or identifying high-impact weather.

One of the problems with many existing tools is that they were developed as temporary solutions to specific needs. Although they address those needs very well, the underlying source code is often hard to adapt to similar problems, such as a different dataset, different model fields, a different platform or a different site. We wanted to make sure that, although the tool would initially be used to visualise specific datasets for SE Asia, it could easily be applied to other regions or data fields. Examples include output from a similar high-resolution model for tropical Africa, which is already running, or data from non-Met Office models used by met service partners in SE Asia.

What we hope to achieve

We also wanted the source code to be portable across the places it might be run. Our initial tool will run on a server deployed to Amazon Web Services, but it also needs to run locally for development, and might in future be deployed on a local Met Office server or a server run by a partner institution. We have achieved this by using standard tools that are available inside the Met Office and easily installed on other platforms. The source code is written in Python, a widely used programming language. Its only dependencies are the Iris library for data handling, along with Matplotlib, Cartopy and Bokeh for plotting and creating the web pages. It is quick and easy for anyone to download the source code, install the libraries and have a forest instance running on a local machine or server.

An additional advantage of these choices is that they are widely used and understood in the scientific community. The project and source code are hosted on GitHub, so anyone can use and contribute to the code. This should enable scientists in the Met Office and throughout the global science community to contribute new features or adapt the tool to new applications without starting from scratch. This will hopefully lead to one kind of efficiency: scientists need only develop the aspects of the tool that are new or different from existing tools, rather than rebuilding everything for each new project.

Another kind of efficiency comes from reducing wasted computation and storage by building a dynamic analysis tool rather than relying on batch analysis. Many existing tools create thousands or even millions of plots in batch ahead of time, covering as many combinations of fields, data sources and view configurations as possible, even though only a small percentage of the output will ever be viewed. This was necessary when the available computing power was not sufficient to produce a requested plot quickly enough for an interactive tool. With the advent of cloud computing services, it is easy and inexpensive to use a large amount of computing power for a brief period, so derived fields can be calculated and plotted on demand through parallel processing.
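To make the batch-versus-on-demand trade-off concrete, here is a minimal Python sketch. It is not forest's code: the field, source and region names, the 3-hourly lead times, and the `render` function standing in for an expensive plotting step are all hypothetical. It simply counts how many renders each approach performs, using the standard library's `functools.lru_cache` so that a repeated request is served from memory instead of being redrawn.

```python
from functools import lru_cache
from itertools import product

# Hypothetical catalogue of options a user could choose between.
FIELDS = ["precipitation", "wind_speed", "air_temperature", "cloud_fraction"]
SOURCES = ["model_high_res", "model_global", "satellite_obs", "rain_gauge_obs"]
REGIONS = ["se_asia", "malaysia", "indonesia", "philippines"]
LEAD_TIMES = range(0, 120, 3)  # 40 hypothetical lead times, every 3 hours

rendered = []  # records every call to the (hypothetical) expensive render step


def render(field, source, region, lead_time):
    """Stand-in for an expensive plotting step such as regridding and drawing a map."""
    rendered.append((field, source, region, lead_time))
    return f"{field}/{source}/{region}/T+{lead_time}"


def batch_render_all():
    """Batch approach: pre-render every combination, whether or not anyone views it."""
    return {
        combo: render(*combo)
        for combo in product(FIELDS, SOURCES, REGIONS, LEAD_TIMES)
    }


@lru_cache(maxsize=256)
def render_on_demand(field, source, region, lead_time):
    """Dynamic approach: render only when requested, caching recent results."""
    return render(field, source, region, lead_time)


if __name__ == "__main__":
    batch_render_all()
    print("batch renders:", len(rendered))

    rendered.clear()
    requests = [
        ("precipitation", "model_high_res", "se_asia", 0),
        ("precipitation", "satellite_obs", "se_asia", 0),
        ("precipitation", "model_high_res", "se_asia", 0),  # repeat: cache hit
    ]
    for request in requests:
        render_on_demand(*request)
    print("on-demand renders:", len(rendered))
```

With four options in each of three categories and 40 lead times, the batch approach renders 2,560 frames up front, while the on-demand version renders only the two distinct combinations that were actually requested. In a real deployment the cache and the parallel workers that do the rendering would likely live in separate services; those design choices are outside the scope of this sketch.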

The challenges of implementing a tool with these lofty goals have been investigated in recent years by the Informatics Lab. The lab has created prototype tools that demonstrate possible solutions in areas such as analysing and visualising large datasets, deploying scientific tools on cloud platforms, and streamlining the pipeline from research through publication to operations. Science Partnerships decided to work with the lab so that development of the new tool started not from “how can we meet the requirements with what we already know how to do?” but from “what is the best solution to meet our short- and long-term goals?”.

The lab’s knowledge and experience have been invaluable in opening our eyes to what is possible with today's technology (and also what might be with tomorrow's technology!), guiding us to the solutions that best fit our needs and helping us to utilise the tools and technology for our application. We have also been able to deploy our initial version of the tool using the lab’s infrastructure, so that in a relatively short period we have a tool that is almost ready to be used by operational and research meteorologists in the UK and SE Asia.

In the following blog posts, I will look in more detail at two aspects of forest. First, the challenge of moving data to cloud storage so that the compute capacity of cloud providers can be exploited. Second, the way we plot and display the data using a hybrid of Matplotlib and Bokeh. The last post will look at how the lessons we’ve learnt developing forest can feed into future development of this and other tools and infrastructure.