Summarising a Day of Weather
Summarising an hourly forecast
How can we merge a number of hourly weather forecasts into a single coherent text summary? First we can identify the general weather type by selecting the one that occurs most frequently. If two or more types tie for most frequent, we'll err on the side of caution and return the more impactful condition.
“It will be partly cloudy”
“It will be raining heavily”
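As a minimal sketch of this selection step, the snippet below counts the hourly weather types and breaks ties by impact. The `IMPACT` ranking and the type names are hypothetical; the real list of weather types and their ordering are not given here.

```python
from collections import Counter

# Hypothetical impact ranking (an assumption): higher means more impactful.
IMPACT = {
    "sunny": 0,
    "partly cloudy": 1,
    "overcast": 2,
    "light rain": 3,
    "heavy rain": 4,
}


def dominant_type(hourly_types):
    """Return the most frequent weather type, breaking ties by higher impact."""
    counts = Counter(hourly_types)
    top = max(counts.values())
    # Every type that shares the highest count is a candidate.
    tied = [t for t, n in counts.items() if n == top]
    return max(tied, key=IMPACT.__getitem__)
```

For example, a period split evenly between "sunny" and "heavy rain" would be reported as "heavy rain" under this ranking.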
We then also need to identify the worst (most impactful) weather type expected - people will want to know if it’s going to rain, right? If the most frequent type is not also the worst type, and the worst type involves rain, we need to tell people.
“It will be partly cloudy with patchy light rain”
“It will be overcast with heavy rain”
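One way this rule might look in code is sketched below. The impact ranking, the set of rain types and the phrasing table are all assumptions made for illustration, and `main_type` is taken to be the most frequent type already chosen for the period.

```python
# Hypothetical impact ranking and rain types (assumptions for illustration).
IMPACT = {
    "sunny": 0,
    "partly cloudy": 1,
    "overcast": 2,
    "light rain": 3,
    "heavy rain": 4,
}
RAIN_TYPES = {"light rain", "heavy rain"}
# How a rain type reads when it is itself the main type.
PHRASES = {"light rain": "raining lightly", "heavy rain": "raining heavily"}


def describe(main_type, hourly_types):
    """Build a sentence from the main type and the worst type in the period."""
    worst = max(hourly_types, key=IMPACT.__getitem__)
    # Only mention the worst type if it differs from the main type and is rain.
    if worst != main_type and worst in RAIN_TYPES:
        return f"It will be {main_type} with {worst}"
    return f"It will be {PHRASES.get(main_type, main_type)}"
```

With these assumptions, `describe("overcast", ["overcast", "overcast", "heavy rain"])` gives "It will be overcast with heavy rain".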
Everyone knows the forecast is fallible. If we want to decide whether or not to have a BBQ, it's important to know the uncertainty in the forecast - how likely is it to rain? To do this we take the percentage ‘probability of precipitation’ values and convert them into a textual description, using research on people's perceptions of probabilities to decide on the boundaries. This gives users an indication of how confident we are of it being wet or dry.
“It will be partly cloudy with heavy rain highly likely”
“It will be sunny and is likely to remain dry”
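The mapping from a percentage to a phrase could be sketched as a simple lookup. The boundaries below (20, 40, 60 and 80) are placeholders, not the research-derived values, and the middle phrase "a chance of ..." is invented for the example; only the "likely" and "highly likely" wordings appear in the summaries above.

```python
def likelihood_phrase(pop_percent, rain_label="rain"):
    """Turn a probability-of-precipitation percentage into a text fragment.

    The thresholds are illustrative assumptions, not the real boundaries.
    """
    if not 0 <= pop_percent <= 100:
        raise ValueError("pop_percent must be between 0 and 100")
    if pop_percent >= 80:
        return f"{rain_label} highly likely"
    if pop_percent >= 60:
        return f"{rain_label} likely"
    if pop_percent > 40:
        # Invented wording for the uncertain middle band.
        return f"a chance of {rain_label}"
    if pop_percent > 20:
        return "likely to remain dry"
    return "highly likely to remain dry"
```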
Contextualising with time
When we talk about the weather it’s commonplace to split the day up into sections: morning, afternoon, evening and night. Now, whilst it’s clear there's an order to these periods, what may not be so obvious is how long each period lasts and where the boundaries between them lie. Some are obvious, for example it's widely accepted that the boundary between morning and afternoon is 12 o’clock noon (where AM switches to PM). Now consider the boundary between night and morning. Frequently people refer to times like 3AM as ‘the middle of the night’, even though they fall in the AM hours.
For simplicity we have decided to split the day into four equal parts of six hours: morning 6AM - 12PM, afternoon 12PM - 6PM, evening 6PM - 12AM, and night 12AM - 6AM. This may initially seem odd, as the phrase 'tonight' now refers to the early hours of tomorrow morning. However, we are keeping things simple for now, and we hope that the right boundaries and periods will become more apparent through user testing.
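Because the four periods are equal six-hour blocks, mapping an hour to its period reduces to integer division. A minimal sketch, where the function name `period_for_hour` is hypothetical:

```python
def period_for_hour(hour):
    """Map an hour of the day (0-23) to its six-hour period."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    # 0-5 night, 6-11 morning, 12-17 afternoon, 18-23 evening.
    return ("night", "morning", "afternoon", "evening")[hour // 6]
```

Here 3AM falls in "night" and 12 noon is the first hour of "afternoon", matching the boundaries described above.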
Putting these summaries together with our time period logic, we can now generate a summary for each of our periods and glue those together to describe the day:
“This morning it will be sunny and is highly likely to remain dry. This afternoon it will be sunny and is likely to remain dry. This evening it will be partly cloudy with light rain likely. Tonight it will be clear and is likely to remain dry.”
This kind of robotic response doesn't seem a very natural way of describing the weather. In this example the weather for today will be generally clear, turning a little worse in the evening. In order to better highlight these changes in the weather we decided to identify and merge similar forecasts into a single sentence.
Our challenge was to identify if multiple periods are similar enough to be combined without losing information. First we compared each forecast period to the next using the Sørensen–Dice coefficient, which gives a score between zero and one showing how similar two periods are. If that similarity score is above a certain threshold we can take the original data for both these forecast periods and generate a new summary covering the entire merged period.
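For two sets A and B the Sørensen–Dice coefficient is 2|A ∩ B| / (|A| + |B|). The sketch below applies it to adjacent periods, assuming each period is represented as a set of descriptive features; that representation, the feature names and the threshold of 0.5 are illustrative guesses rather than the actual encoding used.

```python
def dice(a, b):
    """Sørensen–Dice coefficient of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # Two empty sets are treated as identical.
    return 2 * len(a & b) / (len(a) + len(b))


def merge_similar(periods, threshold=0.5):
    """Group adjacent period names whose features score above the threshold.

    `periods` is an ordered list of (name, features) pairs. Each period is
    compared with the one immediately before it.
    """
    groups = []
    previous = None
    for name, features in periods:
        features = set(features)
        if previous is not None and dice(previous, features) > threshold:
            groups[-1].append(name)
        else:
            groups.append([name])
        previous = features
    return groups
```

Each resulting group would then be summarised again from the original data for all of its periods, rather than by stitching the existing sentences together.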
Applying this technique to our original summary from above, we find that we can merge the morning and afternoon forecasts:
“This morning and this afternoon it will be sunny and is likely to remain dry. This evening it will be partly cloudy with light rain likely. Tonight it will be clear and is likely to remain dry.”
The statement is still a little formulaic, but this method gives us a relatively simple way to generate a suitably concise forecast for the day's weather.