Cards for the Met Office Alexa Skill

With new Amazon devices such as the Show and Spot, and Alexa beginning to be embedded in TVs, cars and other everyday devices, there are new opportunities to bring more value to skills through visual media such as images and written text. This week, we've been exploring adding these visuals to the UK Met Office Weather Skill.

There are two ways of adding text and images to your skill: cards and display directives. All Alexa devices already support 'home cards', which are displayed in the Amazon Alexa App and additionally on the device's screen, if it has one. Cards are the simpler option and primarily just display a visual version of the ongoing interaction, so that's where we started.

Home Cards

Home cards are pretty easy to add: all you need to do is add a simple 'card' object to the response. Here's an example of our card:

Amazon then take your response and display it in an appropriate way depending on the specific device:

These home cards are a bit of a double-edged sword. They're really simple to implement, and Amazon handle rendering the card across different devices. But the devices differ so much that it's impossible to make a single design look good in every context. Having so few options means there is very little leeway to work around the challenges, and few ways of communicating the necessary information.
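As a rough illustration of the 'card' object described above, here's a minimal Python sketch of a response that carries a home card. The 'Simple' and 'Standard' card types and their field names follow Amazon's response format as we understand it; the helper name, forecast text and image URL are made up for this example and are not taken from our skill.

```python
import json


def build_card_response(speech_text, card_title, card_text, image_url=None):
    """Build an Alexa-style response dict that includes a home card.

    Hypothetical helper for illustration only.
    """
    if image_url:
        # 'Standard' cards carry a title, text and an image.
        card = {
            "type": "Standard",
            "title": card_title,
            "text": card_text,
            "image": {"smallImageUrl": image_url, "largeImageUrl": image_url},
        }
    else:
        # 'Simple' cards carry only a title and plain-text content.
        card = {"type": "Simple", "title": card_title, "content": card_text}

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "card": card,
            "shouldEndSession": True,
        },
    }


if __name__ == "__main__":
    # Placeholder values, not real forecast data.
    example = build_card_response(
        "Today will be sunny.",
        "Today's forecast",
        "Sunny with light winds.",
        "https://example.com/sunny.png",
    )
    print(json.dumps(example, indent=2))
```

Note that the card only describes content. How it is laid out on each device is up to Amazon, which is exactly the trade-off described above.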

Display directives

Cards don't produce a stunning visual experience on devices like the Show or Spot, so to create a more appealing display we need to use directives. Directives are instructions added to the response to tell the device to do things such as play a sound or display something. They are powerful and exciting but are not for the faint-hearted, and we found them very difficult to use! The documentation is a little confusing, and Alexa offers very little help when you do something wrong. If you're lucky, your directive does nothing; if you're unlucky, you get the infamous "There was a problem with the requested skill's response" with no further information.

Still, with a lot of perseverance, we ended up with this on the Echo Show:

Using a display directive to show text, image and background image.

There's still plenty we could do to improve it - for example, it would be interesting to dynamically choose a background image based on the forecast weather. We decided to use the display directive purely to highlight the most important information rather than simply repeating the forecast. There's an interesting tension between designing the audio response and what's shown on screen - the core Alexa experience still revolves around a spoken response, and it's not clear how to create a good user experience that blends both. Ultimately we decided to park the visuals for now and refocus on the core voice interaction.

Build your own

Amazon have a how-to guide and a documentation page for display directives, but we still ran into some trouble. That's most likely because we didn't take the time to read the docs properly, but I wanted to end on the steps we took to add a display directive.

Step one is to enable the display interface within the Alexa Developer Console.

[Screenshot: display-interface-on-alexa.png]
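Once the interface is enabled, a skill can check whether the device making the request has a screen before attaching a directive. Here's a small sketch of that check in Python. It assumes the request lists supported interfaces under context.System.device.supportedInterfaces, which is our reading of the request format; the function names are our own.

```python
def supports_display(request_envelope):
    """Return True if the request says the device supports the Display interface."""
    interfaces = (
        request_envelope.get("context", {})
        .get("System", {})
        .get("device", {})
        .get("supportedInterfaces", {})
    )
    return "Display" in interfaces


def add_directive_if_supported(request_envelope, response, directive):
    """Attach the directive only when the device can render it (hypothetical helper)."""
    if supports_display(request_envelope):
        response["response"].setdefault("directives", []).append(directive)
    return response
```

Guarding the directive like this keeps the voice-only response untouched for devices without a screen.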

Where we got lost was in slightly misreading the documentation. The syntax blocks flip between literal values (e.g. the 'type' must be 'BodyTemplate2') and type hints. There are some full examples in the documentation, but we made a mistake with one of the values and couldn't work out where we went wrong.

[Screenshot: amazon-template2-docs.png]
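To make the difference between literal values and type hints concrete, here's a sketch of a BodyTemplate2 directive built in Python, plus a small check for the fields that must hold an exact value. The field names reflect our understanding of the Display.RenderTemplate format, so check them against the documentation. The helper names, token and image descriptions are invented for this example, and the check is stricter than Alexa in that it expects 'backButton' to be set explicitly.

```python
def build_body_template2(title, primary_text, image_url, background_url):
    """Build a Display.RenderTemplate directive (illustrative sketch)."""

    def image(url, description):
        return {"contentDescription": description, "sources": [{"url": url}]}

    return {
        "type": "Display.RenderTemplate",  # literal: must be exactly this
        "template": {
            "type": "BodyTemplate2",  # literal: must be exactly this
            "token": "forecast",  # type hint in the docs: any string
            "backButton": "HIDDEN",  # literal: VISIBLE or HIDDEN
            "backgroundImage": image(background_url, "Background"),
            "title": title,  # any string
            "image": image(image_url, "Weather symbol"),
            "textContent": {
                "primaryText": {"type": "PlainText", "text": primary_text},
            },
        },
    }


# Paths whose values must be one of a fixed set of strings.
LITERAL_FIELDS = {
    ("type",): {"Display.RenderTemplate"},
    ("template", "type"): {"BodyTemplate2"},
    ("template", "backButton"): {"VISIBLE", "HIDDEN"},
    ("template", "textContent", "primaryText", "type"): {"PlainText", "RichText"},
}


def find_literal_errors(directive):
    """Return a message for each literal field holding an unexpected value."""
    errors = []
    for path, allowed in LITERAL_FIELDS.items():
        node = directive
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
        if not isinstance(node, str) or node not in allowed:
            errors.append(
                "{}: {!r} is not one of {}".format(".".join(path), node, sorted(allowed))
            )
    return errors
```

A check like this would have caught our mistake locally, rather than leaving us to guess from Alexa's generic error message.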

The breakthrough we eventually made was deploying the Berry Bash sample skill to AWS Lambda, triggering it using the Lambda test interface and then copying the returned JSON into our skill. From there we could make a single change at a time to eventually return the card we wanted.
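The one-change-at-a-time approach is easier if you can see exactly where your response differs from a known-good one. This isn't something we built at the time, but a small recursive diff along these lines would do it. The function name and the sample values are purely illustrative.

```python
MISSING = "<missing>"  # marker for a key or index present on only one side


def diff_json(known_good, candidate, path=""):
    """Yield (path, known_good_value, candidate_value) for every difference."""
    if isinstance(known_good, dict) and isinstance(candidate, dict):
        for key in sorted(set(known_good) | set(candidate)):
            child = "{}.{}".format(path, key) if path else key
            if key not in known_good:
                yield (child, MISSING, candidate[key])
            elif key not in candidate:
                yield (child, known_good[key], MISSING)
            else:
                yield from diff_json(known_good[key], candidate[key], child)
    elif isinstance(known_good, list) and isinstance(candidate, list):
        for i in range(max(len(known_good), len(candidate))):
            child = "{}[{}]".format(path, i)
            if i >= len(known_good):
                yield (child, MISSING, candidate[i])
            elif i >= len(candidate):
                yield (child, known_good[i], MISSING)
            else:
                yield from diff_json(known_good[i], candidate[i], child)
    elif known_good != candidate:
        yield (path, known_good, candidate)


if __name__ == "__main__":
    # Made-up fragments standing in for a sample response and our own.
    sample = {"template": {"type": "BodyTemplate2", "title": "Sample"}}
    ours = {"template": {"type": "bodyTemplate2", "title": "Sample"}}
    for difference in diff_json(sample, ours):
        print(difference)
```

Differences in content (titles, text, URLs) are expected; it's the unexpected ones, such as a mistyped 'type', that are worth looking at first.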

It took us a little while but we're pretty happy with the end result. We're focussing for now on the voice interactions, and hopefully the display tools will have matured a bit when we return. It would also be good to see a little more guidance on how to design a good user experience with a skill that uses both voice and a visual display.