Voice UI design for a cooking app on Amazon Alexa

Project info

In this side project, I explored the idea of using Alexa to cook by developing Chefy, a skill that guides you through recipes hands-free, keeping your recipes clean while you cook.

The project took 3 months to complete (UX and UI design).


Methods

Competitor Analysis

User Stories

Personas

Placeonas

Tools

Amazon Echo

ASK (Alexa Skills Kit)

Adobe XD

Voiceflow

Lucidchart

Sketch

Marvel App

My approach

I looked at Alexa skills that compete with Chefy. I paid attention to invocation names (how the interaction gets started), intents (the instructions the skill is designed to act upon), prompts (the questions it asks the user to continue the conversation), and how those skills handle the users' responses.
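To make these concepts concrete, here is a minimal, hypothetical sketch in plain Python (not the actual ASK runtime) of how an invocation name and a few intents with sample utterances might be modelled; the intent names and utterances are illustrative assumptions, not Chefy's real interaction model:

```python
# Hypothetical sketch of an Alexa-style interaction model (not the real ASK API).
# An invocation name starts the skill; intents map sample utterances to actions.

INVOCATION_NAME = "chefy"  # assumed invocation name, for illustration only

INTENTS = {
    "GetRecipeIntent": [
        "give me a recipe for scrambled eggs",
        "how do i make scrambled eggs",
    ],
    "ReplacementIntent": [
        "what can i use instead",
        "i don't have chocolate chips",
    ],
}

def match_intent(utterance: str) -> str:
    """Return the first intent whose sample utterances contain the input."""
    normalised = utterance.lower().strip()
    for intent, samples in INTENTS.items():
        if normalised in samples:
            return intent
    return "FallbackIntent"  # unrecognised speech falls through to a fallback
```

In the real Alexa Skills Kit, this mapping is declared in a JSON interaction model and Amazon's speech recognition performs the matching; the sketch only shows the relationship between utterances and intents.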

Designed for an audience familiar with voice-controlled devices

The target audience consists of busy individuals aged 18–35 (of all genders), including those who need to prepare healthy meals or quick snacks—such as working parents and students. They are interested in:

  • Preparing healthy meals

  • Making quick meals

  • Finding inspiration for what to cook

I chose this audience based on my observations of TV cooking ads, which often target young parents (primarily women) who shop and prepare meals. As a former student myself and through my interactions with others, I realised this demographic could also benefit from kitchen support. Students, in particular, need to eat but often have limited time and tight budgets, making my Alexa Skill a perfect kitchen companion for them.

""

User persona

""

System persona

Interviewed to learn about cooking habits and the environment they cook in

When screening participants for my interviews, I focused on speaking with individuals who had both cooking experience and regular use of voice-assisted technology (such as Siri, Google Assistant, Alexa, or Google Home). The interviews were structured around three key themes:

  • Cooking habits. I explored participants' general approach to cooking, asking whether they enjoy it, how often they cook, and where they draw inspiration for their meals. This helped me understand their cooking preferences and behaviours.

  • Recipe usage. I sought to understand participants' use of recipes, including the types of sources they rely on (e.g., online, cookbooks, or magazines), and how they typically follow recipes. Additionally, I looked into whether they use voice-enabled technology to assist with cooking tasks, such as setting timers or providing hands-free guidance.

  • Kitchen environment. Since cooking often occurs at home, I investigated the environment in which participants cook. I asked about the noise level in their kitchen and whether they typically cook alone or with others. This provided insights into how the design of my Alexa Skill should adapt to different cooking settings and potential distractions.

The main finding was people’s need for personalisation of their recipes

Typically, they based their meal on one key ingredient

Two of my interviewees often make their decision on what to cook based on what they have in the fridge. Then, they Google for recipe suggestions based on that one ingredient they have.

They make their favourite recipes repeatedly, sometimes once a week

Some users said that after they find a new recipe they like, they usually keep it and cook it again in the future. This saves them time coming up with new meals throughout the week.

People liked to substitute an ingredient they don't have or don't want to use

Three out of five people I interviewed said they’d use a substitute for an ingredient they don’t have; they’d Google for a substitute among the ingredients already in their fridge. The remaining two said they’d either pause their cooking and go to a shop to get the missing ingredient, or stop cooking altogether (these were the less experienced cooks, who, in general, cook only 2–3 times a week).

Participants like to know how many people a recipe was designed for

I also decided to let the users obtain recipes for breakfast, lunch, dinner, and snacks whilst keeping recipes very simple. Three of my participants said they find using some recipes frustrating, as either they’re long (especially when using a voice-assisted device) and hard to follow, or inaccurate (one participant said they realised at the end of their recipe that their lamb should go in the oven “for a very long time” – with no actual time given).

I created a voice-user flow to visualise how users would interact with Chefy while cooking

It mapped out how all the intents in the skill relate to one another and outlined the system's responses to various inputs, providing a clear overview of the user's possible actions at each point in the conversation.

To learn what the user and system would actually say, you can view the corresponding script on this page.

""

Voice user flow
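As a rough illustration, the branching in such a flow can be sketched as a small state machine. The states and intents below are hypothetical simplifications I have invented for this sketch, not Chefy's full flow:

```python
# Hypothetical, simplified voice-user flow for Chefy as a state machine.
# Each state maps a user intent to the next state in the conversation.

FLOW = {
    "Welcome": {"GetRecipeIntent": "SuggestRecipe"},
    "SuggestRecipe": {
        "YesIntent": "ReadIngredients",  # user accepts the suggestion
        "NoIntent": "SuggestRecipe",     # offer another recipe
    },
    "ReadIngredients": {
        "ReplacementIntent": "SuggestSubstitute",
        "NextIntent": "ReadStep",
    },
    "SuggestSubstitute": {"NextIntent": "ReadStep"},
    "ReadStep": {
        "NextIntent": "ReadStep",        # step through the recipe
        "StopIntent": "Goodbye",
    },
}

def next_state(state: str, intent: str) -> str:
    """Advance the conversation; unknown intents keep the current state."""
    return FLOW.get(state, {}).get(intent, state)
```

Keeping the current state on unrecognised intents mirrors a common VUI convention: the system re-prompts rather than abandoning the conversation.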

I conducted on- and off-site usability testing using a GitHub wizard designed for running Wizard of Oz interface tests

On-site testing

  • Location: in the participant’s kitchen.

  • Equipment: cardboard, black material, plastic sheet, one-way mirror glass film, double-sided tape, scissors, iPhone, tripod.

  • Conducting the test: I built a mockup of a wall that acted as a divider between me and the participants. Some scenario tasks were provided to the participants so they could familiarise themselves with what I needed them to do. Each time they asked Chefy for something, the wizard responded. There was a tripod with my phone set up so it could record my participants’ interaction with the system.

""

A large Amazon box had a hole cut out. Then, I attached a reflective window film to a plastic sheet and placed it over the hole. And done! :)

Off-site testing

  • Location: remote, conducted via appear.in and Facebook Messenger.

  • Conducting the test: Since Alexa users can’t see Alexa when interacting with her, my participants also could not see me during the testing. To analyse how my participants behaved with the system, I recorded the sessions using screen recording.

Testing and tasks

This usability testing was conducted both on-site (with 3 participants) at the participants’ homes, and off-site (with 2 participants) via appear.in (now whereby.com) and the Facebook Messenger app.

""

On-site usability testing (in participants’ kitchens)

Test scenario the users read before the usability testing started:

A friend of yours recommended an Alexa Skill called ‘Chefy’. This skill is like a kitchen companion who suggests tasty meals as well as helps you make them. You’ve enabled this skill on your Alexa now and want to try it out in your kitchen.

  1. You want to learn how Chefy works. Use it to make scrambled eggs [No Intent = exploration].

  2. You want to make a nice dinner tonight but you don’t know what to cook. How would you use Chefy?

  3. You’re making a cake. Chefy says:

    S: Heat the oven to <something> and let me know once done.

    You didn’t hear what Chefy said. What do you say to Chefy? [Global = Repeat]

  4. You’re making chocolate chip cookies. Chefy says:

    S: You will need 2 cups of chocolate chips to make this recipe.

    But, it turns out, you don’t have chocolate chips. What do you say to Chefy? [GetRecipeIntent = Replacement]

  5. You want to make club sandwiches for yourself and 3 of your friends. Use Chefy to find out what you need in order to make it. [GetIngredients]

  6. You have 2 friends coming over for lunch soon. You already bought the ingredients you will need for your Quick Tuna Tacos recipe that you saved yesterday. How would you start interacting with Chefy? [GetRecipeInstructions = cooking activity]

Results

To organise my recordings, I used affinity mapping and a Rainbow Spreadsheet. I used the Severity Ratings Scale to rate the severity of the usability problems I found. Then, I grouped my findings into 4 categories:

  • Catastrophe: imperative to fix

  • Major problem: important to fix

  • Minor problem: important to fix but low priority

  • Cosmetic: if time allows, it can be fixed

Catastrophes

  • There was a problem with prompts. Some users wanted to answer “yes” instead of giving a response suggested by Chefy. Also, some users were unsure whether “hear more” meant “hear more about this recipe” or “hear more recipe suggestions”.

  • Unclear communication with the system, i.e. some users were not sure whether they should say “Chefy” or “Alexa” when cooking with Chefy.

Major problems

  • System problem. Users’ responses to Chefy’s prompts were not recognised by the system.

  • Users’ corrections not supported. Some users corrected themselves when saying how many people they wanted to cook for (spoken 0.2 seconds apart), but Chefy was not sure how to handle it.

  • Some users did not use the “command” word for replacing an ingredient they did not have.

  • When speaking to Chefy, some users expected Chefy to know what they’re referring to (as it was the last thing they said before Chefy responded).

Minor problems

  • Some users didn’t know whether Chefy would switch itself off if they stopped cooking with it.

  • Some users weren’t sure whether they could still cook if they were missing an ingredient.

Cosmetic problems

  • Some participants did not like Chefy’s voice (it was too fast and lacked intonation).

Proposed solution

1. Improvement to the prompts’ design

The prompts should be less vague and designed to provide a limited number of answers. The options should be clearly presented so that it is clear to the users that it is an either/or question (and not a ‘yes/no’ question).

2. Improvement to the instructions on a conversation flow

Clarify how the users of this skill should request information from Chefy whilst using it.

3. More responses should be recognised

Add more variations of responses to this Skill, e.g. for the ReplacementIntent there should be: “What (else) can I use?”, “What do I need?”, “I don’t have any”, “What can I use instead?”.
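In a real skill these variants would be declared as sample utterances in the ASK interaction model; as a rough, hypothetical sketch of the idea in plain Python, several phrasings can map to the single ReplacementIntent:

```python
# Sketch: several utterance variants all resolving to one ReplacementIntent.
# Illustrative only; real skills declare these in the ASK interaction model.

REPLACEMENT_UTTERANCES = [
    "what else can i use",
    "what can i use",
    "what do i need",
    "i don't have any",
    "what can i use instead",
]

def is_replacement_request(utterance: str) -> bool:
    """True if the user's phrase matches any known ReplacementIntent variant."""
    return utterance.lower().strip() in REPLACEMENT_UTTERANCES
```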

4. Improved recovery from user errors

Chefy should repeat back what she heard and ask the user to confirm or clarify, e.g. “Sorry, how many people are we cooking for?”
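A minimal sketch of this recovery behaviour, assuming a hypothetical handler for the "how many people" question (the function name and word list are my own, not part of any Alexa API):

```python
# Hypothetical error-recovery sketch: confirm what was heard, and re-prompt
# when the answer cannot be parsed as a number of servings.

WORDS_TO_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4}

def handle_servings_answer(heard: str) -> str:
    """Parse a spoken serving count; re-prompt if it is not understood."""
    token = heard.lower().strip()
    count = int(token) if token.isdigit() else WORDS_TO_NUMBERS.get(token)
    if count is None:
        # Could not parse the answer: repeat the question for clarification.
        return "Sorry, how many people are we cooking for?"
    return f"Great, cooking for {count}. Let's get started!"
```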

5. Improvement to the conversation flow

If the system says, e.g. “Would you like to continue cooking or hear a replacement?”, Chefy should support a follow-up response from the user such as “What can I use?” Alternatively, provide better guidance to the users on how to hold a conversation with Chefy.

6. Improvement to problem solving

Make Chefy volunteer solutions to a problem, e.g. say that the user can cook without that ingredient, or suggest a replacement, mentioning that in either case the meal could taste different.

7. Improvement to voice interactions

Improve Chefy’s voice quality, speed and intonation. Make it easy to follow and pleasing to the ear.

Follow-up usability testing

After I iterate on my current experience, these are the changes I’d introduce to my next Chefy usability test:

  • Provide more context on each task I ask my participants to complete. That way, they’d know how they should be thinking about their interaction with the system.

  • Provide steps for recipes they are not familiar with. That way, I avoid them skipping this step, as they would not know what ingredients are needed for the meal in question (as opposed to “scrambled eggs” in Task 1).

  • Do more piloted testing before the actual usability testing. This is to ensure the tasks are clear and the users know exactly what is expected of them.

  • Create a better Wizard of Oz experience. For instance, via Voiceflow.

An idea of the Alexa Skill in the Skills marketplace

Chefy Skill in the Amazon Skills library

Final thoughts

Designing conversational UI is not easy. As in a real conversation, there is a very narrow margin of error: if we ask another person a simple question, we expect to be understood and given an answer. If we ask them the same question again, we expect the same answer, but spoken using different words and sentence structure. Most importantly, we expect to be understood. Since bots and voice assistants are not people – but speak a human-like language – we must accommodate the limitations of these technologies.

Michael Beebe, a former CEO of Mayfield Robotics, said:

When something speaks back to you in fluent natural language, you expect at least a child’s level of intelligence… So setting that expectation right keeps it more understandable.
— Michael Beebe