Imagine the perfect app for encoding notable ideas. That's what I did.

Recording and cataloging thoughts is a popular market (to the tune of 100+ million users, according to Evernote), but the experience is still optimized for keyboard input on mobile, where voice dictation is most useful. The following is a design project in which I created a more streamlined dictation experience, from idea to prototype.



A quick look at the current state of note dictation in Evernote’s iOS app reveals an arduous number of taps and interactions.

I love using Evernote and find the dictation use-case incredibly valuable, but an 8-step process is ripe for improvement.

What are the minimum steps required to dictate voice notes, and how can we compound multiple actions into single steps?

Focusing on voice dictation allows us to consolidate this use case into two primary steps:

  1. Press and Dictate Title
  2. Press and Dictate Body

*Note: it’s possible to collapse this UX even further, but more on that later.


To accomplish this streamlined experience, we’ll frame the product with the following requirements:

  • Home screen is new note canvas
  • Title and Body fields auto-select (title > body)
  • Primary Interaction: Long-press and dictate voice-to-text
  • Secondary Interactions: List, Share, Keyboard and New*
  • Notes auto-saved upon completion (can be deleted in List view)

*Secondary Interactions:

  • List (view existing notes to review/edit/share)
  • Share (send current note to friends or self)
  • Keyboard (allow QWERTY editing)
  • New (create a new note)
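
To make the requirements above concrete, here is a minimal sketch of the flow logic as a tiny state machine. It is not the prototype's actual implementation (the prototype was built visually, not in code). The names `DictationFlow`, `long_press_release`, and `new_note` are hypothetical, and the transcript is assumed to arrive as plain text from whatever speech-to-text service is used.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Field(Enum):
    TITLE = auto()
    BODY = auto()


@dataclass
class Note:
    title: str = ""
    body: str = ""


@dataclass
class DictationFlow:
    """Hypothetical model of the two-step flow: dictate title, then body."""
    note: Note = field(default_factory=Note)   # home screen is a new note canvas
    selected: Field = Field.TITLE              # title is auto-selected first
    saved: list = field(default_factory=list)  # stands in for the List view

    def long_press_release(self, transcript: str) -> None:
        """Handle the end of a long-press; transcript is the recognized text."""
        if self.selected is Field.TITLE:
            self.note.title = transcript
            self.selected = Field.BODY         # auto-select advances title > body
        else:
            self.note.body = transcript
            # Auto-save on completion, but only once per note (assumption).
            if not any(n is self.note for n in self.saved):
                self.saved.append(self.note)

    def new_note(self) -> None:
        """Secondary interaction 'New': start over on a blank canvas."""
        self.note = Note()
        self.selected = Field.TITLE


flow = DictationFlow()
flow.long_press_release("Groceries")           # step 1: press and dictate title
flow.long_press_release("Milk, eggs, coffee")  # step 2: press and dictate body
```

Two long-presses produce one saved note, which is the whole point of the two-step framing. Everything else (List, Share, Keyboard, New) sits outside this primary path.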

User Experience Map

The right diagram depends on product complexity and desired emphasis (touchpoints, interaction elements, emotional states).

For products with limited screens and interactions, I like this User Sees : User Does strategy I picked up from the Basecamp team.

Here it is applied to our product:

Low-Fidelity Design

Let’s start by visualizing our Primary UI (title field, body field, and record button) in a forgiving environment.


*We’ll want to indicate which field is selected, but I’ll worry about how we do that later.

What about Secondary Interactions? 🤔

We could tuck these elements behind a menu, but that’s sub-optimal.

We could use iOS’s standard Tab Navigation for subtask switching, but this sacrifices some viewable space for our note.

So instead, I’d like to explore embedding these elements into a full-screen canvas. Fortunately, we have 4 elements and 4 corners of our screen…coincidence?


Keep in mind we’re still in low fidelity, so symbols and icons are directional.

We find that bringing these elements into view only when necessary dedicates maximum real estate to the primary use case of recording a note.

I quickly wired together a prototype, concerned only with the interaction logic (rather than layout, colors, pixels, etc.), to ensure our strategy was sound.

Here we can test our assumptions about the intuitive nature of using long-press to dictate and revealing secondary interactions only where appropriate. Once connected in hi-fidelity, these will look and feel much better.

Hi-Fidelity Design

For our canvassy, layered interface, let’s draw from Material Design principles that embrace z-axis depth and fun, snappy interactions.

Preparing our artboards in Sketch, I’ll mock up every state from beginning to end, ensuring we’ll have a continuous flow for prototyping.

I like to drop my color palette right on my Sketch file for easy color picking.

Indicating the Selected Field

Remember that pesky little problem of showing users what they’re editing?

I decided to use a glimmer effect to indicate the selected field for Title and Body. I first saw this animation used in Facebook’s Paper app for iOS.


Prototyping in Flinto

Flinto allows us to connect elements and screens with intricate animations. I can even include animated gifs to achieve that glimmer effect!

Interactions with Material Elements

Interactions are where you really get the chance to breathe life and personality into your user experience. Here we have a simple list view interaction that is much more lively than a slide-out drawer.

Our button springs and spins to its alternate state, and by animating each note in the list individually, we give users a sense of separation and expectation that each item is its very own element.


In fact, we see that when selected, the material of the list item pops to fill the full canvas and secondary interaction elements are revealed.

This sense of layering must be designed carefully, with consistent shadowing and z-axis depth for each class of element.


Bringing everything together, we can demo our application across use cases to continue testing and iterating, as shown in the following video.


And that’s it! A simple note-taking application, streamlined for voice dictation.

*Oh, one last thing. At the beginning, I mentioned streamlining this experience even further than our long-press paradigm. Imagine a similar application (speak to dictate title > speak to dictate body) that indicates fields with audio cues but requires no visual interface, moving you through the creation flow based on pauses in your cadence. The best interfaces are invisible, after all.
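
As a rough illustration of that pause-driven idea, here is a small sketch that splits a stream of timed words into title and body at the first long silence. It is purely hypothetical: the function name `split_on_pause`, the 1.5-second threshold, and the `(text, start, end)` word format are all assumptions of mine, not something a particular speech API is known to provide in this shape.

```python
from typing import List, Tuple

# Hypothetical threshold: a silence longer than this moves from title to body.
PAUSE_THRESHOLD_S = 1.5

# Each word is (text, start_seconds, end_seconds).
TimedWord = Tuple[str, float, float]


def split_on_pause(words: List[TimedWord],
                   threshold: float = PAUSE_THRESHOLD_S) -> Tuple[str, str]:
    """Split timed words into (title, body) at the first pause over threshold."""
    title: List[str] = []
    body: List[str] = []
    in_body = False
    prev_end = None
    for text, start, end in words:
        # Only the first long pause switches fields; later pauses stay in the body.
        if not in_body and prev_end is not None and start - prev_end > threshold:
            in_body = True
        (body if in_body else title).append(text)
        prev_end = end
    return " ".join(title), " ".join(body)


spoken = [
    ("Grocery", 0.0, 0.4), ("list", 0.5, 0.8),                  # title
    ("milk", 2.6, 2.9), ("and", 3.0, 3.1), ("eggs", 3.2, 3.6),  # body, after a 1.8 s pause
]
title, body = split_on_pause(spoken)
```

A real version would need to tune the threshold per speaker and play an audio cue when the field changes, but the core logic is no more than this.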