Two challenging software design briefs to consider:

  1. We have a new system for controlling the features, functions, and apps on your smartphone. However: it is more or less invisible. It is also novel in what it can (and can’t) do, evolving to include ever-more capabilities each year, and somewhat unbounded in what it could eventually do. This system will be distributed to hundreds of millions of novice smartphone users. Please design the system such that users can understand what they can and can’t do as well as what’s happening as they interact with the system. Note: the system has very imprecise inputs and many inputs will fail.
  2. We have a new system that can perform many forms of calculation, charting, graphing, computation, comparison, measurement, and more. As before: it is novel in what it can (and can’t) do, evolving to include ever-more capabilities, and somewhat unbounded in what it could eventually do. This system will have much lesser distribution and will mostly be found by sophisticated users; however: it must have a conventional and familiar graphical UI. Please design the system such that users can understand what they can and can’t do.

In one case: we are burdened by not having easy visual referents, cues, trails, and affordances. In the other, we are constrained by the need for a familiar UI despite the unfamiliar range of possible actions. Both represent the sorts of challenges designers will face as interfaces become less about screens and more about systems. The first is of course Siri; the second is WolframAlpha.

Siri and the invisible interface
Steve Jobs famously told Walter Isaacson that he’d figured out the ideal interface for the mythical Apple television product:

“It will have the simplest user interface you could imagine. I finally cracked it.”

An outstanding example of how Jobs thought, this statement involves two facts: first, that nearly all of us know how to converse, and that therefore highly successful voice-interaction systems are “simple” for us; and second, that Jobs conceived of difficulty in a strange way. Thinking of voice interaction as the UI isn’t difficult, after all; executing on voice-interaction systems at scale is. And Apple’s stumbles with Siri show just how difficult execution in this area is. Conceiving it wasn’t cracking it.

Nevertheless, Siri is instructive if we want to consider really radical UI challenges. The inheritance of software design on the web comes largely from print design, as David Cole has noted, and in recent years much has been made of the necessity of finally breaking from that tradition (Cole’s Designers Will Code describes one path away from quasi-print design towards real systems and interaction design). Both Web 2.0 and mobile necessitated this break by introducing technological forms metaphorically distant from the notion of a “page,” and the break continues to accelerate.

But while AJAX interactions and small, content-first / UI-minimizing screens problematized print design principles, voice interaction systems make them nearly useless. How does one design learnable patterns, generalizable flows, reusable and consistent elements in sound?

For illustration, let’s focus on a single issue: how does one educate users about what they can do, let them discover features, and so on? In software with a visual UI, we have a superabundance of means at our disposal: tooltips, skeuomorphic metaphors, the now-longstanding GUI traditions, etc.

But with Siri, the best Apple could come up with was this:

Fig. 1: Apple hopes you won’t need help, hiding the small ? button in the lower left. Should you need help, prepare to memorize!

In what seems like a partial white flag, Apple essentially says:

  1. we can do no better than a user manual; make users read what they can do first;
  2. because no one has user manuals anymore, forgo a systematic or pedagogically sensible approach to learning and just list all the examples (!!!)

It should go without saying that this is a suboptimal solution. Siri tells you nothing about what she can do until you ask; then she tells you, via a non-standard visual UI (despite herself being invisible), every single thing she can do, grouped by apps and by arbitrary categories in a multi-screen list. As her capabilities grow, the list will too; if she were opened to third parties through an API, Siri’s “user manual” would quickly become overwhelming.
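
To make that scaling worry concrete, here is a minimal TypeScript sketch of the kind of third-party intent registry imagined above. Everything in it is hypothetical (Apple has published no such API in this form); it only illustrates how a flat list of example phrases, the sole discoverability mechanism, grows without bound as apps register capabilities:

```typescript
// Purely hypothetical sketch (not a real Apple API): a third-party
// "intent" registry whose only discoverability mechanism is a flat
// list of example phrases, mirroring Siri's help screens.

interface Intent {
  app: string;              // registering app, e.g. "OpenTable"
  examplePhrases: string[]; // the "user manual" entries for this intent
}

class IntentRegistry {
  private intents: Intent[] = [];

  register(intent: Intent): void {
    this.intents.push(intent);
  }

  // The "help screen": every example from every app, flattened.
  // It grows with apps x intents x examples, and nothing prunes it.
  helpManual(): string[] {
    return this.intents.flatMap((intent) =>
      intent.examplePhrases.map((phrase) => `${intent.app}: "${phrase}"`)
    );
  }
}

// Even a handful of registrations balloons the manual:
const registry = new IntentRegistry();
registry.register({ app: "Reminders", examplePhrases: ["Remind me to call Mom at 5"] });
registry.register({ app: "OpenTable", examplePhrases: ["Book a table for two tonight"] });
console.log(registry.helpManual()); // one line per example, forever growing
```

Apple’s multi-screen help list is, in effect, the output of a helpManual() like this one: exhaustive, unstructured, and ever-growing.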

It seems to me that this solution must be, in Apple’s mind, a stop-gap; the design of Siri suggests that her eventual developmental goal is to have sufficient AI-esque flexibility to make instruction unnecessary: you can simply ask Siri to do something (or whether she can do something) and she’ll reply appropriately.

In the meantime, Siri remains Apple’s most quixotic feature since the first-generation Newton’s handwriting recognition. As a systems / technologies integrator, Apple is usually quite smart about anticipating when a given input device has reached maturity and can be marketed. But even Apple makes mistakes; Siri isn’t yet “the simplest user interface you could imagine” because at the moment, you can’t imagine her, and Apple can’t help you to. Her scope and capabilities are unknown to you, and designers have yet to solve the problem of conveying them artfully, quickly, and intelligibly; she is not the artificially intelligent assistant she’d need to be to require no documentation or affordances, and we simply haven’t yet figured out how to design appropriate affordances for her.

So we’ve designed an inferior user manual of sorts and left it at that; I imagine use of Siri’s lesser-known features is rare.

WolframAlpha: search me!
One solution to the problem of designing a novel system is to use a familiar UI pattern or element; a button is a button is a button, whether pressing it turns on a car or makes Mario jump. And using such an element often makes sense when the systems behind it are analogous or isomorphic to one another.
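
To make that point concrete, here is a toy sketch in TypeScript (all names invented for illustration): the same Button shape drives two unrelated systems, and the user’s mental model transfers intact.

```typescript
// A toy illustration of "a button is a button": one familiar element,
// two unrelated systems behind it. All names here are invented.

interface Button {
  label: string;
  onPress: () => void; // the system behind the affordance is invisible
}

// Same element, wildly different systems:
const ignition: Button = {
  label: "Start",
  onPress: () => console.log("engine: cranking starter motor"),
};

const jump: Button = {
  label: "A",
  onPress: () => console.log("mario: applying upward velocity"),
};

// The user's model is identical in both cases: press, and something
// happens. The pattern transfers because "press -> action" holds
// regardless of what the action is.
[ignition, jump].forEach((b) => b.onPress());
```

The trouble begins when the systems behind the familiar element are not, in fact, isomorphic at all.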

Thus when WolframAlpha launched, it made some sense to structure its interface around that most familiar and comfortable of UI elements, the search box. Users know search boxes and (think they) know what they do.

Unfortunately, the analogous UI of the search box inclines us to use WolframAlpha merely to search. But it can also compute, compare, chart, and perform other tasks that Google or Bing cannot. In deciding to use a familiar pattern (which makes the product look intelligible, solving one problem), Wolfram introduces another: users now think they can only search. A familiar pattern can mislead users, or make them think they understand something they don’t. It might help conversion through the flow, but when users don’t understand the dimensions of the product, that’s a problem.

And once again, we arrive at what seems like an old-fashioned place: the integrated user manual, which WolframAlpha relies upon to stave off simple searching:

Fig. 2: Lots to memorize! These screenshots were shared by Adam Chavez in his answer to “When would you think, ‘WolframAlpha app would be good for that’ or what was the last thing you used it for?”

WolframAlpha and Siri are in the comparable and unenviable position of attempting to introduce new interaction functionality to users while disguising the system (and its syntax, concepts, processes, and structure) that provides that functionality. They want to keep cognitive load low even as they need us to learn new forms, new modes, new metaphors.

Given its users and probable aspirations, a guide is probably not the worst solution for WolframAlpha; and I have no better solution for Apple’s dilemma with Siri (although I think a more supportive visual environment for Siri would be useful). Examples are useful in guides, just usually not as guides, especially for open-ended systems like these.

It’s hard not to feel that we’ve been left a bit in the lurch: we haven’t designed or standardized models of interaction in non-visual or services-oriented software systems. One might say that our design model, from print down to local software applications, is “showing through” in the way that bad software’s data model often does. The aversion to advertising the need for a systematic introduction to these products means that we rely on a rather weak solution to the problem of user comprehension. We’re advanced enough to know that manuals are crummy (and have reached a point in designing software with visual UIs where we rarely need them), but we are just starting to tackle the same problem in newer interface methods, and in the interim we have these strange hybrids: voice UIs with visual components and list-style manuals; search engines that aren’t, and that mask significant complexity to both their benefit and their detriment; and so on.

A final question: if print design informed the design of visual software UIs, what will inform the design of voice systems? Of data systems?

Mills Baker