The tiny camera eyes that now stare back at us from any screen can be trained with additional skills. First the eyes were trained to detect a generic face, used in digital cameras to assist focusing. Then they were taught to detect particular faces—say, yours—as identity passwords. Your laptop looks into your face, and deeper into your irises, to be sure it is you before it opens its home page. Recently researchers at MIT have taught the eyes in our machines to detect human emotions. As we watch the screen, the screen is watching us, where we look, and how we react. Rosalind Picard and Rana el Kaliouby at the MIT Media Lab have developed software so attuned to subtle human emotions that they claim it can detect if someone is depressed. It can discern about two dozen different emotions. I had a chance to try a beta version of this “affective technology,” as Picard calls it, on her own laptop. The tiny eye in the lid of the laptop peering at me could correctly determine if I was perplexed or engaged with a difficult text. It could tell if I was distracted while viewing a long video. Since this perception happens in real time, the smart software can adapt what I’m viewing. Say I am reading a book and my frown shows I’ve stumbled on a certain word; the text could expand a definition. Or if it realizes I am rereading the same passage, it could supply an annotation for that passage. Similarly, if it knows I am bored by a scene in a video, it could jump ahead or speed up the action.
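To make the adaptive loop concrete, here is a minimal Python sketch of how a detected emotion might be mapped to a change in what the screen shows. It is not the MIT software; the emotion labels, contexts, and actions are illustrative assumptions.

```python
# A toy mapping from a detected emotional state to an on-screen adaptation.
# The labels and actions are assumptions, not the behavior of any real product.

def adapt_to_emotion(emotion: str, context: str) -> str:
    """Return the adaptation a reader or viewer might be offered in this state."""
    if emotion == "perplexed" and context == "reading":
        return "expand a definition of the word being read"
    if emotion == "perplexed" and context == "rereading":
        return "supply an annotation for the passage"
    if emotion == "bored" and context == "video":
        return "jump ahead or speed up the action"
    return "keep presenting the content unchanged"

# Example: the camera reports boredom while a long video plays.
print(adapt_to_emotion("bored", "video"))  # -> jump ahead or speed up the action
```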
We are equipping our devices with senses—eyes, ears, motion—so that we can interact with them. They will not only know we are there; they will know who is there and whether that person is in a good mood. Of course, marketers would love to get hold of our quantified emotions, but this knowledge will serve us directly as well, enabling our devices to respond to us “with sensitivity,” as we hope a good friend might.
In the 1990s I had a conversation with the composer Brian Eno about the rapid changes in music technology, particularly its sprint from analog to digital. Eno made his reputation by inventing what we might now call electronic music, so it was a surprise to hear him dismiss a lot of digital instruments. His primary disappointment was with the instruments’ atrophied interfaces—little knobs, sliders, or tiny buttons mounted on square black boxes. He had to interact with them by moving only his fingers. By comparison, the sensual strings, table-size keyboards, or meaty drumheads of traditional analog instruments offered more nuanced bodily interactions with the music. Eno told me, “The trouble with computers is that there is not enough Africa in them.” By that he meant that interacting with computers using only buttons was like dancing with only your fingertips, instead of your full body, as you would in Africa.
Embedded microphones, cameras, and accelerometers inject some Africa into devices. They provide embodiment so the machines can hear us, see us, feel us. Swoosh your hand to scroll. Wave your arms with a Wii. Shake or tilt a tablet. Let us enlist our feet, arms, torso, and head, as well as our fingertips. Is there a way to use our whole bodies to overthrow the tyranny of the keyboard?
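To show how one of these sensors becomes an input, here is a small Python sketch of a shake detector run over a window of accelerometer readings. The 2.0 g threshold and the minimum number of spikes are made-up values, not any device’s real specification.

```python
# Toy shake detector: count acceleration spikes in a window of (x, y, z)
# readings measured in g. Threshold and peak count are illustrative guesses.

from math import sqrt

def is_shake(samples, threshold_g=2.0, min_peaks=3):
    """Return True if enough strong acceleration spikes appear in the window."""
    peaks = 0
    for x, y, z in samples:
        magnitude = sqrt(x * x + y * y + z * z)
        if magnitude > threshold_g:
            peaks += 1
    return peaks >= min_peaks

# A burst of strong readings registers as a shake; gentle tilting would not.
burst = [(0.1, 0.2, 1.0), (2.5, 0.3, 1.1), (0.2, 2.8, 0.9), (2.2, 2.4, 1.3)]
print(is_shake(burst))  # True
```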
One answer premiered in the 2002 movie Minority Report. The director, Steven Spielberg, was eager to convey a plausible scenario for the year 2050, and so he convened a group of technologists and futurists to brainstorm the features of everyday life in 50 years. I was part of that invited group, and our job was to describe a future bedroom, or what music would sound like, and especially how you would work on a computer in 2050. There was general consensus that we’d use our whole bodies and all our senses to communicate with our machines. We’d add Africa by standing instead of sitting. We think differently on our feet. Maybe we’d add some Italy by talking to machines with our hands. One of our group, John Underkoffler, from the MIT Media Lab, was way ahead in this scenario and was developing a working prototype using hand motions to control data visualizations. Underkoffler’s system was woven into the film. The Tom Cruise character stands, raises his hands outfitted with a VR-like glove, and shuffles blocks of police surveillance data as if conducting music. He mutters voice instructions as he dances with the data. Six years later, the Iron Man movies picked up this theme. Tony Stark, the protagonist, also uses his arms to wield virtual 3-D displays of data projected by computers, catching them like a beach ball and rotating bundles of information as if they were objects.
It’s very cinematic, but real interfaces in the future are far more likely to use hands closer to the body. Holding your arms out in front of you for more than a minute is an aerobic exercise. For extended use, interaction will more closely resemble sign language. A future office worker is not going to be pecking at a keyboard—not even a fancy glowing holographic keyboard—but will be talking to a device with a newly evolved set of hand gestures, similar to the ones we already use: pinching our fingers together to shrink an image, spreading them apart to enlarge it, or holding up two L-shaped pointing hands to frame and select something. Phones are very close to perfecting speech recognition today (including the ability to translate in real time), so voice will be a huge part of interacting with devices. If you’d like a vivid picture of someone interacting with a portable device in the year 2050, imagine them using their eyes to visually “select” from a set of rapidly flickering options on the screen, confirming with lazy audible grunts, and speedily fluttering their hands in their laps or at their waist. A person mumbling to herself while her hands dance in front of her will be the signal in the future that she is working on her computer.
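The pinch gesture mentioned above comes down to simple arithmetic: the zoom factor is the ratio of how far apart the fingertips are now to how far apart they were when the gesture began. A minimal Python sketch, assuming touch points arrive as (x, y) screen coordinates:

```python
# Pinch-to-zoom arithmetic: scale = current fingertip distance / starting distance.

from math import dist  # Python 3.8+

def pinch_scale(start_touches, current_touches):
    """Return >1.0 when the fingers spread apart (enlarge), <1.0 when they pinch in."""
    start_span = dist(*start_touches)
    current_span = dist(*current_touches)
    return current_span / start_span

# Fingers move from 100 pixels apart to 150 pixels apart: enlarge by 1.5x.
print(pinch_scale([(100, 300), (200, 300)], [(75, 300), (225, 300)]))  # 1.5
```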
Not only computers. All devices need to interact. If a thing does not interact, it will be considered broken. Over the past few years I’ve been collecting stories of what it is like to grow up in the digital age. As an example, one of my friends had a young daughter under five years old. Like many other families these days, they didn’t have a TV, just computing screens. On a visit to another family who happened to have a TV, his daughter gravitated to the large screen. She went up to the TV, hunted around below it, and then looked behind it. “Where’s the mouse?” she asked. There had to be a way to interact with it. Another acquaintance’s son had access to a computer starting at the age of two. Once, when she and her son were shopping in a grocery store, she paused to decipher the label on a product. “Just click on it,” her son suggested. Of course cereal boxes should be interactive! Another young friend worked at a theme park. Once a little girl took my friend’s picture, and afterward told her, “But it’s not a real camera—it doesn’t have the picture on the back.” Another friend had a barely speaking toddler take over his iPad. She could paint and easily handle complicated tasks on apps almost before she could walk. One day her dad printed out a high-resolution image on photo paper and left it on the coffee table. He noticed his toddler came up and tried to unpinch the photo to make it larger. She tried unpinching it a few times, without success, and looked at him, perplexed. “Daddy, broken.” Yes, if something is not interactive, it is broken.
The dumbest objects we can imagine today can be vastly improved by outfitting them with sensors and making them interactive. We had an old standard thermostat running the furnace in our home. During a remodel we upgraded to a Nest smart thermostat, designed by a team of ex-Apple execs and recently bought by Google. The Nest is aware of our presence. It senses whether we are home, awake or asleep, or on vacation. Its brain, connected to the cloud, anticipates our routines; over time it builds up a pattern of our lives so it can warm the house (or cool it down) a few minutes before we arrive home from work and turn the heat down after we leave, adapting to a different schedule on weekends and vacations. If it senses we are unexpectedly home, it adjusts itself. All this watching and responding optimizes our fuel bill.
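A toy Python sketch suggests how this kind of schedule learning could work. It is not Nest’s actual algorithm, only an illustration of averaging past arrival times and starting the heat shortly before the predicted one; every name and number here is an assumption.

```python
# Simplified schedule-learning thermostat: average past arrival hours per
# weekday and pre-heat a few minutes before the predicted arrival.
# Presence of someone at home overrides the learned schedule.

from collections import defaultdict
from statistics import mean

class LearningThermostat:
    def __init__(self, preheat_minutes=15):
        self.arrivals = defaultdict(list)   # weekday -> list of arrival hours
        self.preheat_hours = preheat_minutes / 60

    def record_arrival(self, weekday, hour):
        """Log when the house sensed someone coming home."""
        self.arrivals[weekday].append(hour)

    def should_heat(self, weekday, hour, someone_home):
        if someone_home:                    # unexpected presence: heat anyway
            return True
        history = self.arrivals.get(weekday)
        if not history:                     # no pattern learned yet: stay off
            return False
        predicted = mean(history)
        return predicted - self.preheat_hours <= hour < predicted

thermostat = LearningThermostat()
for h in (18.0, 18.2, 17.9):                # three weekday arrivals near 6 p.m.
    thermostat.record_arrival("Tuesday", h)
print(thermostat.should_heat("Tuesday", 17.85, someone_home=False))  # True: pre-heat
```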