Earlier today, OpenAI announced its newest product: GPT-4o, a faster, cheaper, more powerful version of its most advanced large language model, and one that the company has deliberately positioned as the next step in “natural human-computer interaction.” Running on an iPhone in what was purportedly a live demo, the program appeared able to tell a bedtime story with dramatic intonation, understand what it was “seeing” through the device’s camera, and interpret a conversation between Italian and English speakers. The model—which was powering an updated version of the ChatGPT app—even exhibited something like emotion: Shown the sentence I ♥️ ChatGPT handwritten on a page, it responded, “That’s so sweet of you!”
Although such features are not exactly new to generative AI, seeing them bundled into a single app on an iPhone was striking. Watching the presentation, I felt that I was witnessing the murder of Siri, along with that entire generation of smartphone voice assistants, at the hands of a company most people had not heard of just two years ago.
Apple markets its maligned iPhone voice assistant as a way to “do it all even when your hands are full.” But Siri functions, at its best, like a directory for the rest of your phone: It doesn’t respond to questions so much as offer to search the web for answers; it doesn’t translate so much as offer to open the Translate app. And much of the time, Siri can’t even pick up what you’re saying properly, let alone watch someone solve a math problem through the phone camera and provide real-time assistance, as ChatGPT did earlier today.
Just as chatbots have promised to condense the internet into a single program, generative AI now promises to condense all of a smartphone’s functions into a single app, and to add a whole host of new ones: Text friends, draft emails, learn the name of that beautiful flower, call an Uber and talk to the driver in their native language, all without touching a screen. Whether that future comes to pass is far from certain. Demos happen in controlled environments and are not immediately verifiable. OpenAI’s was certainly not without its stumbles, including choppy audio and small miscues. We don’t yet know to what extent familiar generative-AI problems, such as the confident presentation of false information and difficulty understanding accented speech, may emerge once the app is rolled out to the public over the coming weeks. But at the very least, by comparison, calling Siri or Google Assistant an “assistant” is insulting.
The major smartphone makers seem to recognize this. Apple, notoriously late to the AI rush, is reportedly deep in talks with OpenAI to incorporate ChatGPT features into an upcoming iPhone software update. The company has also reportedly held talks with Google to consider licensing Gemini, the search giant’s flagship AI product, to the iPhone. Samsung has already brought Gemini to its newest devices, and Google tailored its latest smartphone, the Pixel 8 Pro, specifically to run Gemini. Chinese smartphone makers, meanwhile, are racing their American counterparts to put generative AI on their devices.
Today’s demo was a likely death blow not only to Siri but also to a wave of AI start-ups promising a less phone-centric vision of the future. A company named Humane produces an AI pin that is worn on a user’s clothing and responds to spoken questions; it has been pummeled by reviewers for offering an inconsistent and glitchy experience. Rabbit’s R1 is a small handheld box that my colleague Caroline Mimbs Nyce likened to a broken toy.
These gadgets, and others that may be on the horizon, face inevitable hurdles: compressing a decent camera, a good microphone, and a powerful microprocessor into a tiny box, making sure that box is light and stylish, and persuading people to carry yet another device on their body. Apple and Android devices, by comparison, are efficient and beautiful pieces of hardware already ubiquitous in contemporary life. I can’t think of anybody who, forced to choose between their iPhone and a new AI pin, wouldn’t jettison the pin—especially when smartphones are already perfectly positioned to run generative-AI programs.
Each year, Apple, Samsung, Google, and others roll out a handful of new phones offering better cameras and more powerful computer chips in thinner bodies. This cycle isn’t ending anytime soon—even if it’s gotten boring—but now the most exciting upgrades clearly aren’t happening in physical space. What really matters is software.
The iPhone was revolutionary not just because it combined a screen, a microphone, and a camera. Allowing people to take photos, listen to music, browse the web, text family members, play games—and now edit videos, write essays, make digital art, translate signs in foreign languages, and more—was the result of a software package that put its screen, microphone, and camera to the best use. And the American tech industry is in the midst of a hundred-billion-dollar bet that generative AI will soon be the only software worth having.