Apple’s plan to power Siri with ChatGPT was a predictable failure

March 10, 2025

Last year, on June 10, 2024, Apple and OpenAI announced with great fanfare at Apple’s WWDC keynote that they were adding ChatGPT intelligence to Siri.

Days earlier, at Manifest in Berkeley, FutureSearch publicly predicted Siri would not switch to using a large OpenAI model anytime soon.

Fast forward to March 2025. The same Bloomberg journalist who leaked the Apple/OpenAI partnership ahead of WWDC reported that, indeed, a “fully conversational Siri” won’t be available until at least 2027, three years after the initial announcement.

How did we predict this 9 months ago, and what is preventing GPT-4-level intelligence on iPhones?

The View from June 2024

We used FutureSearch’s forecasting tools and arrived at a 60% likelihood that an Apple/OpenAI Siri partnership would be announced at WWDC. This was based on a 42% forecast from an “inside view” (how would this work?) and a 77% chance from an “outside view” (how often are such rumors true?).
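The headline 60% can be reproduced with a simple aggregation of the two views. The exact weighting scheme isn't stated in the original analysis, so the equal-weight linear opinion pool below is an assumption that happens to land near the published number:

```python
# Sketch of combining an inside-view and an outside-view forecast.
# The equal-weight average is an assumption; the original weighting
# scheme is not stated.

def combine_forecasts(forecasts, weights=None):
    """Weighted average of probability forecasts (a linear opinion pool)."""
    if weights is None:
        weights = [1.0] * len(forecasts)
    total = sum(weights)
    return sum(p * w for p, w in zip(forecasts, weights)) / total

inside_view = 0.42   # case-specific reasoning: how would this work?
outside_view = 0.77  # base rates: how often are such rumors true?

combined = combine_forecasts([inside_view, outside_view])
print(f"{combined:.1%}")  # prints "59.5%"
```

Unequal weights (e.g., trusting the outside view more when inside information is thin) would shift the result; equal weights already round to the 60% headline figure.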

Turns out we were directionally correct on both counts: the rumors were true, but the plan didn’t work!

Outside View Forecast: were the rumors likely to be true?

The Outside View looks at historical trends and base rates from similar events in history.

We used our in-house FutureSearch (FS) Forecaster to ideate and research base rates pertaining to how often such events tend to happen:

  1. How often does Apple depend on external partners for their tech? (FS Estimate: 47%)

  2. How often are rumors regarding tech company partnerships true? (FS Estimate: 51%)

  3. How often does Apple announce new partnerships at WWDC? (FS Estimate: 18%)

  4. How often do credible news outlets accurately break rumors regarding Apple? (FS Estimate: 79%)

  5. How often are rumors true when initially reported by Bloomberg? (FS Estimate: 80%)

We ultimately based our outside view forecast on two very narrow questions:

  1. How often are rumors from Bloomberg Chief Correspondent Mark Gurman about Apple true? (FS Estimate: 77%)

  2. How long does it typically take between rumors of an Apple release and the official announcement? (FS Estimate: 6.3 months)

These last two are below, as taken from our presentation in June 2024:

We estimated 6.3 months for the time between a rumored product and an official Apple announcement — so a WWDC announcement just weeks after the rumors broke would be unusually fast.

We estimated that rumors about Apple from Bloomberg Chief Correspondent Mark Gurman are true 77% of the time.

The Inside View: how would ChatGPT in Siri actually work?

The Inside View comes from case-specific details, models, and chains of reasoning.

Our research questions here were primarily about quality and scale.

We boiled our research down to two questions:

  1. How good will Apple’s on-device LLM be? (Do they even need ChatGPT in the cloud?)

  2. What % of Siri calls could route to ChatGPT without overloading OpenAI’s backend?

Inside view #1: How good will Apple’s on-device LLM be? (Do they even need ChatGPT in the cloud?)

We asked this specific question because Apple has been historically behind in AI-powered assistants, and its integration of LLMs with "Apple Intelligence" could determine whether Siri finally catches up to OpenAI and Google. With Apple reportedly working on an on-device LLM, it was plausible it would be good enough to handle most AI tasks locally.

The FutureSearch numbers answer this by estimating the model size and performance constraints of an on-device Apple LLM. What FS finds (see below) is that the real limitation is memory, not compute: Apple's chips could handle a 40B-parameter model within an iPhone 15’s memory constraints.

This suggests Apple Intelligence doesn’t necessarily need ChatGPT for everyday use. Rather, Apple’s AI strategy may be more self-sufficient than expected, raising questions about how much ChatGPT will actually be used in practice.

Perplexity Pro, in June 2024, said an iPhone could only fit a 500M param model. FutureSearch said 40B!
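The core of this estimate is simple arithmetic: weight memory ≈ parameter count × bits per parameter ÷ 8. A hedged sketch of that back-of-envelope calculation follows; the parameter counts and bit-widths are illustrative assumptions, not FutureSearch's actual inputs. Note that a naive all-in-DRAM load is not the only option: Apple's published "LLM in a flash" research streams weights from flash storage, which is how a model larger than device RAM can still be feasible on-device.

```python
# Back-of-envelope weight-memory arithmetic for an on-device LLM.
# Parameter counts and bit-widths below are illustrative assumptions.

def model_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight memory in GB (ignores KV cache and activations)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for params_b in (0.5, 7, 40):
    for bits in (16, 4):
        gb = model_memory_gb(params_b, bits)
        print(f"{params_b}B params @ {bits}-bit: ~{gb:.1f} GB of weights")

# An iPhone 15 Pro has 8 GB of RAM, so an all-in-DRAM load caps out well
# below 40B params even at 4-bit; streaming weights from flash (as in
# Apple's "LLM in a flash" work) is what relaxes that bound.
```

This is why the memory-versus-compute framing matters: the chips have compute to spare, and the binding constraint is how many weights you can keep accessible at once.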

Inside view #2: What percent of Siri calls could OpenAI’s backend handle?

Let’s say the on-device models won’t be good enough, and Siri does need to call ChatGPT remotely. Could OpenAI’s backends handle the traffic?

We estimated total Siri queries per day (~830 million). We then calculated ChatGPT’s daily queries (~210 million). (We derived this number in part by referencing OpenAI revenue numbers we had previously worked out and published.)

This comparison suggests that Siri handled about 4x the query volume that ChatGPT did in mid-2024. If OpenAI were to power all of Siri's requests, it would require a massive increase in infrastructure, likely beyond its capacity at the time.

Perplexity Pro, in June 2024, estimated Siri traffic at 285M daily queries, and ChatGPT at 4-7B daily queries.

FutureSearch found Siri traffic at ~830M daily queries, and ChatGPT at ~210M daily queries. Quite the difference!
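The scale argument reduces to a ratio, which we can sketch in a few lines. The routing fractions below are illustrative assumptions, not figures from the original analysis:

```python
# Rough scale comparison using the query-volume estimates above.
# The routing fractions are illustrative assumptions.

siri_qpd = 830e6     # estimated daily Siri queries (FutureSearch)
chatgpt_qpd = 210e6  # estimated daily ChatGPT queries (FutureSearch)

ratio = siri_qpd / chatgpt_qpd
print(f"Siri handles ~{ratio:.1f}x ChatGPT's daily query volume")

# Even routing a modest fraction of Siri traffic to ChatGPT is a large
# relative increase in OpenAI's load:
for share in (0.05, 0.25, 1.0):
    added_load = share * siri_qpd / chatgpt_qpd
    print(f"Routing {share:.0%} of Siri queries adds ~{added_load:.0%} "
          f"to ChatGPT's load")
```

Even a 5% routing share would have added roughly a fifth again to ChatGPT's daily volume, which is why full offloading looked implausible on scale grounds alone.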

So a pure "Siri just becomes ChatGPT" scenario seemed unlikely due to scale alone.

One final consideration was Apple’s privacy stance. Full reliance on OpenAI’s cloud via a “secure enclave” would be technically complicated and slow to build out at this scale. We thought Apple would much prefer to lean on their existing investments in on-device AI if they at all could.

We instead considered a hybrid approach where OpenAI enhances Siri only for complex queries, while simpler ones stay on-device. Three integration options seemed most plausible:

  1. Apple puts a “Siri by OpenAI” label on a largely unchanged system

  2. A GPT-4-powered Siri requires an opt-in or subscription, while by default all Siri requests stay on-device

  3. OpenAI gives Apple access to the ChatGPT weights, and Apple runs ChatGPT themselves in their own datacenters

Reviewing the final FutureSearch forecast, 9 months later

Last June, 42% felt low, especially for a mere partnership announcement (not a product release) at WWDC. But nine months later, our skeptical assessment of any deep OpenAI/Siri integration holds up well!

Bloomberg’s Chief Correspondent is still following all things Apple, but with a new tone — warning of an Apple AI Crisis, and referring to Apple’s ChatGPT integration as “an afterthought [that] lacks conversational abilities.”

While Gurman has reported on issues ranging from models hitting limits to hardware supply-chain issues to ineffective leadership, a continuing theme is that Apple is committed to using on-device ML as it continues to prioritize its fundamental approach to privacy. Apple appears to be following a different AI strategy than most in the race—one that values privacy and efficiency over raw intelligence at any cost.

This was already apparent last year: not just because of Apple’s espoused values, but because of the fundamental constraints of small and large models.

This method, combining the directional “outside view” from historical precedent, and the rigorous “inside view” of LLM capabilities and throughput, can lead to some very accurate forecasts.

And if AI systems like FutureSearch do the heavy lifting on research, the analysis becomes tractable without relying on teams of elite forecasters spending weeks on the problem!