Apple Rebuilt Siri on Google Gemini — What It Actually Means
Apple is paying Google $1 billion per year to run Gemini inside Siri. The company that sold privacy as a product just outsourced the core of its assistant to iOS's biggest rival.
Apple spent a decade building an identity around privacy as a product, vertical control as a competitive advantage, and third-party dependencies as a weakness. At WWDC 2026, they walked on stage and announced that Siri — their voice assistant, the most visible Apple product for ordinary iPhone users — now runs on top of a Google model.
Not a neutral partner. Google. The company Apple fights in the browser market, the advertising market, the built-in search market on iOS, and in multiple antitrust proceedings across three continents.
And yet: the Siri AI unveiled at WWDC 2026 is powered by a custom version of Gemini, running on Nvidia Blackwell B200 GPUs inside Google Cloud.
How Siri got here
Siri's trajectory over the last four years is a textbook case of technical debt at corporate scale. While ChatGPT exploded in November 2022 and Google launched Gemini in 2023, Siri kept running a pipeline built in the pre-transformer era: manually classified intents, static knowledge graphs, near-zero reasoning capacity.
Apple announced "Apple Intelligence" at WWDC 2024 with the usual "this changes everything" framing. The actual update arrived fragmented and delayed, and the most ambitious features (Siri understanding screen context, multi-step reasoning) were repeatedly pushed to "next year." According to Bloomberg's reporting in 2025, Tim Cook lost confidence in John Giannandrea, the AI lead.
The problem wasn't just product. It was architectural. Training a frontier model from scratch — GPT-4 or Gemini 1.5 Pro scale — costs billions of dollars and years of research. Apple has the billions; they didn't have the years.
The solution: license the finished model and build a privacy layer on top.
The three-tier architecture
What Apple presented at WWDC isn't simply "Siri uses Gemini." The architecture is more layered than that headline suggests, and understanding how it works is necessary for evaluating whether the privacy claims hold.
Tier 1 — on-device: simple tasks — setting alarms, sending messages to contacts, running shortcuts — run on Apple's own models on the device's neural engine. Nothing leaves the device.
Tier 2 — Private Cloud Compute: medium-complexity requests go to Apple's servers. Apple claims external researchers can audit the code running in this layer at any time.
Tier 3 — Google Cloud on Blackwell B200: heavy reasoning — complex questions, document analysis, long context — routes to Google Cloud, on Nvidia's Blackwell B200 GPUs. This is where the custom Gemini model runs: 1.2 trillion total parameters, mixture-of-experts architecture that activates only a relevant subset of those parameters per query.
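The three tiers above amount to a routing decision made per request. Apple has not published how that decision is made, so the following is only a minimal sketch under stated assumptions: the intent list, the word-count threshold, and every name in it (Tier, Request, route) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_DEVICE = "on-device"
    PRIVATE_CLOUD = "private-cloud-compute"
    GOOGLE_CLOUD = "google-cloud-gemini"


@dataclass
class Request:
    text: str
    needs_long_context: bool = False  # e.g. document analysis


# Hypothetical intent list and threshold; the real routing rules are not public.
SIMPLE_INTENTS = ("set an alarm", "send a message", "run shortcut")
MEDIUM_MAX_WORDS = 40


def route(request: Request) -> Tier:
    """Pick the least-exposed tier that can plausibly handle the request."""
    lowered = request.text.lower()
    # Tier 1: simple, well-known intents stay on the device.
    if any(lowered.startswith(intent) for intent in SIMPLE_INTENTS):
        return Tier.ON_DEVICE
    # Tier 3: long-context work goes to the large model.
    if request.needs_long_context:
        return Tier.GOOGLE_CLOUD
    # Tier 2: short free-form requests go to Apple's own servers.
    if len(lowered.split()) <= MEDIUM_MAX_WORDS:
        return Tier.PRIVATE_CLOUD
    return Tier.GOOGLE_CLOUD


if __name__ == "__main__":
    print(route(Request("Set an alarm for 7am")))
    print(route(Request("What's a good gift for a coworker?")))
    print(route(Request("Summarize this contract", needs_long_context=True)))
```

The point of the sketch is the ordering: a request falls through to a more exposed tier only when a less exposed one cannot handle it, which is the property the privacy claims depend on.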
Before reaching Google, Apple says it anonymizes and tokenizes queries so neither Apple nor Google can link a request to a specific user. The contract with Google reportedly prohibits using the data to train future models. Blackwell B200 GPUs also support confidential computing at the hardware level, which protects data while it is in use by isolating it inside a hardware-enforced trusted execution environment.
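Apple has not described how the anonymization step works, so the following is only a rough sketch of the general idea: strip obvious identifiers from the query and attach a fresh random token per request instead of any stable user ID. The regular expressions, the function name anonymize, and the payload fields are all assumptions made for illustration, and real PII scrubbing needs far broader coverage than two patterns.

```python
import re
import secrets

# Hypothetical patterns; they catch only simple email and phone formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def anonymize(query: str) -> dict:
    """Build an outbound payload that carries no stable user identifier."""
    scrubbed = EMAIL.sub("[EMAIL]", query)
    scrubbed = PHONE.sub("[PHONE]", scrubbed)
    return {
        # A new random token for every request: two requests from the same
        # user cannot be linked by comparing tokens.
        "request_token": secrets.token_hex(16),
        "query": scrubbed,
    }


if __name__ == "__main__":
    payload = anonymize("Email mom@example.com or call +1 415 555 0100 tonight")
    print(payload["query"])
```

Note what this does not cover: the content of a query can still identify someone even after the identifiers are removed, which is one reason the contractual and hardware guarantees matter alongside it.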
Why Blackwell B200 specifically
Nvidia announced the Blackwell architecture in March 2024, with shipments ramping up around the turn of 2025, and the B200 is among the highest-performing GPUs commercially available for LLM inference workloads. Compared to Hopper (H100), B200s deliver significantly higher inference throughput with lower energy consumption per token.
For Apple, this matters for two reasons. First: economic viability. A 1.2-trillion-parameter mixture-of-experts model can serve billions of users sustainably only if the inference hardware is efficient enough. Second: the B200's native confidential computing, which keeps data encrypted on its way to the GPU and isolated from the host and the cloud operator while it is processed, is precisely the technical argument Apple uses to claim "not even Google sees your data."
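The economics rest on the sparse activation that mixture-of-experts provides: a router scores all experts for each token, but only the top few are actually evaluated. Nothing has been reported about this model's expert count or how many experts it activates per token, so the four toy experts and k=2 below are assumptions, and the sketch shows the generic top-k gating technique rather than Gemini's actual design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(router_scores, k):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    ranked = sorted(
        range(len(router_scores)),
        key=lambda i: router_scores[i],
        reverse=True,
    )
    chosen = ranked[:k]
    # Renormalize over the chosen experts only, so the weights sum to 1.
    weights = softmax([router_scores[i] for i in chosen])
    return dict(zip(chosen, weights))


def moe_forward(x, experts, router_scores, k=2):
    """Evaluate only the selected experts and mix their outputs."""
    gate = top_k_gate(router_scores, k)
    # Experts outside the gate are never called for this input.
    return sum(weight * experts[i](x) for i, weight in gate.items())


if __name__ == "__main__":
    # Toy experts: each one just scales its input.
    experts = [lambda x: 1 * x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
    scores = [0.1, 2.0, 0.3, 2.0]  # hypothetical router output for one token
    print(sorted(top_k_gate(scores, 2)))     # [1, 3]
    print(moe_forward(1.0, experts, scores))  # 3.0
```

With k experts evaluated out of n, the compute per token scales with k rather than n, which is why total parameter count alone says little about inference cost.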
The deal is reported at approximately $1 billion per year.
The problem no contract fully resolves
Craig Federighi took the stage and said "privacy in AI is non-negotiable." The line landed well on the slide. The problem is that contractual promises and hardware guarantees have different scopes than the privacy guarantee Apple was selling before.
Security researchers flagged the obvious point right after the keynote: Private Cloud Compute is "only as private as its weakest link." If Google retains any path to usage data — even anonymized, even aggregated — for model debugging or infrastructure monitoring, the privacy promise changes in nature. It isn't broken, but it's different from "your data never leaves the device."
This isn't an argument against Siri AI. It's an argument for honesty: Apple changed what "privacy" means for its assistant, and the keynote didn't make that change explicit.
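Before WWDC 2026, Siri processed requests on-device or in Apple's own Private Cloud Compute. Apart from the opt-in ChatGPT handoff, which asked for permission each time, data never touched third-party infrastructure. Now it does so by default for heavy requests, with robust contractual and technical guarantees, but it does.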
What Siri AI actually gains
Beyond the privacy debate, the functional changes are substantial and worth documenting.
Siri AI gains real-time screen awareness. If you receive a text with flight details, you can ask Siri to "add this to my calendar and text the arrival time to my mom" — it reads the screen, creates the event, sends the message. This worked in a rudimentary way with Apple Intelligence 2024; now it works reliably.
Siri also gets a standalone app for the first time. You can open it, type or talk, keep a conversation history — similar to using ChatGPT or Gemini as a product. Not the iPhone's voice assistant; a separate app with a chat interface.
Multi-step reasoning works now. "Book a table for Friday evening near where John lives, but not Japanese because he has an allergy" — that kind of instruction with implicit context and multiple constraints was exactly where the old Siri consistently failed. The underlying Gemini model handles it.
What this means for the industry
Apple admitting it can't build a competitive frontier model on its own is market information. The world's most valuable company, with over $100 billion in cash, access to hardware, usage data from billions of devices — and it concluded that licensing model capacity from Google is more efficient than training from scratch.
That shifts the "build vs buy" conversation for language models across the industry. If even Apple chooses to buy, any company still debating whether to train a proprietary model has a heavy precedent to weigh.
It also puts Google in an interesting position: it collects $1 billion per year from Apple to power the main competitor to Google Assistant. Strange market.
Microsoft has GPT-4 via OpenAI in Copilot. Amazon has its own models via Bedrock, with Anthropic as a strategic investment. Apple now has Gemini. The open question: which of the three strategies produces the best voice assistant in two years? We don't know yet.
The pivot Apple didn't explicitly name
What was implicit in the keynote is worth naming directly: Apple abandoned proprietary models as the differentiator behind Apple Intelligence. The "Apple Intelligence" from WWDC 2024 was, in Apple's narrative, a competitive advantage: models trained by Apple, for Apple hardware, with Apple privacy. The rebranding to "Siri AI" with Gemini underneath acknowledges that this strategy didn't work within the required timeframe.
Apple remains differentiated at the hardware layer (A-series chips, neural engine), the on-device layer (smaller models for simple tasks), and the product layer (deep integration with the iOS/macOS ecosystem). The language model became a licensed commodity.
That's pragmatism. It's not a defeat — the integration works, the product is better. But it's a strategic repositioning that Apple didn't name explicitly, and one that has implications for how you should think about the next iPhone cycles.