Jeff Dean says Google’s AI Search still works like classic Search: narrow the web to relevant pages, rank them, then let a model generate the answer.
In an interview on Latent Space: The AI Engineer Podcast, Google’s chief scientist explained how Google’s AI systems work and how much they rely on traditional search infrastructure.
The architecture: filter first, reason last. Visibility still depends on clearing ranking thresholds. Content must enter the broad candidate pool, then survive deeper reranking before it can be used in an AI-generated response. Put simply, AI doesn’t replace ranking. It sits on top of it.
Dean said an LLM-powered system doesn’t read the entire web at once. It starts with Google’s full index, then uses lightweight methods to identify a large candidate pool — tens of thousands of documents. Dean said:
“You identify a subset of them that are relevant with very lightweight kinds of methods. You’re down to like 30,000 documents or something. And then you gradually refine that to apply more and more sophisticated algorithms and more and more sophisticated sort of signals of various kinds in order to get down to ultimately what you show, which is the final 10 results or 10 results plus other kinds of information.”
Stronger ranking systems narrow that set further. Only after multiple filtering rounds does the most capable model analyze a much smaller group of documents and generate an answer. Dean said:
“And I think an LLM-based system is not going to be that dissimilar, right? You’re going to attend to trillions of tokens, but you’re going to want to identify what are the 30,000-ish documents that are with the maybe 30 million interesting tokens. And then how do you go from that into what are the 117 documents I really should be paying attention to in order to carry out the tasks that the user has asked me to do?”
Dean called this the “illusion” of attending to trillions of tokens. In practice, it’s a staged pipeline: retrieve, rerank, synthesize. Dean said:
“Google search gives you … not the illusion, but you are searching the internet, but you’re finding a very small subset of things that are relevant.”
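To make that staged pipeline concrete, here is a minimal sketch in Python. The scoring functions, document counts, and generation step are illustrative stand-ins based on the numbers Dean mentions, not Google’s actual systems or parameters.

```python
# Minimal sketch of a staged retrieve -> rerank -> synthesize pipeline.
# Every function and threshold here is an illustrative stand-in, not
# Google's actual ranking stack or parameter values.

def cheap_score(query: str, doc: str) -> float:
    """Lightweight relevance proxy: fraction of query words present in the doc."""
    words = query.lower().split()
    return sum(w in doc.lower() for w in words) / len(words)

def expensive_score(query: str, doc: str) -> float:
    """Stand-in for a heavier reranking model applied to the smaller pool."""
    return cheap_score(query, doc)  # placeholder: a real system would use a learned model

def answer_with_llm(query: str, docs: list[str]) -> str:
    """Stand-in for the final generation step over a small context."""
    return f"Answer to {query!r} synthesized from {len(docs)} documents."

def staged_pipeline(query: str, index: list[str]) -> str:
    # Stage 1: lightweight filtering over the full index -> broad candidate pool.
    pool = sorted(index, key=lambda d: cheap_score(query, d), reverse=True)[:30_000]
    # Stage 2: progressively more expensive reranking -> a handful of documents.
    top = sorted(pool, key=lambda d: expensive_score(query, d), reverse=True)[:10]
    # Stage 3: only now does the most capable model read anything and generate the answer.
    return answer_with_llm(query, top)
```

The point of the sketch is the shape, not the scores: the expensive model only ever sees the survivors of the cheaper stages.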
Matching: from keywords to meaning. Nothing new here, but we heard another reminder that covering a topic clearly and comprehensively matters more than repeating exact-match phrases.
Dean explained how LLM-based representations changed how Google matches queries to content.
Older systems relied more on exact word overlap. With LLM representations, Google can move beyond the idea that particular words must appear on the page and instead evaluate whether a page — or even a paragraph — is topically relevant to a query. Dean said:
“Going to an LLM-based representation of text and words and so on enables you to get out of the explicit hard notion of particular words having to be on the page. But really getting at the notion of this topic of this page or this page paragraph is highly relevant to this query.”
That shift lets Search connect queries to answers even when wording differs. Relevance increasingly centers on intent and subject matter, not just keyword presence.
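A toy contrast shows why that matters. In the sketch below, the “embeddings” are made-up three-dimensional vectors standing in for what a real model would produce over far higher-dimensional representations; the point is only that an exact-term score misses a page that an embedding-style score catches.

```python
import math

# Toy contrast between exact-term matching and embedding-style matching.
# The 3-dimensional "embeddings" are invented for illustration.

def keyword_overlap(query: str, text: str) -> float:
    """Share of query words that literally appear in the text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical embeddings: "bistro" lands near "restaurant" even though
# the two strings share no words.
embeddings = {
    "best restaurant nearby": [0.9, 0.1, 0.2],
    "a cozy neighborhood bistro locals love": [0.85, 0.15, 0.25],
}

query = "best restaurant nearby"
page = "a cozy neighborhood bistro locals love"

print(keyword_overlap(query, page))                  # 0.0 -- no shared words
print(cosine(embeddings[query], embeddings[page]))   # close to 1.0 -- similar meaning
```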
Query expansion didn’t start with AI. Dean pointed to 2001, when Google moved its index into memory across enough machines to make query expansion cheap and fast. Dean said:
“One of the things that really happened in 2001 was we were sort of working to scale the system in multiple dimensions. So one is we wanted to make our index bigger, so we could retrieve from a larger index, which always helps your quality in general. Because if you don’t have the page in your index, you’re going to not do well.
“And then we also needed to scale our capacity because we were, our traffic was growing quite extensively. So we had a sharded system where you have more and more shards as the index grows, you have like 30 shards. Then if you want to double the index size, you make 60 shards so that you can bound the latency by which you respond for any particular user query. And then as traffic grows, you add more and more replicas of each of those.
“And so we eventually did the math that realized that in a data center where we had say 60 shards and 20 copies of each shard, we now had 1,200 machines with disks. And we did the math and we’re like, Hey, one copy of that index would actually fit in memory across 1,200 machines. So in 2001, we … put our entire index in memory and what that enabled from a quality perspective was amazing.”
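The arithmetic is simple enough to sketch. The shard and replica counts come from Dean’s example; the memory figures below are invented purely to show the “does one copy fit in RAM?” check, not historical numbers.

```python
# Back-of-the-envelope version of Dean's arithmetic: 60 shards with 20 replicas
# each means 1,200 disk-backed serving machines -- enough combined RAM that one
# copy of the index could live entirely in memory.

shards = 60
replicas_per_shard = 20
machines = shards * replicas_per_shard
print(machines)  # 1200

# Hypothetical sizes, only to illustrate the feasibility check.
index_size_gb = 2_000        # assumed total index size
ram_per_machine_gb = 2       # assumed RAM per early-2000s machine
total_ram_gb = machines * ram_per_machine_gb
print(total_ram_gb >= index_size_gb)  # True: one copy fits across the fleet
```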
Before that, adding terms was expensive because it required disk access. Once the index lived in memory, Google could expand a short query into dozens of related terms — adding synonyms and variations to better capture meaning. Dean said:
“Before, you had to be really careful about how many different terms you looked at for a query, because every one of them would involve a disk seek.
“Once you have the whole index in memory, it’s totally fine to have 50 terms you throw into the query from the user’s original three- or four-word query. Because now you can add synonyms like restaurant and restaurants and cafe and bistro and all these things.
“And you can suddenly start … getting at the meaning of the word as opposed to the exact semantic form the user typed in. And that was … 2001, very much pre-LLM, but really it was about softening the strict definition of what the user typed in order to get at the meaning.”
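A rough sketch of that kind of expansion, with a hand-written synonym table standing in for whatever Google actually used:

```python
# Sketch of query expansion: each user term fans out into synonyms and variants
# before retrieval. The synonym table is an invented stand-in for illustration.

SYNONYMS = {
    "restaurant": ["restaurants", "cafe", "bistro", "eatery"],
    "cheap": ["inexpensive", "affordable", "budget"],
}

def expand_query(query: str) -> list[str]:
    """Turn a short user query into a longer list of terms to retrieve on."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))
    return terms

print(expand_query("cheap restaurant nearby"))
# ['cheap', 'inexpensive', 'affordable', 'budget',
#  'restaurant', 'restaurants', 'cafe', 'bistro', 'eatery', 'nearby']
```

With a disk-based index, each of those extra terms would have cost another seek; with the index in memory, fanning a three-word query out to dozens of terms is cheap.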
That change pushed Search toward intent and semantic matching years before LLMs. AI Mode and Google’s other AI experiences continue that shift toward meaning-based retrieval, enabled by better systems and more compute.
Freshness as a core advantage. Dean said one of Search’s biggest transformations was update speed. Early systems refreshed pages as rarely as once a month. Over time, Google built infrastructure that can update pages in under a minute. Dean said:
“In the early days of Google, we were growing the index quite extensively. We were growing the update rate of the index. So the update rate actually is the parameter that changed the most.”
That improved results for news queries and shaped the main search experience: users expect current information, and the system is designed to deliver it. Dean said:
“If you’ve got last month’s news index, it’s not actually that useful.”
Google runs a system behind the scenes that decides how often to recrawl each page, balancing how likely it is to change against how valuable the latest version is. Even pages that change infrequently may be crawled often if they’re important enough. Dean said:
“There’s a whole … system behind the scenes that’s trying to decide update rates and importance of the pages. So, even if the update rate seems low, you might still want to recrawl important pages quite often because the likelihood they change might be low, but the value of having updated is high.”
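One way to model that trade-off is to rank pages by the expected value of a recrawl: the probability the page has changed times the value of having a fresh copy. The sketch below uses invented pages and numbers to illustrate the idea; it is not Google’s actual scheduler.

```python
# Sketch of a recrawl-priority heuristic in the spirit of what Dean describes:
# expected value of a recrawl = P(page changed) * value of having it fresh.
# The pages and numbers are invented for illustration.

pages = [
    # (url, probability it changed since last crawl, value of a fresh copy)
    ("news-homepage.example", 0.95, 0.9),
    ("rarely-edited-but-important.example", 0.05, 1.0),
    ("rarely-edited-and-minor.example", 0.05, 0.1),
]

def recrawl_priority(p_changed: float, freshness_value: float) -> float:
    return p_changed * freshness_value

# With the same low change probability, the high-value page ranks an order of
# magnitude above the low-value one, so it gets recrawled far more often.
ranked = sorted(pages, key=lambda p: recrawl_priority(p[1], p[2]), reverse=True)
for url, p, v in ranked:
    print(f"{url}: priority {recrawl_priority(p, v):.3f}")
```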
Why we care. AI answers don’t bypass ranking, crawl prioritization, or relevance signals. They depend on them. Eligibility, quality, and freshness still determine which pages are retrieved and narrowed. LLMs change how content is synthesized and presented — but the competition to enter the underlying candidate set remains a search problem.
The interview. Owning the AI Pareto Frontier — Jeff Dean