The five infrastructure gates behind crawl, render, and index

The DSCRI-ARGDW pipeline maps 10 gates between your content and an AI recommendation across two phases: infrastructure and competitive. Because confidence multiplies across the pipeline, the weakest gate is always your biggest opportunity. Here, we focus on the first five gates.
The infrastructure phase (discovery through indexing) is a sequence of largely binary tests: the system either has your content, or it doesn’t. But between those binary outcomes, degradation accumulates as you pass through the gates.
A page that can’t be rendered doesn’t get “partially indexed,” but it may get indexed with degraded information, and every competitive gate downstream operates on whatever survived the infrastructure phase.

If the raw material is degraded, the competition in the ARGDW phase starts with a handicap that no amount of content quality can overcome.
The industry compressed these five distinct DSCRI gates into two words: “crawl and index.” That compression hides five separate failure modes behind a single checkbox. This piece breaks that simplistic shorthand into five clear gates so you can optimize far more effectively for the bots.
If you’re a technical SEO, you might feel you can skip this. Don’t.
You’re probably doing 80% of what follows and missing the other 20%. The gates below provide measurable proof that your content reached the index with maximum confidence, giving it the best possible chance in the competitive ARGDW phase that follows.
Sequential dependency: Fix the earliest failure first
The infrastructure gates are sequential dependencies: each gate’s output is the next gate’s input, and failure at any gate blocks everything downstream. 
If your content isn’t being discovered, fixing your rendering is wasted effort, and if your content is crawled but renders poorly, every annotation downstream inherits that degradation. Better to be a straight C student than three As and an F, because the F is the gate that kills your pipeline.
The audit starts with discovery and moves forward. The temptation to jump to the gate you understand best (and for many technical SEOs, that’s crawling) is the temptation that wastes the most money.

Discovery, selection, crawling: The three gates the industry already knows
Discovery and crawling are well-understood, while selection is often overlooked.
Discovery is an active signal. Three mechanisms feed it: 

XML sitemaps (the census).
IndexNow (the telegraph).
Internal linking (the road network). 

The entity home website is the primary discovery anchor for pull discovery, and confidence is key. The system asks not just “does this URL exist?” but “does this URL belong to an entity I already trust?” Content without entity association arrives as an orphan, and orphans wait at the back of the queue.
The push layer (IndexNow, MCP, structured feeds) changes the economics of this gate entirely, and I’ll explain what changes when you stop waiting to be found and start pushing.
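To make the push layer concrete, here is what an IndexNow submission looks like: a single HTTP POST to a shared endpoint listing the URLs you want fetched, following the published IndexNow protocol. The host, key, and URLs below are placeholders, not real values.

```python
import json
import urllib.request

def build_indexnow_payload(host, key, urls):
    """Assemble the JSON body the IndexNow endpoint expects."""
    return {
        "host": host,      # your site's hostname
        "key": key,        # the key you host at https://<host>/<key>.txt
        "urlList": urls,   # pages you are pushing instead of waiting to be crawled
    }

def submit(payload, endpoint="https://api.indexnow.org/indexnow"):
    """POST the payload; participating engines share submissions with each other."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    return urllib.request.urlopen(req)

# Placeholder values for illustration only.
payload = build_indexnow_payload(
    "example.com",
    "your-indexnow-key",
    ["https://example.com/new-article"],
)
# submit(payload)  # uncomment once the key file is actually hosted on your domain
```

One submission reaches every participating engine, which is exactly what changes the economics: you pay the notification cost once instead of waiting for each bot to rediscover you.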
Selection is the system’s opinion of you, expressed as crawl budget. As Microsoft Bing’s Fabrice Canel says, “Less is more for SEO. Never forget that. Less URLs to crawl, better for SEO.” 
The industry spent two decades believing more pages equals more traffic. In the pipeline model, the opposite is true: fewer, higher-confidence pages get crawled faster, rendered more reliably, and indexed more completely. Every low-value URL you ask the system to crawl is a vote of no confidence in your own content, and the system notices.
Not every page that’s discovered in the pull model is selected. Canel states that the bot assesses the expected value of the destination page and will not crawl the URL if the value falls below a threshold.
Crawling is the most mature gate and the least differentiating. Server response time, robots.txt, redirect chains: solved problems with excellent tooling, and not where the wins are because you and most of your competition have been doing this for years. 
What most practitioners miss, and what’s worth thinking about: Canel confirmed that context from the referring page carries forward during crawling.
Your internal linking architecture isn’t just a crawl pathway (getting the bot to the page) but a context pipeline (telling the bot what to expect when it arrives), and that context influences selection and then interpretation at rendering before the rendering engine even starts.
Rendering fidelity: The gate that determines what the bot sees
Rendering fidelity is where the infrastructure story diverges from what the industry has been measuring.
After crawling, the bot attempts to build the full page: it constructs the document object model (DOM), sometimes executes JavaScript (don’t count on this; the bot doesn’t always invest the resources), and produces the rendered DOM.
I coined the term rendering fidelity to name this variable: how much of your published content the bot actually sees after building the page. Content behind client-side rendering that the bot never executes isn’t degraded, it’s gone, and information the bot never sees can’t be recovered at any downstream gate. 
Every annotation, every grounding decision, every display outcome depends on what survived rendering. If rendering is your weakest gate, it’s your F on the report card, and remember: everything downstream inherits that grade.
The friction hierarchy: Why the bot renders some sites more carefully than others
The bot’s willingness to invest in rendering your page isn’t uniform. Canel confirmed that the more common a pattern is, the less friction the bot encounters. 
I’ve reconstructed the following hierarchy from his observations. The ranking is my model. The underlying principle (pattern familiarity reduces selection, crawl, rendering, and indexing friction and processing cost) is confirmed:
| Approach | Friction level | Why |
| --- | --- | --- |
| WordPress + Gutenberg + clean theme | Lowest | 30%+ of the web. Most common pattern. Bot has highest confidence in its own parsing. |
| Established platforms (Wix, Duda, Squarespace) | Low | Known patterns, predictable structure. Bot has learned these templates. |
| WordPress + page builders (Elementor, Divi) | Medium | Adds markup noise. Downstream processing has to work harder to find core content. |
| Bespoke code, perfect HTML5 | Medium-High | Bot does not know your code is perfect. It has to infer structure without a pattern library to validate against. |
| Bespoke code, imperfect HTML5 | High | Guessing with degraded signals. |
The critical implication, also from Canel, is that if the site isn’t important enough (low publisher entity authority), the bot may never reach rendering because the cost of parsing unfamiliar code exceeds the estimated benefit of obtaining the content. Publisher entity confidence has a huge influence on whether you get crawled and also how carefully you get rendered (and everything else downstream).
JavaScript is the most common rendering obstacle, but it isn’t the only one: missing CSS, proprietary elements, and complex third-party dependencies can all produce the same result — a bot that sees a degraded version of what a human sees, or can’t render the page at all.
JavaScript was a favor, not a standard
Google and Bing render JavaScript. Most AI agent bots don’t. They fetch the initial HTML and work with that. The industry built on Google and Bing’s favor and assumed it was a standard.
Perplexity’s grounding fetches work primarily with server-rendered content. Smaller AI agent bots have no rendering infrastructure.
The practical consequence: a page that loads a product comparison table via JavaScript displays perfectly in a browser but renders as an empty container for a bot that doesn’t execute JS. The human sees a detailed comparison. The bot sees a div with a loading spinner. 
The annotation system classifies the page based on an empty space where the content should be. I’ve seen this pattern repeatedly in our database: different systems see different versions of the same page because rendering fidelity varies by bot.
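You can reproduce the divergence with a few lines of Python: extract the text a non-JS bot gets from the initial HTML, then compare a client-rendered shell against its server-rendered equivalent. The markup snippets are illustrative, not from any real site.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text a bot sees without executing any JavaScript."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def bot_view(initial_html):
    """Return the text content present in the initial HTML payload."""
    parser = TextExtractor()
    parser.feed(initial_html)
    return " ".join(parser.chunks)

# Client-side rendered: the comparison table only arrives later, via JavaScript.
csr = '<div id="comparison"><span class="spinner">Loading...</span></div>'

# Server-side rendered: the same content is already in the initial HTML.
ssr = '<table id="comparison"><tr><td>Plan A</td><td>$10</td></tr></table>'

print(bot_view(csr))  # just the spinner text; the comparison data is gone
print(bot_view(ssr))  # the actual comparison data
```

The human and the JS-executing bot see the same final page in both cases; the non-JS bot sees only what the first snippet prints, which is what the annotation system then classifies.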
Three rendering pathways that bypass the JavaScript problem
The traditional rendering model assumes one pathway: HTML to DOM construction. You now have two alternatives.

WebMCP, built by Google and Microsoft, gives agents direct DOM access, bypassing the traditional rendering pipeline entirely. Instead of fetching your HTML and building the page, the agent accesses a structured representation of your DOM through a protocol connection.
With WebMCP, you give yourself a huge advantage: the bot doesn’t need to execute JavaScript or guess at your layout because the structured DOM is served directly.
Markdown for Agents uses HTTP content negotiation to serve pre-simplified content. When the bot identifies itself, the server delivers a clean markdown version instead of the full HTML page. 
The semantic content arrives pre-stripped of everything the bot would have to remove anyway (navigation, sidebars, JavaScript widgets), which means the rendering gate is effectively skipped with zero information loss. If you’re using Cloudflare, you have an easy implementation that they launched in early 2026.
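A minimal sketch of the content-negotiation idea (the detection logic and payloads here are my illustrative assumptions, not any vendor’s specification): the server inspects the request and returns markdown to agents, full HTML to everyone else.

```python
def negotiate(accept_header, user_agent=""):
    """Pick a representation per request: markdown for agents, HTML for browsers.

    Illustrative sketch only; real deployments handle bot detection
    and caching at the platform or CDN layer.
    """
    wants_markdown = "text/markdown" in accept_header.lower()
    looks_like_agent = any(t in user_agent.lower() for t in ("bot", "agent"))
    if wants_markdown or looks_like_agent:
        # Pre-stripped semantic content: no navigation, sidebars, or widgets.
        body = "# Product comparison\n\n| Plan | Price |\n| --- | --- |\n| A | $10 |"
        return ("text/markdown", body)
    return ("text/html", "<html>...full page with navigation and scripts...</html>")

content_type, body = negotiate("text/markdown")
```

The key property is that the agent’s version is derived from the same source content as the HTML, so there is nothing for the rendering gate to lose.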
Both alternatives change the economics of rendering fidelity in the same way that structured feeds change discovery: they replace a lossy process with a clean one. 
For non-Google bots, try this: disable JavaScript in your browser and look at your page, because what you see is what most AI agent bots see. You can fix the JavaScript issue with server-side rendering (SSR) or static site generation (SSG), so the initial HTML contains the complete semantic content regardless of whether the bot executes JavaScript. 
But the real opportunity lies in new pathways: one architectural investment in WebMCP or Markdown for Agents, and every bot benefits regardless of its rendering capabilities.


Conversion fidelity: Where HTML stops being HTML
Rendering produces a DOM. Indexing transforms that DOM into the system’s proprietary internal format and stores it. Two things happen here that the industry has collapsed into one word.
Rendering fidelity (Gate 3) measures whether the bot saw your content. Conversion fidelity (Gate 4) measures whether the system preserved it accurately when filing it away. Both losses are irreversible, but they fail differently and require different fixes.
The strip, chunk, convert, and store sequence
What follows is a mechanical model I’ve reconstructed from confirmed statements by Canel and Gary Illyes.
Strip: The system removes repeating elements: navigation, header, footer, and sidebar. Canel confirmed directly that these aren’t stored per page. 
The system’s primary goal is to find the core content. This is why semantic HTML5 matters at a mechanical level.
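A toy version of the strip step, under the simplest possible assumption (drop everything inside nav, header, footer, and aside; keep the rest), shows why semantic HTML5 matters mechanically: the tags themselves tell the system what is chrome and what is core.

```python
from html.parser import HTMLParser

BOILERPLATE = {"nav", "header", "footer", "aside"}  # repeating chrome elements

class CoreContentStripper(HTMLParser):
    """Drop text nested inside boilerplate elements; keep everything else."""
    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside boilerplate elements
        self.core = []
    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE:
            self.depth += 1
    def handle_endtag(self, tag):
        if tag in BOILERPLATE and self.depth:
            self.depth -= 1
    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.core.append(data.strip())

def strip_boilerplate(html):
    p = CoreContentStripper()
    p.feed(html)
    return " ".join(p.core)

page = "<header>Site name</header><main>The core content.</main><footer>Legal</footer>"
print(strip_boilerplate(page))  # -> "The core content."
```

If your core content lives in anonymous divs instead of semantic elements, the real system has to infer this boundary instead of reading it, which is exactly the friction the hierarchy above describes.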
