Machines don't see your design. They read structure.
When GPTBot, ClaudeBot, or PerplexityBot hits a page, their first move is to build a document outline from headings. H1 is the topic. H2 is sections. H3 is subsections. That tree is then converted into chunks (data blocks) for RAG indexing, into snippet candidates for chat answers, into nodes in the citation graph. Broken structure, broken comprehension. It's that simple.
1. Why an LLM parser demands a clean outline
A modern crawler for a language model works differently than a classic search bot from a decade ago. Instead of extracting keywords and counting their density, it builds the document's semantic tree and slices it into chunks of a fixed size. During this process, every chunk inherits context from its parent headings. This is called hierarchical chunking, and the entire procedure decides whether the model finds your page in response to a user query.
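A minimal sketch of hierarchical chunking, where each text block inherits the path of its ancestor headings. The block format and function name are illustrative assumptions; production parsers like LlamaIndex's HierarchicalNodeParser are far more elaborate:

```javascript
// Sketch: hierarchical chunking. Each text chunk inherits the path of its
// ancestor headings, so it stays meaningful when retrieved out of context.
function chunkWithHeadingContext(blocks) {
  const path = [];   // current heading ancestry, index = level - 1
  const chunks = [];
  for (const block of blocks) {
    if (block.level) {
      // a heading: drop deeper ancestors, record this one at its level
      path.length = block.level - 1;
      path[block.level - 1] = block.text;
    } else {
      // plain text: emit a chunk tagged with its full heading path
      chunks.push({
        context: path.filter(Boolean).join(' > '),
        text: block.text,
      });
    }
  }
  return chunks;
}

const blocks = [
  { level: 1, text: 'Automation platform' },
  { level: 2, text: 'Features' },
  { level: 3, text: 'Security' },
  { text: 'SOC 2 Type II, end-to-end encryption.' },
];
console.log(chunkWithHeadingContext(blocks));
// → [{ context: 'Automation platform > Features > Security', text: '…' }]
```

Note what happens when the hierarchy is clean: the chunk about encryption carries "Automation platform > Features > Security" as its context, which is exactly what a ranker matches against a security-related query.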
Three concrete things that break with bad structure:
- The outline algorithm starts hallucinating. A parser builds the document outline level by level. If you skip from H2 to H4, it inserts a phantom H3: an empty node with no name. The chunk that belongs to that phantom lands in the index without a readable heading. The model sees it as "unnamed fragment under section X" and downranks it.
- The RAG ranker indexes the wrong place. Production splitters (MarkdownHeaderTextSplitter in LangChain, HierarchicalNodeParser in LlamaIndex, or custom regex-based pipelines) anchor specifically on heading levels. Two H1 tags on one page mean two "root" documents in the index. Your content gets split, and a query that should have returned one cohesive answer returns half of one.
- The accessibility tree matches what AI agents see. This is the most interesting development of the last year. Agents like Claude Computer Use, OpenAI's Operator, and Claude in Chrome don't parse visual CSS; they read the accessibility tree, the same tree screen readers use for blind users. Broken structure gives an agent the same disorientation that a vision-impaired user gets. Your A11y practices now correlate directly with how well an AI agent can perform actions on your site.
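The phantom-node behavior can be sketched like this. It illustrates the general idea of a strict outline builder, not the actual code of any particular crawler:

```javascript
// Sketch: building an outline from heading levels. When a level is skipped
// (H2 straight to H4), an unnamed placeholder node is inserted, which is
// roughly how a strict outline builder copes with the gap.
function buildOutline(headings) {
  const nodes = [];
  let prevLevel = 0;
  for (const h of headings) {
    // fill every skipped level with a phantom node
    for (let level = prevLevel + 1; level < h.level; level++) {
      nodes.push({ level, text: '(phantom: unnamed section)' });
    }
    nodes.push({ level: h.level, text: h.text });
    prevLevel = h.level;
  }
  return nodes;
}

const outline = buildOutline([
  { level: 1, text: 'Product' },
  { level: 2, text: 'Features' },
  { level: 4, text: 'Audit logs' }, // skips H3
]);
console.log(outline.filter(n => n.text.startsWith('(phantom')).length); // → 1
```

The jump from H2 to H4 produces exactly one unnamed node, and everything under "Audit logs" inherits that nameless ancestor in its context path.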
Bottom line: H1 is not "the biggest font." It's the document's topic declaration for machine readers. Treat it accordingly.
2. Correct structure: one H1, sequential descent
The rule is short: one H1 per page, no skipping levels, going back up is fine, jumping forward is not.
Here's what a correct structure looks like for a typical product page:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Automation platform for engineering teams</title>
</head>
<body>
<header>
<a href="/" aria-label="Home">
<img src="/logo.svg" alt="Company logo">
</a>
</header>
<main>
<article>
<h1>Automation platform for engineering teams</h1>
<section aria-labelledby="features">
<h2 id="features">Features</h2>
<h3>Speed</h3>
<p>Orchestrates tasks in 200 ms on average.</p>
<h3>Security</h3>
<p>SOC 2 Type II, end-to-end encryption.</p>
<h4>Audit logs</h4>
<p>Export to SIEM via webhook or S3.</p>
</section>
<section aria-labelledby="pricing">
<h2 id="pricing">Pricing</h2>
<h3>Starter plan</h3>
<p>For small teams up to 5 people.</p>
<h3>Team plan</h3>
<p>For growing businesses without limits.</p>
</section>
</article>
</main>
</body>
</html>

Three details to note:
- The logo in <header> is not an H1. It's a link with aria-label. The logo repeats on every page; it doesn't describe the topic of a specific document.
- The H1 lives in <main> and names the page exactly. Not "Welcome!", not "We are." A concrete topic the model can index.
- The H4 is logically nested inside the H3 "Security," not flying solo. You did not skip a level, because "Audit logs" is a sub-point of "Security." If you placed the H4 directly under H2 "Features," the parser would conclude there's an invisible H3 between them and add a phantom node to the index.
⚠️ A note on the HTML5 spec. Formally, HTML5 allows multiple H1 tags inside sectioning content (<article>, <section>, <nav>, <aside>), each supposedly with a "local" level. But no real browser ever implemented the outline algorithm for this case, and in 2022 the W3C recommended treating headings as if the sectioning algorithm did not exist. Conclusion: one H1 per document, period. The spec be damned.
3. Visually hidden headings: when semantics matters more than design
Some blocks don't need a visible heading for human users: primary navigation, search forms, sidebars with filters, footers. A designer says, "It's clear from context." The machine disagrees. To a crawler-agent, an unnamed <nav> is just a list of links without context.
The right compromise is an sr-only (screen-reader only) heading. Present in the DOM, present in the accessibility tree, read by LLM agents and screen readers, but invisible visually.
<nav aria-label="Main navigation">
<h2 class="sr-only">Main navigation</h2>
<ul>
<li><a href="/products">Products</a></li>
<li><a href="/pricing">Pricing</a></li>
<li><a href="/docs">Documentation</a></li>
</ul>
</nav>
<aside aria-label="Catalog filters">
<h2 class="sr-only">Catalog filters</h2>
<form action="/search" method="GET">
<label for="category">Category</label>
<select id="category" name="category">
<option value="all">All</option>
<option value="tools">Tools</option>
</select>
</form>
</aside>

The canonical sr-only implementation (the same one used by Tailwind CSS and Bootstrap):
.sr-only {
position: absolute;
width: 1px;
height: 1px;
padding: 0;
margin: -1px;
overflow: hidden;
clip: rect(0, 0, 0, 0);
white-space: nowrap;
border: 0;
}

What you must not do here:
- ❌ display: none removes the node from the accessibility tree. Invisible to both agents and screen readers; the heading effectively does not exist.
- ❌ visibility: hidden has the same problem, plus it leaves an empty space in the layout. The worst option.
- ❌ opacity: 0 stays in the tree, but the element remains focusable, which breaks keyboard tab navigation.
- ✅ The clip + position: absolute technique, by contrast, renders the element outside the visible area while keeping it fully present in the DOM and the accessibility tree. That's exactly what the sr-only class above does.
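The comparison above can be condensed into a small helper that takes computed style values and reports whether the node is expected to survive in the accessibility tree. The function name and the simplified rules are illustrative assumptions; real accessibility-tree behavior has more edge cases (aria-hidden, inert, and so on):

```javascript
// Sketch: classify a hiding technique by its computed style values.
// Simplified rules matching the list above; real AT behavior is richer.
function hidingTechniqueReport(style) {
  if (style.display === 'none') {
    return { inA11yTree: false, note: 'removed from the accessibility tree' };
  }
  if (style.visibility === 'hidden') {
    return { inA11yTree: false, note: 'removed, and still reserves layout space' };
  }
  if (style.opacity === '0') {
    return { inA11yTree: true, note: 'exposed, but still focusable: breaks tab order' };
  }
  if (style.position === 'absolute' && style.clip === 'rect(0px, 0px, 0px, 0px)') {
    return { inA11yTree: true, note: 'sr-only: exposed to AT, invisible on screen' };
  }
  return { inA11yTree: true, note: 'visible' };
}

console.log(hidingTechniqueReport({ display: 'none' }).inA11yTree); // → false
```

In a browser you would feed it `getComputedStyle(element)`; here it runs on plain objects for clarity.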
An sr-only heading is not "an SEO crutch." It's a declaration: "a logically separate text section with a concrete topic starts here." That's exactly what the ranker reads, and what the agent uses when planning actions on the page.
Quick check for your site
Open any page, go to the Console tab in DevTools, and run this snippet. It uses only standard DOM APIs; no libraries are required.
const headings = document.querySelectorAll('h1, h2, h3, h4, h5, h6');
const headingsData = Array.from(headings).map((heading, index) => {
return {
order: index,
level: heading.tagName,
text: heading.innerText.trim().slice(0, 60),
hidden:
heading.classList.contains('sr-only') ||
heading.offsetParent === null,
};
});
console.table(headingsData);

Look at the level column in the resulting table. Answer three questions:
- Is there exactly one H1 on the page?
- Is the descent sequential, with no skipped levels?
- Is every H2/H3 you don't see visually explicitly marked with the sr-only class (and not hidden via display: none)?
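The first two questions can be answered programmatically. This helper takes the headingsData array produced by the snippet above; it's a suggested extension, not part of the original check:

```javascript
// Sketch: automate the first two checks against the headingsData array.
function auditOutline(headingsData) {
  const levels = headingsData.map(h => Number(h.level.slice(1))); // 'H2' -> 2
  const problems = [];

  const h1Count = levels.filter(l => l === 1).length;
  if (h1Count !== 1) problems.push(`expected exactly one H1, found ${h1Count}`);

  for (let i = 1; i < levels.length; i++) {
    // descending more than one level at a time means a skipped heading
    if (levels[i] > levels[i - 1] + 1) {
      problems.push(`skipped level: H${levels[i - 1]} -> H${levels[i]} at index ${i}`);
    }
  }
  return problems; // empty array = outline looks clean
}

console.log(auditOutline([
  { level: 'H1' }, { level: 'H2' }, { level: 'H4' }, // H3 skipped
]));
// → ['skipped level: H2 -> H4 at index 2']
```

Run it right after console.table(headingsData) as auditOutline(headingsData); an empty array means the outline passes the first two checks. The third question, about sr-only versus display: none, still needs a human eye.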
If the answer to any of these is "no," you have two paths. The first (correct one): rewrite the DOM so the structure mirrors the logic of the content. The second (quick band-aid): add sr-only headings where a semantic "bridge" between levels is needed, until proper refactoring catches up with the backlog.
Did your structure audit reveal problems on your site?
Follow me on LinkedIn to keep up with the technical side of optimization for AI. If your project needs architectural review, a clean semantic migration, or an AEO-aware Next.js setup, get in touch directly.