---
title: Semantic HTML for Machines — Why Skipping Headings Breaks AI Optimization
description: AI crawlers parse your heading outline before anything else. Skip an H3 and you create a phantom node in the index. Here's how to keep structure clean for LLM chunkers and accessibility agents.
date: 2026-05-04
tags: [aeo, ai, html, accessibility]
---

Machines don't see your design. They read structure.

When `GPTBot`, `ClaudeBot`, or `PerplexityBot` hits a page, its first move is to build a document outline from the headings. The `H1` names the topic, each `H2` a section, each `H3` a subsection. That tree is then converted into chunks (blocks of text) for RAG indexing, snippet candidates for chat answers, and nodes in the citation graph. Broken structure, broken comprehension. It's that simple.

## 1. Why an LLM parser demands a clean outline

A modern crawler for a language model works differently than a classic search bot from a decade ago. Instead of extracting keywords and counting their density, it builds the document's semantic tree and slices it into chunks of a fixed size. During this process, every chunk inherits context from its parent headings. This is called *hierarchical chunking*, and the entire procedure decides whether the model finds your page in response to a user query.
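
To make the inheritance concrete, here is a minimal sketch of hierarchical chunking in plain JavaScript. It is not how any particular crawler is implemented: the `chunkByHeadings` name, the flat block format, and the `maxChars` limit are all assumptions made for illustration. The point it shows is that every chunk carries the trail of headings above it.

```js
// Hypothetical input: a flat list of blocks in document order.
// Headings carry a numeric `level`; paragraphs carry only `text`.
function chunkByHeadings(blocks, maxChars = 200) {
  const path = [];   // current heading trail, index 0 holds the H1
  const chunks = [];

  for (const block of blocks) {
    if (block.level) {
      // A new heading replaces everything at its level and deeper.
      path.length = block.level - 1;
      path[block.level - 1] = block.text;
      continue;
    }
    // A skipped level leaves a hole in the trail; label it explicitly.
    const context = Array.from(path, (h) => h ?? '(unnamed)').join(' > ');
    // Slice the text into fixed-size pieces; each piece inherits the trail.
    for (let i = 0; i < block.text.length; i += maxChars) {
      chunks.push({ context, text: block.text.slice(i, i + maxChars) });
    }
  }
  return chunks;
}

const chunks = chunkByHeadings([
  { level: 1, text: 'Automation platform' },
  { level: 2, text: 'Features' },
  { level: 3, text: 'Speed' },
  { text: 'Orchestrates tasks in 200 ms on average.' },
  { level: 2, text: 'Pricing' },
  { text: 'For small teams up to 5 people.' },
]);
console.log(chunks.map((c) => c.context));
// [ 'Automation platform > Features > Speed', 'Automation platform > Pricing' ]
```

With a clean outline, the trail reads like a breadcrumb. With a skipped level, the same chunk would carry an `(unnamed)` step in the middle of its context.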

Three concrete things that break with bad structure:

1. **The outline algorithm starts hallucinating.** A parser builds the document outline level by level. If you skip from `H2` to `H4`, it inserts a phantom `H3` — an empty node with no name. The chunk that belongs to that phantom lands in the index without a readable heading. The model sees it as *"unnamed fragment under section X"* and downranks it.
2. **The RAG ranker indexes the wrong place.** Production splitters (`MarkdownHeaderTextSplitter` in LangChain, `MarkdownNodeParser` in LlamaIndex, or custom regex-based pipelines) anchor specifically on heading levels. Two `H1` tags on one page mean two "root" documents in the index. Your content gets split. A query that should have returned one cohesive answer returns half of one.
3. **The accessibility tree is what many AI agents see.** This is the most interesting development of the last year. Browser agents such as Claude Computer Use, OpenAI's Operator, and Claude in Chrome do not read your CSS the way a designer does. Depending on the tool, they work from screenshots, DOM snapshots, or the accessibility tree, the same tree screen readers expose to blind and low-vision users. Where an agent relies on that tree, broken structure disorients it the same way it disorients a screen reader user. Your accessibility practices now correlate directly with how well an AI agent can perform actions on your site.

**Bottom line:** `H1` is not "the biggest font." It's the document's topic declaration for machine readers. Treat it accordingly.
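
The phantom-node and double-root problems from the list above can be sketched in a few lines. This is an illustrative outline builder, not the code of any real parser: the `buildOutline` name and the `phantom` flag are assumptions, as is the choice to fill each skipped level with an unnamed placeholder.

```js
// Build a nested outline from headings given in document order.
// Assumption: a skipped level is filled with an unnamed placeholder node.
function buildOutline(headings) {
  const root = { level: 0, text: null, children: [] };
  const stack = [root]; // open nodes, from the root down to the current one

  for (const { level, text } of headings) {
    // Close every open node at this level or deeper.
    while (stack[stack.length - 1].level >= level) stack.pop();

    // Fill each skipped level with a phantom node.
    while (stack[stack.length - 1].level < level - 1) {
      const parent = stack[stack.length - 1];
      const phantom = { level: parent.level + 1, text: null, phantom: true, children: [] };
      parent.children.push(phantom);
      stack.push(phantom);
    }

    const node = { level, text, children: [] };
    stack[stack.length - 1].children.push(node);
    stack.push(node);
  }
  return root.children;
}

const outline = buildOutline([
  { level: 1, text: 'Guide' },
  { level: 2, text: 'Features' },
  { level: 4, text: 'Audit logs' }, // skips H3
  { level: 1, text: 'Second title' }, // second H1
]);
console.log(JSON.stringify(outline, null, 2));
```

In this run the outline has two top-level entries, one per `H1`, and "Audit logs" hangs under a node whose `text` is `null`. That nameless parent is the phantom the first point describes.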

## 2. Correct structure: one H1, sequential descent

The rule is short: one `H1` per page and no skipped levels on the way down. Moving back up (say, from an `H4` straight to a new `H2`) is fine; jumping down from an `H2` to an `H4` is not.

Here's what a correct structure looks like for a typical product page:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Automation platform for engineering teams</title>
</head>
<body>
  <header>
    <a href="/" aria-label="Home">
      <img src="/logo.svg" alt="Company logo">
    </a>
  </header>

  <main>
    <article>
      <h1>Automation platform for engineering teams</h1>

      <section aria-labelledby="features">
        <h2 id="features">Features</h2>

        <h3>Speed</h3>
        <p>Orchestrates tasks in 200 ms on average.</p>

        <h3>Security</h3>
        <p>SOC 2 Type II, end-to-end encryption.</p>

        <h4>Audit logs</h4>
        <p>Export to SIEM via webhook or S3.</p>
      </section>

      <section aria-labelledby="pricing">
        <h2 id="pricing">Pricing</h2>
        <h3>Starter plan</h3>
        <p>For small teams up to 5 people.</p>

        <h3>Team plan</h3>
        <p>For growing businesses without limits.</p>
      </section>
    </article>
  </main>
</body>
</html>
```

Three details to note:

- **The logo in `<header>` is not an `H1`.** It's a link with `aria-label`. The logo repeats on every page; it doesn't describe the topic of a specific document.
- **The `H1` lives in `<main>` and names the page exactly.** Not "Welcome!", not "We are." A concrete topic the model can index.
- **The `H4` is logically nested inside the `H3` "Security," not flying solo.** You did not skip a level, because "Audit logs" is a sub-point of "Security." If you placed the `H4` directly under `H2` "Features," the parser would conclude there's an invisible `H3` between them — and add a phantom node to the index.

> ⚠️ **A note on the HTML5 spec.** Formally, HTML5 allowed multiple `H1` tags inside sectioning content (`<article>`, `<section>`, `<nav>`, `<aside>`), each supposedly with a "local" level. But no browser ever exposed that outline algorithm to assistive technology, and in 2022 it was removed from the WHATWG HTML Standard, which now asks authors to pick heading levels that reflect the real nesting. Conclusion: one `H1` per document, period.

## 3. Visually hidden headings: when semantics matters more than design

Some regions don't need a visible heading for sighted users: primary navigation, search forms, filter sidebars, footers. A designer says, "It's clear from context." The machine disagrees. To a crawler or agent, an unnamed `<nav>` is just a list of links without context.

The right compromise is an `sr-only` (screen-reader only) heading. It is present in the DOM and in the accessibility tree, readable by LLM agents and screen readers, but not visible on screen.

```html
<nav aria-label="Main navigation">
  <h2 class="sr-only">Main navigation</h2>
  <ul>
    <li><a href="/products">Products</a></li>
    <li><a href="/pricing">Pricing</a></li>
    <li><a href="/docs">Documentation</a></li>
  </ul>
</nav>

<aside aria-label="Catalog filters">
  <h2 class="sr-only">Catalog filters</h2>
  <form action="/search" method="GET">
    <label for="category">Category</label>
    <select id="category" name="category">
      <option value="all">All</option>
      <option value="tools">Tools</option>
    </select>
  </form>
</aside>
```

The canonical `sr-only` implementation (the same pattern behind Tailwind CSS's `sr-only` and Bootstrap's `visually-hidden` utilities):

```css title="sr-only.css"
.sr-only {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  margin: -1px;
  overflow: hidden;
  clip: rect(0, 0, 0, 0);
  white-space: nowrap;
  border: 0;
}
```

How the common hiding techniques compare:

- ❌ `display: none`: removes the node from the accessibility tree. Invisible to both agents and screen readers. The heading effectively does not exist.
- ❌ `visibility: hidden`: same problem, plus it leaves an empty space in the layout. Worst option.
- ❌ `opacity: 0`: stays in the accessibility tree but still occupies its full space in the layout, and any focusable content inside it stays reachable by Tab while invisible.
- ✅ **The `clip` + `position: absolute` technique** (the `sr-only` class above): the element is clipped to an invisible 1px box but stays fully present in the DOM and the accessibility tree. This is the one to use.

An `sr-only` heading is not "an SEO crutch." It's a declaration: *"a logically separate text section with a concrete topic starts here."* That's exactly what the ranker reads, and what the agent uses when planning actions on the page.
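
As a rough aid, the comparison above can be written down as a small classifier. This is a simplified sketch under stated assumptions: `classifyHiding` and its return labels are hypothetical, it looks only at the element's own style, and it ignores ancestors, the `hidden` attribute, and `aria-hidden`, all of which real browsers take into account.

```js
// Classify how an element is hidden from a computed-style-like object.
// Simplification: only the element's own styles are considered.
function classifyHiding(style) {
  if (style.display === 'none') return 'removed-from-tree';
  if (style.visibility === 'hidden' || style.visibility === 'collapse') {
    return 'removed-from-tree';
  }
  if (parseFloat(style.opacity) === 0) return 'invisible-but-takes-space';

  // The sr-only pattern: absolutely positioned, clipped to a 1px box.
  const clipped =
    style.position === 'absolute' &&
    style.overflow === 'hidden' &&
    parseFloat(style.width) <= 1 &&
    parseFloat(style.height) <= 1;
  return clipped ? 'visually-hidden' : 'visible';
}

console.log(classifyHiding({ display: 'none' }));
// removed-from-tree
console.log(
  classifyHiding({ position: 'absolute', overflow: 'hidden', width: '1px', height: '1px' })
);
// visually-hidden
```

In a browser console the same function can be fed `getComputedStyle(heading)` for any heading element. Only the `visually-hidden` result hides the text from sight without also hiding it from the accessibility tree.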

## Quick check for your site

Open any page, go to the Console tab in DevTools, and run this snippet. It uses only standard DOM APIs, so it needs no jQuery or other libraries.

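```js title="audit-headings.js"
// Collect every heading in document order.
const headings = document.querySelectorAll('h1, h2, h3, h4, h5, h6');
const headingsData = Array.from(headings).map((heading, index) => {
  return {
    order: index,
    level: heading.tagName,
    text: heading.innerText.trim().slice(0, 60),
    // Visually hidden on purpose, still in the accessibility tree.
    srOnly: heading.classList.contains('sr-only'),
    // No layout boxes: display: none on the heading or one of its ancestors.
    notRendered: heading.getClientRects().length === 0,
  };
});

console.table(headingsData);
```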

Look at the `level` column in the resulting table. Answer three questions:

1. Is there exactly one `H1` on the page?
2. Is the descent sequential, with no skipped levels?
3. Is every `H2`/`H3` you don't see visually explicitly marked with the `sr-only` class (and not hidden via `display: none`)?

If the answer to any of these is "no," you have two paths. The first (correct one): rewrite the DOM so the structure mirrors the logic of the content. The second (quick band-aid): add `sr-only` headings where a semantic "bridge" between levels is needed, until proper refactoring catches up with the backlog.
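
The first two questions can also be answered mechanically. The helper below is a small sketch, with `checkLevels` and the shape of its result being assumptions for illustration. It takes heading tag names in document order, which the snippet above already provides via `headingsData.map((row) => row.level)`.

```js
// Takes heading tag names in document order, e.g. ['H1', 'H2', 'H3'].
function checkLevels(tagNames) {
  const levels = tagNames.map((tag) => Number(tag.slice(1)));
  const h1Count = levels.filter((level) => level === 1).length;

  // A skip is any step down that jumps more than one level.
  const skips = [];
  let previous = 0; // virtual level above H1, so a page that starts at H2 is flagged
  levels.forEach((level, index) => {
    if (level > previous + 1) {
      skips.push({ index, from: previous, to: level });
    }
    previous = level;
  });

  return { singleH1: h1Count === 1, sequential: skips.length === 0, skips };
}

console.log(checkLevels(['H1', 'H2', 'H3', 'H4', 'H2', 'H3', 'H3']));
// { singleH1: true, sequential: true, skips: [] }
console.log(checkLevels(['H1', 'H2', 'H4']).skips);
// [ { index: 2, from: 2, to: 4 } ]
```

The first call uses the heading sequence of the product page from section 2. Each entry in `skips` points at the heading where a level was jumped, which is where a bridge heading or a DOM rewrite is needed.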

---

**Did your structure audit reveal problems on your site?**

Follow me on [LinkedIn](https://linkedin.com/in/alexturik) to keep up with the technical side of optimization for AI. If your project needs architectural review, a clean semantic migration, or an AEO-aware Next.js setup, [get in touch](mailto:alexturik@gmail.com) directly.
