A useful Cloudflare feature is Markdown for Agents. It makes an existing website “AI-friendly” by serving a Markdown representation of normal HTML pages to agents, crawlers, and LLM-based tools.
## What Cloudflare’s Feature Technically Does
## 1. Content Negotiation Trigger
An agent or crawler sends a request like this:
Accept: text/markdown
Cloudflare sees that header on an enabled zone, fetches the normal HTML page from the origin server, converts it at the edge, and returns Markdown instead of HTML.
The response looks roughly like this:
Content-Type: text/markdown; charset=utf-8
Vary: Accept
x-markdown-tokens: <estimated-token-count>
Content-Signal: ai-train=yes, search=yes, ai-input=yes
Cloudflare’s docs describe this as content negotiation: it applies only on zones where the feature is enabled, and only when the client asks for Markdown through the Accept header.
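The header check described above can be sketched in a few lines. This is a minimal illustration, not Cloudflare's implementation: the function name `accepts_markdown` is hypothetical, and the sketch assumes that only an explicit `text/markdown` entry with a non-zero q-value counts as a Markdown request (wildcards such as `*/*` are deliberately ignored).

```python
def accepts_markdown(accept_header: str) -> bool:
    """Return True if the Accept header explicitly lists text/markdown.

    Hypothetical helper: wildcards like */* do not count, and an entry
    with q=0 is treated as "not acceptable".
    """
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != "text/markdown":
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # malformed weight: treat as not acceptable
        return q > 0
    return False
```

For example, `accepts_markdown("text/html, text/markdown;q=0.9")` is true, while a plain browser-style `text/html` header is not.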
Sources:
- Cloudflare Markdown for Agents docs (https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/)
- Cloudflare announcement (https://blog.cloudflare.com/markdown-for-agents/)
## 2. HTML to Markdown Edge Conversion
The pipeline is roughly:
request
↓
if zone/path has content_converter enabled
↓
if Accept includes text/markdown
↓
fetch original page as HTML from origin
↓
preprocess DOM:
- remove nav/chrome/header/footer
- remove scripts/styles
- preserve JSON-LD only
- remove non-content junk
↓
extract meta tags into YAML frontmatter
↓
convert body DOM to Markdown
↓
append JSON-LD as fenced json block
↓
return text/markdown
Cloudflare’s documented output structure is:
1. YAML frontmatter from page metadata.
2. Markdown body converted from the document body.
3. JSON-LD preserved at the end in a fenced json block.
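The three-part output structure can be illustrated with a small assembler. This is a sketch under stated assumptions: `assemble_markdown` is a hypothetical name, the inputs are assumed to be already extracted, and frontmatter values are quoted with `json.dumps` on the assumption that JSON strings are acceptable YAML scalars.

```python
import json

# Built programmatically so the fence marker never appears literally here.
FENCE = "`" * 3


def assemble_markdown(meta: dict, body_md: str, json_ld: list) -> str:
    """Join frontmatter, Markdown body, and JSON-LD blocks (hypothetical helper)."""
    parts = []
    if meta:
        lines = ["---"]
        for key, value in meta.items():
            # json.dumps gives a double-quoted, escaped scalar.
            lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
        lines.append("---")
        parts.append("\n".join(lines))
    parts.append(body_md.strip())
    for block in json_ld:
        pretty = json.dumps(block, indent=2, ensure_ascii=False)
        parts.append(f"{FENCE}json\n{pretty}\n{FENCE}")
    return "\n\n".join(parts) + "\n"
```

Calling it with a metadata dict, a converted body, and a list of parsed JSON-LD objects yields one document in the order listed above.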
## 3. Frontmatter Extraction
Cloudflare maps HTML metadata into YAML frontmatter.
Example output:
---
title: ...
description: ...
image: ...
---
Those values are pulled from tags such as:
<meta name="title">
<meta property="og:title">
<meta name="description">
<meta property="og:description">
<meta property="og:image">
When both are present, the standard meta field takes precedence; the OpenGraph value is used only as a fallback.
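The precedence rule can be sketched with Python's standard `html.parser`. The `SOURCES` mapping, `MetaCollector`, and `extract_frontmatter` are hypothetical names chosen for this illustration; the exact tag list Cloudflare reads may differ.

```python
from html.parser import HTMLParser

# Assumed mapping: each frontmatter key lists its sources in priority order.
SOURCES = {
    "title": ["name:title", "property:og:title"],
    "description": ["name:description", "property:og:description"],
    "image": ["property:og:image"],
}


class MetaCollector(HTMLParser):
    """Collects <meta name=...> and <meta property=...> content values."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        content = a.get("content")
        if not content:
            return
        for kind in ("name", "property"):
            if a.get(kind):
                # The first occurrence of a given tag wins.
                self.found.setdefault(f"{kind}:{a[kind].lower()}", content)


def extract_frontmatter(html: str) -> dict:
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    result = {}
    for key, sources in SOURCES.items():
        for source in sources:  # standard meta first, OpenGraph second
            if source in collector.found:
                result[key] = collector.found[source]
                break
    return result
```

Because the lookup walks each priority list in order, document order in the HTML does not matter: a `<meta name="title">` beats `og:title` even when it appears later in the head.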
## 4. JSON-LD Handling
Cloudflare preserves structured data from:
<script type="application/ld+json">...</script>
Then it appends that data to the Markdown output like this:
```json
{ … }
```
All other script and style content is stripped.
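A minimal sketch of that behavior, again using only the standard library: it keeps the contents of JSON-LD scripts and ignores every other script or style element. `JsonLdCollector` and `extract_json_ld` are hypothetical names, and silently skipping malformed JSON is an assumption of this sketch, not documented Cloudflare behavior.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects parsed JSON-LD blocks; other script content is ignored."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").strip().lower()
            self._in_json_ld = script_type == "application/ld+json"
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # assumption: malformed JSON-LD is dropped


def extract_json_ld(html: str) -> list:
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks
```

The returned list can then be serialized and appended to the Markdown output, one fenced block per entry.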
## 5. URL Affordances for Agents
Cloudflare docs also expose agent-friendly URLs such as:
/page/index.md
/llms.txt
/llms-full.txt
/product/llms.txt
/product/llms-full.txt
These are not necessarily part of every customer-zone Markdown-for-Agents deployment, but they are part of Cloudflare’s own “Docs for agents” system.
Source:
- Cloudflare Docs for agents (https://developers.cloudflare.com/docs-for-agents/)
## How We Would Duplicate It
The minimum viable clone is a reverse-proxy or sidecar that detects whether the requester wants Markdown.
### Request Routing
Implement a proxy that handles:
GET /some/page
Accept: text/markdown
If Accept includes text/markdown, return converted Markdown.
Otherwise, proxy the normal HTML unchanged.
Important response headers:
Content-Type: text/markdown; charset=utf-8
Vary: Accept
X-Markdown-Tokens: <count>
Content-Signal: ai-train=yes, search=yes, ai-input=yes
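The routing decision and the headers above might be combined roughly as follows. This is a framework-free sketch: `build_response` and `estimate_tokens` are hypothetical names, the `convert` callback stands in for whatever conversion engine is used, and the four-characters-per-token estimate is a crude assumption rather than a real tokenizer.

```python
from typing import Callable, Dict, Tuple


def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about four characters per token."""
    return max(1, len(text) // 4) if text else 0


def build_response(
    accept: str, origin_html: str, convert: Callable[[str], str]
) -> Tuple[Dict[str, str], str]:
    """Return (extra headers, body) for one proxied request."""
    wants_markdown = any(
        part.split(";")[0].strip().lower() == "text/markdown"
        for part in accept.split(",")
    )
    if not wants_markdown:
        # HTML body passes through unchanged; Vary keeps shared caches correct.
        return {"Vary": "Accept"}, origin_html
    markdown = convert(origin_html)
    headers = {
        "Content-Type": "text/markdown; charset=utf-8",
        "Vary": "Accept",
        "X-Markdown-Tokens": str(estimate_tokens(markdown)),
        "Content-Signal": "ai-train=yes, search=yes, ai-input=yes",
    }
    return headers, markdown
```

Sending `Vary: Accept` on the HTML variant as well is a design choice in this sketch: without it, a shared cache could hand a stored Markdown response to a browser or the reverse.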
## Conversion Engine
Use tools like:
- HTML parser: parse5, jsdom, cheerio, or Go goquery
- Readability extraction: Mozilla Readability-style algorithm
- Markdown conversion: Turndown, html-to-md, or a rehype / remark stack
- Token counting: tiktoken or an approximate tokenizer
- Cache key: url + normalized Accept + origin ETag/Last-Modified
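The cache key in the last bullet could be derived along these lines. `cache_key` is a hypothetical helper; collapsing the Accept header to a two-value variant and preferring ETag over Last-Modified are assumptions made for the sketch.

```python
import hashlib


def cache_key(url: str, accept: str, etag: str = "", last_modified: str = "") -> str:
    """Build a stable key from URL, normalized Accept, and origin validator."""
    # Normalize Accept to the variant actually served, so differently
    # worded headers that select the same variant share one cache entry.
    # Simplification: a substring check ignores q-values.
    variant = "markdown" if "text/markdown" in accept.lower() else "html"
    # Either validator changes when the origin content changes.
    validator = etag or last_modified or "no-validator"
    raw = "\n".join([url, variant, validator])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Two requests for the same page with `text/markdown` and `text/markdown, text/html;q=0.5` map to the same key, while a new origin ETag produces a different key and so forces a fresh conversion.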
## Conversion Rules
Practical duplication rules:
Remove:
- script except application/ld+json
- style
- noscript
- nav
- header
- footer
- aside
- form
- cookie banners
- modals
- ads
- tracking pixels
- SVG icon sprites
- hidden elements
- empty containers
Prefer:
- main
- article
- [role=main]
- schema.org Article/Product/FAQ content
- h1-h6
- p
- ul/ol/li
- table
- blockquote
- pre/code
- img alt text
- canonical URL
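A deliberately small converter shows how the remove and prefer lists interact. It is a sketch only: `MiniConverter` and `html_to_markdown` are hypothetical names, it handles just headings, paragraphs, and list items, it drops text that sits outside those blocks, and it does not detect cookie banners, modals, or hidden elements. A production build would more likely rely on one of the libraries listed earlier.

```python
from html.parser import HTMLParser

# Subset of the "Remove" list that can be matched by tag name alone.
SKIP = {"script", "style", "noscript", "nav", "header", "footer", "aside", "form", "svg"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCKS = HEADINGS | {"p", "li"}


class MiniConverter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than zero while inside a removed element
        self.prefix = None   # Markdown prefix of the block being collected
        self.text = []
        self.out = []

    def _flush(self):
        if self.prefix is not None:
            content = " ".join("".join(self.text).split())  # collapse whitespace
            if content:
                self.out.append(self.prefix + content)
        self.prefix, self.text = None, []

    def _open(self, prefix):
        self._flush()
        self.prefix = prefix

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            if tag in HEADINGS:
                self._open("#" * int(tag[1]) + " ")
            elif tag == "p":
                self._open("")
            elif tag == "li":
                self._open("- ")
            elif tag == "img" and self.prefix is not None:
                alt = dict(attrs).get("alt")
                if alt:
                    self.text.append(alt)  # keep alt text, drop the image

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCKS and self.skip_depth == 0:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.text.append(data)


def html_to_markdown(html: str) -> str:
    converter = MiniConverter()
    converter.feed(html)
    converter.close()
    converter._flush()
    return "\n\n".join(converter.out)
```

Blocks are joined with blank lines, so list items come out as a loose Markdown list. JSON-LD scripts are skipped here on the assumption that they are extracted in a separate pass.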
## Output Format
Example:
---
title: Example Page
description: Short page summary.
image: https://example.com/cover.png
canonical: https://example.com/page
---
# Example Page
Page body converted to clean Markdown.
```json
{"@context":"https://schema.org","@type":"Article"}
```
## llms.txt Support
Add:
/llms.txt
/llms-full.txt
/llms.txt should list important Markdown endpoints:
Example Site
Docs
Products
/llms-full.txt can concatenate all important pages in Markdown for bulk ingestion or RAG.
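Generating the index file can be as simple as the sketch below. It assumes the common llms.txt layout of a top-level heading for the site name followed by one section per group of links; `build_llms_txt` and the example paths are hypothetical.

```python
def build_llms_txt(site_name: str, sections: dict) -> str:
    """Render an llms.txt index from {section heading: [(title, url), ...]}."""
    lines = [f"# {site_name}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url in links:
            lines.append(f"- [{title}]({url})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical site structure, for illustration only.
    print(build_llms_txt("Example Site", {
        "Docs": [("Getting started", "/docs/start/index.md")],
        "Products": [("Widget", "/products/widget/index.md")],
    }))
```

The same section data could drive `/llms-full.txt` by concatenating the converted Markdown of each linked page instead of listing links.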
## Key Difference From CAPTCHA or Bot Protection
Turnstile verifies humans.
AI Crawl Control manages crawler access.
Markdown for Agents is the “AI-friendly sidecar” piece: the same human website remains available as HTML, but agent-requested Markdown output is served through content negotiation.