Markdown vs. HTML: How useful is it for GEO and AI visibility?


Over the past eleven days, we’ve logged 9,822 requests from 37 different AI bots on 4eck-media.de. What’s striking is that 83 percent of these bots accessed the Markdown version of our pages, while just 12 percent accessed the standard HTML. In our view, anyone who’s serious about SEO in 2026 and still only serves HTML is missing out on a major opportunity. In this post, we’ll explain why that’s the case, why Google doesn’t want to hear about it, what Cloudflare is currently rolling out, and how you can effectively implement Markdown on your own site.

Three facts to take away
  • 83 percent of successful AI bot requests on 4eck-media.de are already delivered as Markdown; for the chatgpt-user bot, the figure is as high as 99.1 percent. (Source: internal log files, May 5–16, 2026, 9,822 bot hits)
  • A Markdown page consumes about 80 percent fewer tokens than the original HTML, with corresponding savings in bandwidth and crawl load. (Source: Cloudflare, February 2026)
  • 20 percent of all websites worldwide run on Cloudflare, accounting for 81.5 percent of the reverse proxy market. This is precisely where Markdown for Agents has been rolling out since February 2026. (Source: W3Techs, January 2026)

What is Markdown, anyway? A brief explanation

GEO stands for Generative Engine Optimization. This refers to the optimization of websites for AI search systems such as ChatGPT, Claude, Perplexity, or Google’s AI Overviews. The underlying question is: How can I ensure that my content appears accurately in the response that a language model provides to a user?

Markdown is a simple text format. Imagine you’re writing a letter in Word and want to make some text bold, insert a heading, or create a list. Normally, you’d click on small icons to do that. In Markdown, you use a few simple symbols directly within the body text instead:

# My heading       →  becomes a large heading
**important**      →  becomes bold text

That’s all there is to it. Plain text with a few special characters. A human can read it directly, and a computer can interpret it unambiguously. By way of comparison: When a browser loads a website, it receives HTML in return. It typically looks like this:

<div class="container-fluid bg-white shadow-lg p-4">
  <header role="banner" data-track="hero">
    <h1 class="title-xl mb-3">Willkommen</h1>
  </header>
  <nav><ul class="menu"><li><a href="...">Home</a></li>...

A browser is designed specifically for this jumble. It turns the markup into buttons, menus, cookie banners, fonts, and colors. An AI agent like ChatGPT, Claude, or Perplexity has a much harder time of it. It wants to know what the article actually says. It isn’t interested in the navigation, nor in the 14 tracking scripts. In a typical HTML page, roughly 80 percent is visual packaging and about 20 percent is content. Markdown delivers that 20 percent directly.

What our AI bot log files on 4eck-media.de show

We have been tracking 160 known AI bots and user agents on our website for some time now. During the analysis period from May 5 to May 16, 2026, the distribution was as follows:

Bot                                     Requests   Markdown share
meta-externalagent (Meta AI)            3,481      97.9%
chatgpt-user (ChatGPT Live Browsing)    1,404      99.1%
amazonbot                               599        92.7%
gptbot (OpenAI Training)                630        91.7%
claudebot (Anthropic)                   466        98.5%
bytespider (ByteDance/TikTok)           273        82.8%
oai-searchbot                           113        42.5%
ccbot (Common Crawl)                    25         84.0%
duckassistbot                           20         85.0%

Across all successful deliveries, 83.3 percent were sent as Markdown, 12.5 percent as HTML, and the rest were split between plain text, JSON, and a few XML feeds. In other words, the LLM bots out there make it very clear in their headers exactly what they want. They want Markdown. As soon as they can get it, they take it.
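
If you want to build a table like this from your own logs, the per-bot share takes only a few lines of PHP. What follows is a minimal sketch under assumptions: the record fields (bot, status, content_type) are hypothetical and will differ in your logging setup, and only responses with status 200 count as successful deliveries.

```php
<?php
// Hypothetical log records: one array per bot request.
// The field names (bot, status, content_type) are assumptions for this sketch.

/**
 * Returns the Markdown share per bot in percent,
 * counting only successful (HTTP 200) deliveries.
 *
 * @param array<int, array{bot: string, status: int, content_type: string}> $hits
 * @return array<string, float> bot name => Markdown share in percent
 */
function markdown_share_per_bot(array $hits): array
{
    $total = [];
    $markdown = [];

    foreach ($hits as $hit) {
        if ($hit['status'] !== 200) {
            continue; // skip errors and redirects
        }
        $bot = strtolower($hit['bot']);
        $total[$bot] = ($total[$bot] ?? 0) + 1;
        // Content types may carry a charset suffix, so compare the prefix only.
        if (stripos($hit['content_type'], 'text/markdown') === 0) {
            $markdown[$bot] = ($markdown[$bot] ?? 0) + 1;
        }
    }

    $share = [];
    foreach ($total as $bot => $count) {
        $share[$bot] = round(100 * ($markdown[$bot] ?? 0) / $count, 1);
    }
    return $share;
}

// Made-up sample data, not taken from our logs.
$hits = [
    ['bot' => 'GPTBot',    'status' => 200, 'content_type' => 'text/markdown; charset=utf-8'],
    ['bot' => 'GPTBot',    'status' => 200, 'content_type' => 'text/html; charset=utf-8'],
    ['bot' => 'GPTBot',    'status' => 404, 'content_type' => 'text/html; charset=utf-8'],
    ['bot' => 'ClaudeBot', 'status' => 200, 'content_type' => 'text/markdown; charset=utf-8'],
];

print_r(markdown_share_per_bot($hits));
```

In the sample data, GPTBot has two successful deliveries, one of them as Markdown, and the 404 is ignored.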

The other side of the coin is just as revealing. These bots haven’t sent a single Markdown request in our logs over the past eleven days: PetalBot (Huawei), Applebot, FacebookExternalHit, GoogleOther, and AgentReadinessScanner. They continue to load HTML. So if you want to serve every bot, you need to deliver both formats.

And the trend is clearly heading in one direction. On May 5, the daily Markdown rate stood at 67 percent. During the second week of May, the rate fluctuated between 86 and 94 percent. By May 16, it had reached 97 percent. The bots quickly learn that our pages support Markdown and adjust their requests accordingly.

Side effect: Bot logging pays off twice over

Accurately logging all AI bot accesses reveals two insights that have nothing to do with the actual Markdown question, but which alone justify the effort of tracking them.

Recover forgotten URLs from old training data

AI bots regularly access URLs on 4eck-media.de that we haven’t actively linked to in years. They don’t appear in the menu, the sitemap, or internal links. Nevertheless, the bots continue to access them, presumably because the URLs are included in some training data index that was used during an earlier model training process.

In the log file, these attempts appear as 404 errors, and sometimes as 500 errors. This is exactly where the practical benefit lies: we see target URLs that we wouldn’t otherwise be aware of, and can set up clean 301 redirects to the appropriate current pages. This allows us to capture traffic that traditional SEO tools like Search Console or sitemap audits wouldn’t detect at all, because the source of these requests lies outside the open web—deep within the training snapshot of an LLM. It’s a form of reverse engineering old indexes that would simply remain invisible without bot logging.
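
A list of redirect candidates can be pulled from the bot log with a small aggregation. The sketch below is only an illustration: the record fields (bot, status, path) and the function name are assumptions, it covers 404s only, and the mapping from an old path to the right current page still has to be decided by hand.

```php
<?php
// Sketch: collect URLs that AI bots request but that no longer exist.
// The field names (bot, status, path) are assumptions, not a real schema.

/**
 * Counts bot requests per missing path, most requested first.
 *
 * @param array<int, array{bot: string, status: int, path: string}> $hits
 * @param int $minHits only report paths requested at least this often
 * @return array<string, int> path => number of 404 hits
 */
function redirect_candidates(array $hits, int $minHits = 2): array
{
    $counts = [];
    foreach ($hits as $hit) {
        if ($hit['status'] !== 404) {
            continue;
        }
        // Drop query strings so /old-page?utm=x and /old-page count as one target.
        $path = (string) parse_url($hit['path'], PHP_URL_PATH);
        $counts[$path] = ($counts[$path] ?? 0) + 1;
    }

    // Keep only paths that were requested repeatedly.
    $counts = array_filter($counts, fn (int $n): bool => $n >= $minHits);
    arsort($counts);
    return $counts;
}

// Made-up sample data.
$hits = [
    ['bot' => 'GPTBot',    'status' => 404, 'path' => '/old-service/?utm_source=x'],
    ['bot' => 'ClaudeBot', 'status' => 404, 'path' => '/old-service/'],
    ['bot' => 'GPTBot',    'status' => 404, 'path' => '/typo'],
    ['bot' => 'GPTBot',    'status' => 200, 'path' => '/blog/'],
];

print_r(redirect_candidates($hits));
```

The minimum hit count filters out one-off typos, so the list stays focused on URLs that bots request again and again.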

Bot hits show status errors alongside the content type

Identifying performance issues from the bot’s perspective

The second effect concerns server speed. If a bot such as GPTBot or ChatGPT-User requests a page and doesn’t receive a complete response within a short time window, it aborts the request. This appears in the log file as HTTP 499 (Client Closed Request) or 504 (Gateway Timeout). Content that never finished loading makes it into neither the training pool nor the retrieval pool. Your brand is then missing from the subsequent AI response, even though it would have been relevant in terms of content.

Within the GEO community, a rule of thumb of 500 to 700 milliseconds is commonly cited as the abandonment threshold. This figure is not officially documented by OpenAI or Anthropic; it is based on practical observations. It is plausible nonetheless because it falls precisely within the transition zone where classic web performance metrics such as Time to First Byte or the Core Web Vitals also mark the difference between a good and a poor user experience. As a rule of thumb for AI crawler setup, it is therefore a useful guideline. We also reference this threshold in our 4eck GEO framework for AI visibility and AI recommendations.

Specifically, this means for your site: If you frequently see 499 or 504 errors from AI bots in your logs, you have a performance issue that often doesn’t show up in standard web analytics because humans tend to wait longer than bots. Caching, database optimization, and streamlined Markdown output (see below) all work together to address this issue.
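
To see which bots are affected, aborted and slow requests can be counted per bot. This is a rough sketch under assumptions: the record fields (bot, status, duration_ms) are hypothetical, and the 700 millisecond default is simply the upper end of the community rule of thumb mentioned above, not a documented limit.

```php
<?php
// Sketch: per bot, count aborted requests (499/504) and slow responses.
// The field names (bot, status, duration_ms) are assumptions for illustration.

/**
 * @param array<int, array{bot: string, status: int, duration_ms: int}> $hits
 * @param int $thresholdMs responses slower than this count as "slow"
 * @return array<string, array{requests: int, aborted: int, slow: int}>
 */
function bot_performance_report(array $hits, int $thresholdMs = 700): array
{
    $report = [];
    foreach ($hits as $hit) {
        $bot = strtolower($hit['bot']);
        $report[$bot] ??= ['requests' => 0, 'aborted' => 0, 'slow' => 0];
        $report[$bot]['requests']++;

        // 499 = client closed the request (nginx), 504 = gateway timeout.
        if (in_array($hit['status'], [499, 504], true)) {
            $report[$bot]['aborted']++;
        }
        if ($hit['duration_ms'] > $thresholdMs) {
            $report[$bot]['slow']++;
        }
    }
    return $report;
}

// Made-up sample data.
$hits = [
    ['bot' => 'GPTBot',       'status' => 200, 'duration_ms' => 120],
    ['bot' => 'GPTBot',       'status' => 499, 'duration_ms' => 900],
    ['bot' => 'ChatGPT-User', 'status' => 504, 'duration_ms' => 30000],
];

print_r(bot_performance_report($hits));
```

Counting slow responses separately from aborted ones shows pages that are close to the threshold before bots start giving up on them.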

Google says: Not necessary. Cloudflare says: Mandatory.

Google published its official guide to optimizing for generative AI features in April. The “Mythbusting” section states explicitly that you need no llms.txt, no special markup, and no extra Markdown output. Google’s message in essence: Stick with HTML, we understand it just fine.

A small punchline on the side: on Google’s own documentation pages, on that very page with the recommendation against Markdown, a “Copy page as Markdown” button sits right next to the heading. So Google happily delivers its own documentation as Markdown. For everyone else’s content, however, this is apparently unnecessary. Remarkable.

Markdown option on Google

Why Cloudflare is the heavyweight here

Around 22.7 percent of all websites worldwide run through the Cloudflare network (as of 16 May 2026, source W3Techs). In the reverse-proxy services segment, the company holds a share of 83 percent. The trend continues upward. The next-largest provider, Amazon CloudFront, reaches 1 to 2 percent. When an infrastructure provider of this scale takes a technical position, it affects a substantial part of the public web.

With the Markdown for Agents feature, Cloudflare has been delivering every HTML page directly in Markdown on request since February 2026. On its own documentation, the company has additionally built in a notice banner right at the top that shouts at bots in capital letters: STOP! If you are an AI agent or LLM, read this before continuing. This is the HTML version of a Cloudflare documentation page. Always request the Markdown version instead. HTML wastes context.

In the same banner, Cloudflare explicitly references the llms.txt files for its own documentation. That is exactly what Google declares unnecessary in its Mythbusting section.

Two heavyweights, two opposing statements. Whom should you believe?

Our position is clear: Cloudflare has the more honest stance. Google has a strategic self-interest in website operators doing nothing that makes distributing content to other AI systems easier. Every Markdown file that ChatGPT or Claude can process efficiently is content that lands at a competitor without a Google click-through. From Google’s perspective, framing that as “not necessary” is rational.

Our own data refutes that framing pretty clearly. A 97 percent Markdown rate over the last few days is not a statistic that supports the thesis that “Markdown brings nothing”.

Alongside Markdown, we also have MCP servers and llms.txt activated and have added Content Signal directives to the robots.txt for ai-train, search and ai-input, which Cloudflare recommends. The only side effect is that PageSpeed Insights now gives us 92 instead of 100 for SEO, because it does not yet take these directives into account.

AI Ready plugin for WordPress, developed by the agency 4eck Media

Why Markdown makes so much technical sense

Cloudflare named a concrete number in its blog post: its own announcement article needs 16,180 tokens in HTML and only 3,150 tokens in Markdown, a saving of around 80 percent. This is relevant in practice because language models have only a limited context window. The fewer tokens a document consumes, the more fits into the context overall and the cleaner the answer becomes. An HTML document drags along a lot of ballast that is irrelevant for the meaning of the content: navigation markup, cookie banners, script tags, tracking pixels, nested wrapper divs, generic classes.

In Markdown, all of that drops away. What remains is content plus structure. Exactly what an LLM needs to answer a user question.

Markdown is the native language of the models

An often overlooked property: language models like GPT, Claude and Gemini produce Markdown themselves. When ChatGPT shows bold text, a bullet list or a code block in an answer, the model originally emits Markdown syntax, which the chat client then renders visually. Whoever supplies a model with Markdown input is addressing it in its own working language. This has two effects. First, the model does not have to translate HTML into an internal representation that ends up close to Markdown anyway. Second, formatting gets lost or misinterpreted less often.

Markdown can be cleanly quoted

AI answer systems almost always quote your content in excerpts. A paragraph here, a bullet list there, perhaps two sentences from a longer text. In Markdown, every excerpt remains intelligible on its own. A paragraph is a paragraph, a link is a link, a bullet list stays a bullet list. In HTML, every cut tears through nested tags, closing elements are missing, and the model has to guess or repair.

From a GEO perspective, that is perhaps the strongest argument for Markdown. Clean quotability raises the probability that your content gets reproduced completely in the first place, with links back to your site preserved correctly. Mangled snippets land more often in the models’ filters and disappear entirely from the final answer. Visibility in answer engines is decided at exactly this point.

Crawl budget and bandwidth

The token saving translates almost directly into bandwidth. An average HTML page on the web now weighs several hundred kilobytes according to the HTTP Archive Web Almanac, and the trend is upward. The Markdown variant of the same page typically lands at 5 to 20 kilobytes. That relieves your server, saves traffic costs and reduces the load that AI crawlers cause on your infrastructure.

In classic SEO, we speak of Google’s crawl budget, meaning how many pages Googlebot fetches from your site per time period. AI crawlers behave comparably, only much more aggressively. In our logs alone, meta-externalagent made 3,481 requests in eleven days, gptbot another 728, chatgpt-user 1,404. If these bots can take Markdown, they pull around 80 percent less data per request. The more AI bots knock on your door, the more this effect adds up.

From this follow three measurable advantages for your visibility:

  • AI systems can process more of your content in one go, because a Markdown document is leaner.
  • The answer quality based on your content rises, because the model has to filter less junk.
  • Your hosting stays leaner, because the crawlers generate less load.

Whoever gets quoted more often and more cleanly gains reach in answer engines without having to pay for more server traffic.

How to implement Markdown on your WordPress site

We implemented the Markdown output on 4eck-media.de ourselves and did not solve it via Cloudflare. The reason is simple: we wanted full control over the result and no service between our content and the bots. Whoever uses Cloudflare anyway can switch the feature on with a toggle. For those who want to take the matter in hand themselves, here is the blueprint.

The mechanism: HTTP Content Negotiation

Every bot that requests a page sends an Accept header. It contains the formats it can process. A classic browser says, for example, Accept: text/html. A modern LLM bot like ChatGPT says Accept: text/markdown, text/html. Your task is to read this header and, depending on the wish, deliver either HTML or Markdown.
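
An Accept header can also carry quality values (q), for example text/markdown;q=0 to refuse a format explicitly. The following is a minimal sketch of reading the header with q-values in plain PHP. The function names are made up for this example, and preferring Markdown when both formats are rated equally is our design choice here, not something the HTTP specification prescribes.

```php
<?php
/**
 * Parses an Accept header into media type => q-value.
 * Parameters other than q are ignored in this sketch.
 *
 * @return array<string, float>
 */
function parse_accept(string $header): array
{
    $types = [];
    foreach (explode(',', $header) as $part) {
        $params = array_map('trim', explode(';', $part));
        $type = strtolower((string) array_shift($params));
        if ($type === '') {
            continue;
        }
        $q = 1.0; // default quality when no q parameter is given
        foreach ($params as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        $types[$type] = $q;
    }
    return $types;
}

/**
 * True if the client lists text/markdown and rates it
 * at least as high as text/html.
 */
function wants_markdown(string $header): bool
{
    $types = parse_accept($header);
    $markdown = $types['text/markdown'] ?? 0.0;
    if ($markdown <= 0.0) {
        return false; // not listed, or explicitly refused with q=0
    }
    $html = $types['text/html'] ?? 0.0;
    return $markdown >= $html;
}

var_dump(wants_markdown('text/markdown, text/html'));     // bot from the example above
var_dump(wants_markdown('text/html'));                    // classic browser
var_dump(wants_markdown('text/html, text/markdown;q=0')); // Markdown explicitly refused
```

Wildcards such as */* are deliberately not treated as a wish for Markdown, so browsers keep receiving HTML.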

Step 1: Set the hook in WordPress

The following snippet belongs in your theme’s functions file or, even better, in its own mini plugin:

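add_action('template_redirect', function () {
    // Only single posts, pages and custom post types get a Markdown variant.
    if (!is_singular()) {
        return;
    }

    // Serve Markdown only if the client lists text/markdown in its Accept header.
    $accept = $_SERVER['HTTP_ACCEPT'] ?? '';
    if (!str_contains($accept, 'text/markdown')) {
        return;
    }

    $post = get_queried_object();
    if (!$post instanceof WP_Post) {
        return;
    }

    header('Content-Type: text/markdown; charset=utf-8');
    header('Vary: Accept');               // tells caches that the response depends on the Accept header
    header('X-Content-Format: markdown'); // custom marker header

    // render_post_as_markdown() is defined in step 2.
    echo render_post_as_markdown($post);
    exit;
});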

That is the front door. As soon as a bot with text/markdown in the Accept header requests a single page (blog article, landing page, custom post type), your code steps in and delivers Markdown instead of HTML. get_queried_object() is the more robust way to get the current post inside the template_redirect hook, because it reads from the main query and does not depend on the global $post variable, which other code may have changed by this point.

Step 2: Convert HTML to Markdown

For the actual conversion, we recommend the library league/html-to-markdown. Installed via Composer:

composer require league/html-to-markdown

The conversion function then looks like this:

// Assumes Composer's autoloader is already loaded,
// e.g. require __DIR__ . '/vendor/autoload.php';
use League\HTMLToMarkdown\HtmlConverter;

function render_post_as_markdown(WP_Post $post): string {
    $converter = new HtmlConverter([
        'strip_tags'    => true,                            // drop tags without a Markdown equivalent
        'remove_nodes'  => 'script style nav footer aside', // remove these elements including their content
        'header_style'  => 'atx',                           // "# Heading" instead of underlined headings
    ]);

    $title    = wp_strip_all_tags($post->post_title);
    $url      = get_permalink($post);
    $updated  = get_the_modified_date('c', $post);
    // Run the content through the_content so shortcodes and blocks are rendered first.
    $content  = apply_filters('the_content', $post->post_content);
    $markdown = $converter->convert($content);

    // Escape the title as a JSON string, which is also valid YAML.
    $frontmatter  = "---\n";
    $frontmatter .= 'title: ' . json_encode($title, JSON_UNESCAPED_UNICODE) . "\n";
    $frontmatter .= "url: {$url}\n";
    $frontmatter .= "updated: {$updated}\n";
    $frontmatter .= "---\n\n";

    return $frontmatter . "# {$title}\n\n" . $markdown;
}

The front matter at the top (the block between the dashes) is optional, but highly recommended. Bots can read metadata from it without having to parse the body. We push the title through json_encode so that quotation marks, colons or special characters do not throw the YAML parser off track.
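
What json_encode does to a tricky title is easiest to see in isolation. The helper below is a small sketch in plain PHP, independent of WordPress. Its name is made up, and unlike the function above it quotes every field rather than only the title, which is a variation for illustration.

```php
<?php
/**
 * Builds a YAML front matter block from string fields.
 * Each value is encoded as a JSON string, which YAML parsers
 * read as a double-quoted scalar.
 *
 * @param array<string, string> $fields
 */
function build_front_matter(array $fields): string
{
    $lines = ['---'];
    foreach ($fields as $key => $value) {
        $lines[] = $key . ': '
            . json_encode($value, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
    }
    $lines[] = '---';
    return implode("\n", $lines) . "\n\n";
}

// A title with a colon and quotation marks would break unquoted YAML.
echo build_front_matter([
    'title' => 'GEO: "Markdown" vs. HTML',
    'url'   => 'https://example.com/blog/my-post/',
]);
```

The quotation marks inside the title come out backslash-escaped and the whole value is wrapped in double quotes, so the colon no longer reads as a key separator.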

Step 3: Do not forget caching

WordPress pages with a page cache (WP Rocket, LiteSpeed, Varnish) must respect the Vary: Accept header. Otherwise your cache delivers the same version to all bots, regardless of the Accept header that comes in. With most caching plugins, this can be activated in the settings. When in doubt, talk to your host directly.
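
Accept headers differ from client to client, so a cache that varies on the raw header can end up storing many near-identical copies. One way around that is to reduce every header to one of two variants before it goes into the cache key. The sketch below only shows the idea: the function names and key format are made up, q-values are ignored for brevity, and how such a key is wired into WP Rocket, LiteSpeed or Varnish depends on the product and is not shown.

```php
<?php
/**
 * Maps any incoming Accept header to one of two cache variants,
 * so the page cache stores at most two copies per URL.
 * Simplification: a plain substring check, q-values are ignored.
 */
function cache_variant(string $accept): string
{
    return stripos($accept, 'text/markdown') !== false ? 'markdown' : 'html';
}

/**
 * Builds a cache key from the request path and the normalized variant.
 * The key format is made up for this sketch.
 */
function cache_key(string $path, string $accept): string
{
    return cache_variant($accept) . ':' . md5($path);
}

// Two different browser headers share one entry, the bot gets its own.
var_dump(cache_key('/blog/my-post/', 'text/html,application/xhtml+xml,*/*;q=0.8'));
var_dump(cache_key('/blog/my-post/', 'text/html'));
var_dump(cache_key('/blog/my-post/', 'text/markdown, text/html'));
```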

Step 4 (optional): Offer a .md URL in parallel

Some training crawlers ignore the Accept header or send it sloppily. Whoever wants to capture those as well can offer a second URL, so that /blog/my-post/ is additionally reachable as /blog/my-post/index.md. The clean way to do this in WordPress is via add_rewrite_rule with your own query var:

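// Register a custom query var so WordPress keeps it when parsing the request.
add_filter('query_vars', function ($vars) {
    $vars[] = 'as_markdown';
    return $vars;
});

add_action('init', function () {
    // Capture only the last path segment (the post slug), so that
    // /blog/my-post/index.md resolves to name=my-post.
    // Assumption: posts are addressed by slug; hierarchical pages would need pagename instead.
    add_rewrite_rule(
        '^(?:.+/)?([^/]+)/index\.md/?$',
        'index.php?name=$matches[1]&as_markdown=1',
        'top'
    );
});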

In the hook from step 1, you then only have to additionally check for the query parameter:

$wants_markdown =
       str_contains($_SERVER['HTTP_ACCEPT'] ?? '', 'text/markdown')
    || (int) get_query_var('as_markdown') === 1;

After adding the rewrite rule, you have to save the permalinks once in the WordPress settings so that WordPress picks up the rule.

In our logs, the relevant AI bots fetch their content almost entirely via the Accept header. Step 4 is therefore a plus, not a must.

Step 5: Optionally publish an llms.txt

An llms.txt in the root of your domain (comparable to a robots.txt) lists the most important content of your site in a form that LLMs can consume well. Cloudflare openly recommends it, Google says it is unnecessary. Our recommendation: do it, the effort is low, the damage is zero, and the possible benefit is real.
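
For orientation, here is a small sketch that renders such a file in PHP. It assumes the layout proposed at llmstxt.org (a title line, a short quoted summary, then sections with link lists). The function name, the section names and the example.com URLs are placeholders, not our actual file.

```php
<?php
/**
 * Renders llms.txt content: "# title", "> summary",
 * then one "## section" per group with a list of links.
 *
 * @param array<string, array<int, array{title: string, url: string, note: string}>> $sections
 */
function render_llms_txt(string $siteName, string $summary, array $sections): string
{
    $out = "# {$siteName}\n\n> {$summary}\n";
    foreach ($sections as $heading => $links) {
        $out .= "\n## {$heading}\n\n";
        foreach ($links as $link) {
            $out .= "- [{$link['title']}]({$link['url']}): {$link['note']}\n";
        }
    }
    return $out;
}

// Placeholder content for illustration only.
echo render_llms_txt(
    'Example Agency',
    'Web agency blog about SEO, GEO and WordPress.',
    [
        'Articles' => [
            [
                'title' => 'Markdown vs. HTML',
                'url'   => 'https://example.com/blog/markdown-vs-html/index.md',
                'note'  => 'Bot log analysis and implementation guide',
            ],
        ],
    ]
);
```

In WordPress, the output could be written to a static file or served from a hook. Which is better depends on how often the list changes.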

What about the risks?

A legitimate question often comes up: if I deliver Markdown, am I not making it too easy for bots to steal my content?

Whoever wants to scrape your site will scrape it anyway. That works just as well with HTML, only a little more expensive. Markdown does not prevent theft, it just makes it more efficient. What you can do instead:

  • First: use the Content Signal header in the response to make clear what you allow and what you do not. Cloudflare uses the Content Signals Framework for that. Bots that abide by it respect your specifications.
  • Second: exclude certain bots via robots.txt. That applies to HTML just as it does to Markdown.
  • Third: remember that visibility is the actual goal. Whoever does not appear in answer engines has a bigger problem than someone whose content is cleanly quoted there.

Should I use Markdown for my website?

Our data shows it clearly: AI bots want Markdown, they actively ask for it, and their share is rising week by week. Whoever delivers their content in parallel as Markdown gives bots exactly the format they can process efficiently. The result is cleaner quotations, less load on your own infrastructure and a better negotiating position for the next wave of agentic tools. Whether more visibility in ChatGPT, Perplexity, Claude and Meta AI results from this remains to be seen. I assume so, but I cannot yet provide unambiguous evidence for it. I only know that we receive new customer inquiries every week from regions across the entire DACH area that would never have reached us last year.

Google’s official recommendation to ignore all of this is, in our view, strategically motivated. It says more about Google’s own competitive situation than about the reality on the bot side. Whoever wants to use the lever should not let themselves be stopped by it.

The implementation is manageable. A plugin, a library, a few hours of development time. For Cloudflare customers, a toggle is enough. Whoever implements it themselves has control and can shape the Markdown output exactly the way it makes the most sense for their own content.

Markdown is, in our view, the coming standard language between websites and AI systems. Those who are in early benefit longer. Those who wait until Google officially confirms it pay with the cost of missed opportunities.

Frequently asked questions about Markdown on websites

Do I really need Markdown when Google itself says it is not necessary?

Google’s recommendation applies to Google’s own AI features like AI Overviews and AI Mode. For all other answer engines (ChatGPT, Claude, Perplexity, Meta AI), our data shows a clearly different picture. These systems actively request Markdown as soon as the site provides it. Whoever wants to be visible there should not leave the lever unused.

How much effort is the implementation in WordPress?

Realistically half a day to a full developer day, depending on the setup. The plugin itself is manageable (see code examples above), the effort lies in the caching configuration and in testing with different bot user agents. Whoever wants to start fully on their own can get very far with the snippets shown here.

What does Markdown for Agents cost at Cloudflare?

Currently nothing. Cloudflare offers the feature free of charge during the beta phase for Pro, Business and Enterprise plans as well as for SSL-for-SaaS customers. It is not available on the free plan.

Should I additionally set up an llms.txt?

If it is doable without much effort: yes. An llms.txt in the root lists your most important content in a structured way for LLMs. Cloudflare openly recommends it, Google calls it unnecessary. Since the effort is minimal and no damage is done, we see no reason to forgo it.

Does Markdown harm my classic SEO at Google?

No. The Markdown output is only delivered when a bot explicitly asks for it via the Accept header. Googlebot continues to get HTML, your users continue to get HTML. So no duplicate content problems arise and no conflict with existing SEO signals.

How do I measure whether the switch brings anything?

Three metrics make sense. First: evaluate bot logs (as shown in this article) to observe the Markdown share per bot over time. Second: track mentions of your brand and URLs in the major AI search systems, manually or with tools like Otterly, Peec AI or Profound. Third: look at referral traffic from AI sources in your web analytics, because ChatGPT, Perplexity and the like set clickable source links.

Does this also work with other CMS or static site generators?

Yes. The principle of content negotiation via the Accept header is CMS-independent. In Next.js or Astro it can be solved via middleware, in Laravel via a middleware handler, with static sites via edge functions or, of course, Cloudflare’s Markdown for Agents. Whoever needs more details on a specific stack is welcome to write to us.

Sources:

About the data: 9,822 bot hits from our own logging system on 4eck-media.de, period 5 to 16 May 2026, 37 active bots out of 160 tracked user agents.

Do you want to make your website AI-ready?

We built the entire setup on 4eck-media.de ourselves: Markdown delivery via content negotiation, bot logging for 160 AI user agents, performance tuning based on bot behavior, and the matching 301 strategy for forgotten URLs from training data. If you want to implement something similar for your site and your own team currently has neither the time nor the depth for it, we are happy to step in.

What we can help with specifically:

  • Implementing Markdown delivery in WordPress or other CMS, including caching, Vary header and performance check
  • Setting up AI bot logging with clear reporting on the 100+ most important bots
  • Analyzing your logs for 404, 500 and 499/504 anomalies, plus a matching redirect strategy
  • Performance tuning from the bot’s perspective (Time to First Byte, caching layer, database optimization)
  • GEO audit of your current visibility in ChatGPT, Claude, Perplexity and Meta AI
  • Consulting on llms.txt, MCP servers, Content Signals and sensible bot control

Get in touch with us today!

Matthias Petri
UX Strategist & Technical SEO/GEO Expert

Matthias Petri is a UX/UI strategist, SEO and GEO expert, and managing director of 4eck Media GmbH & Co. KG. With over 20 years of experience in web development, technical search engine optimization, and WordPress architectures, he focuses intently on how websites are actually read, understood, and referenced by Google, AI crawlers, and generative search systems.

In his analyses, he combines technical log file evaluations, performance data, and practical implementation experience from client projects. His focus is on optimizing websites not only for humans and traditional search engines but also for AI systems such as ChatGPT, Claude, Perplexity, Meta AI, and Google AI Overviews. Topics such as Markdown delivery, bot tracking, structured content, server performance, and GEO optimization are currently among his key areas of focus.