The search loop Google built over twenty years is quietly being replaced. Users used to type queries, get a ranked list of URLs, and click through. Now they ask ChatGPT or Perplexity a question and receive a synthesized answer directly: no list, no clicks, maybe a citation, maybe not. Check your server logs and you will likely find GPTBot, ClaudeBot, and PerplexityBot already crawling your pages, and they have probably been doing so for months. (Google-Extended will not show up there: it is a robots.txt control token honored by Google's existing crawlers, not a separate user agent.) The question isn't whether AI systems index your content; it's whether they understand it well enough to cite you accurately when someone asks a relevant question.
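To see whether these bots are actually visiting, you can count user-agent matches in your access log. The sketch below is a minimal illustration, not a log-analysis tool: the function name countAiCrawlerHits and the three sample log lines are invented for this example, and real user-agent strings are longer than the simplified ones shown.

```typescript
// User-agent substrings for crawlers named in this article.
// Google-Extended is left out on purpose: it is a robots.txt control token,
// not a string that shows up in request logs.
const AI_CRAWLER_TOKENS: string[] = [
  "GPTBot",
  "ChatGPT-User",
  "ClaudeBot",
  "PerplexityBot",
  "cohere-ai",
  "Amazonbot",
];

// Count how many log lines mention each token (case-insensitive substring match).
function countAiCrawlerHits(logLines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const token of AI_CRAWLER_TOKENS) {
    counts[token] = 0;
  }
  for (const line of logLines) {
    const lower = line.toLowerCase();
    for (const token of AI_CRAWLER_TOKENS) {
      if (lower.indexOf(token.toLowerCase()) !== -1) {
        counts[token] = (counts[token] ?? 0) + 1;
      }
    }
  }
  return counts;
}

// Hypothetical access-log lines, invented for illustration only.
const sampleLogLines: string[] = [
  '203.0.113.7 - - [01/Jan/2025:10:00:00 +0000] "GET /reviews HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
  '198.51.100.4 - - [01/Jan/2025:10:00:05 +0000] "GET /about HTTP/1.1" 200 340 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
  '192.0.2.9 - - [01/Jan/2025:10:00:09 +0000] "GET / HTTP/1.1" 200 901 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
];

const crawlerHits = countAiCrawlerHits(sampleLogLines);
console.log(crawlerHits);
```

Substring matching is deliberately loose. User-agent strings can be spoofed, so treat the counts as a rough signal rather than proof of who is crawling.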

How AI Crawlers Are Different From Googlebot

Googlebot asks: is this page relevant to a search query? It evaluates keywords, backlinks, page authority, freshness—hundreds of signals built over decades. AI crawlers ask something fundamentally different: what is this site about, who made it, is it trustworthy, and how should I describe it to someone asking about this topic later? They are not constructing ranked lists for human navigation. They are building internal representations of your content that can be retrieved and synthesized into answers. Traditional SEO optimizes for position in a list. GEO optimizes for comprehension by a model. A page stuffed with keywords and backlinks might crush Google rankings while being completely opaque to an AI system that cannot figure out who wrote it, what their expertise is, or whether the information is current.

The Files That Actually Matter

The most important file you probably don't have is llms.txt: a text file at your domain root, conventionally written in Markdown, that describes your site to AI systems in a structured format. Think of it as robots.txt for language models. It is not an enforced protocol but a community convention that AI tools can look for and use, with no guarantee that any particular crawler does. A solid llms.txt contains who runs the site, what it covers, how you want to be cited, available content sections, and your editorial stance.

The editorial stance section is the part most implementations skip, but it is critical: AI systems that synthesize answers from multiple sources need to weight them by trustworthiness. Declaring that your reviews are independent and unsponsored, written on real hardware rather than summarized from press releases, gives models a reason to cite you over aggregator sites with no stated methodology.

The preferred citation block tells every AI system reading the file exactly how to attribute your content: author name, site name, URL format, section structure. Without it, models construct citations from whatever fragments they can piece together and get them wrong: wrong author format, missing URLs, garbled site names. Serving llms.txt dynamically from a CMS or Next.js route handler means the file updates automatically when you publish new content, with a Last updated timestamp that tracks your most recent publication instead of going stale.
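As a rough sketch of what generating this file could look like, the function below assembles an llms.txt body from a site profile. Everything here is an assumption made for illustration: the SiteProfile field names, the section headings, and the example site (Example Drone Reviews, Jane Doe, example.com) are hypothetical, and the layout is one plausible arrangement rather than a required format.

```typescript
// All field names below are illustrative assumptions, not part of any spec.
interface SiteProfile {
  siteName: string;
  baseUrl: string;
  author: string;
  summary: string;
  editorialStance: string;
  preferredCitation: string;
  sections: { title: string; path: string; description: string }[];
  lastPublished: Date; // date of the most recent publication, not "now"
}

// Assemble the llms.txt body as Markdown-flavored plain text.
function buildLlmsTxt(site: SiteProfile): string {
  const sectionLines = site.sections.map(
    (s) => "- [" + s.title + "](" + site.baseUrl + s.path + "): " + s.description
  );
  const lines: string[] = [
    "# " + site.siteName,
    "",
    "> " + site.summary,
    "",
    "Author: " + site.author,
    // toISOString() is always UTC; keep only the YYYY-MM-DD part.
    "Last updated: " + site.lastPublished.toISOString().slice(0, 10),
    "",
    "## Editorial stance",
    "",
    site.editorialStance,
    "",
    "## Preferred citation",
    "",
    site.preferredCitation,
    "",
    "## Sections",
    "",
  ].concat(sectionLines);
  return lines.join("\n") + "\n";
}

// Hypothetical site data, invented for this example.
const exampleProfile: SiteProfile = {
  siteName: "Example Drone Reviews",
  baseUrl: "https://example.com",
  author: "Jane Doe",
  summary: "Independent, unsponsored drone reviews written from hands-on testing.",
  editorialStance: "Reviews are independent and unsponsored, written on real hardware.",
  preferredCitation: "Jane Doe, Example Drone Reviews, https://example.com/<section>/<slug>",
  sections: [
    { title: "Reviews", path: "/reviews", description: "Hands-on drone reviews" },
  ],
  lastPublished: new Date("2025-01-15T00:00:00Z"),
};

const llmsTxt = buildLlmsTxt(exampleProfile);
console.log(llmsTxt);
```

In a Next.js App Router project, one plausible way to serve this is a route handler (for example app/llms.txt/route.ts) whose GET function returns new Response(llmsTxt, ...) with a text/plain content type. The important design choice is that lastPublished comes from your content store, so the timestamp only changes when something is actually published.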

Structured Data and robots.txt

JSON-LD structured data remains relevant for GEO because it provides machine-readable facts without requiring models to parse prose. A Person schema on your about page tells every system reading it: this is a person named X, based in Y, with expertise in Z. No interpretation required. The schemas that matter most are Person or Organization on your about page, Article on every article with author, datePublished, and dateModified, plus BreadcrumbList for navigation context. These are not ranking signals; they are comprehension aids.

Update your robots.txt to include explicit directives for the major AI user-agent tokens: GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, cohere-ai, Amazonbot. Most existing files have no directives for these bots at all, so they fall back to whatever the User-agent: * group says, whether or not you meant it to apply to them. If you want AI systems to index and cite your content, allow them explicitly. If you don't, block them explicitly. What you do not want is ambiguity.
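Both pieces can be generated from the same templates that render your pages. The sketch below is a minimal illustration under stated assumptions: the ArticleMeta shape, the function names, and the sample article are hypothetical, while the property names inside the JSON-LD object (headline, author, datePublished, dateModified) are standard schema.org Article properties.

```typescript
// Hypothetical input shape for one article; adapt it to your CMS.
interface ArticleMeta {
  headline: string;
  authorName: string;
  datePublished: string; // ISO 8601 date, e.g. "2025-01-10"
  dateModified: string; // ISO 8601 date of the last real edit
}

// Build a schema.org Article object, serialized for embedding in a
// <script type="application/ld+json"> tag.
function articleJsonLd(a: ArticleMeta): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: a.headline,
    author: { "@type": "Person", name: a.authorName },
    datePublished: a.datePublished,
    dateModified: a.dateModified,
  };
  return JSON.stringify(data, null, 2);
}

// Emit one robots.txt group per AI token so the policy is explicit either way.
function buildAiRobotsGroups(tokens: string[], allow: boolean): string {
  const rule = allow ? "Allow: /" : "Disallow: /";
  return tokens.map((t) => "User-agent: " + t + "\n" + rule + "\n").join("\n");
}

// Sample values invented for this example.
const articleLd = articleJsonLd({
  headline: "Flying the Mavic 3 Pro in flat winter light",
  authorName: "Jane Doe",
  datePublished: "2025-01-10",
  dateModified: "2025-01-15",
});

const aiRobotsTxt = buildAiRobotsGroups(
  ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Google-Extended", "cohere-ai", "Amazonbot"],
  true // pass false to block these tokens instead
);

console.log(articleLd);
console.log(aiRobotsTxt);
```

The robots.txt output is meant to be appended to your existing rules, not to replace them. A single allow flag for every token is a simplification; a per-token policy would be a small extension of the same function.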

What Actually Drives AI Citations

Three factors determine whether an AI system cites your site accurately. First: clarity of authorship and expertise—who wrote this, what are their credentials, what's their track record? A site with clear author identity and consistent publishing history is far more citable than anonymous content. Second: factual density and specificity. General claims lose to specific ones. "The DJI Mavic 3 Pro produces excellent aerial footage" is a generic claim any model already knows. "The DJI Mavic 3 Pro's 4/3 CMOS Hasselblad sensor handles Norway's flat winter light better than the Mini 4 Pro in these specific conditions" is worth citing because it comes from direct experience. Third: freshness signals—dateModified in your JSON-LD and Last updated in your llms.txt tell models whether information is current.

Key Takeaways

  • AI crawlers are already indexing your site; make sure they understand it correctly
  • Implement llms.txt at your domain root with author identity, editorial stance, and preferred citation format
  • Add JSON-LD schemas (Person, Article) to your about page and every article
  • Update robots.txt with explicit directives for GPTBot, ClaudeBot, PerplexityBot, and others
  • GEO is an additional layer on top of SEO, not a replacement—handle both

The Bottom Line

The entire GEO implementation takes about two hours. Anyone charging you six figures for a "strategy" is selling you padding around what amounts to creating three text files and adding structured data to your templates. The infrastructure is trivial. The content still has to be worth citing—and that part was always the job, consultant-free.