Methodology

How we measure AI visibility (AVI)

This page explains exactly how a run is produced: entity resolution → prompt pack → surface runs → evidence gating → bucketing → scoring → confidence → diagnostics. No marketing filler.
GEO (Generative Engine Optimization) is commonly used as the umbrella term for improving visibility in generative engine responses.

1) What counts as “visibility” in this audit

We measure inclusion for high-intent prompts, and we only count hits that match the selected entity.

Visibility event

A “hit” is not just a name match

A run starts by resolving your entity (the exact business you choose). During scoring, we only count a “hit” when the output can be attributed to that entity with high confidence (domain / website match, or strong identity match).

  • Hard match: cited URL / website / domain matches the chosen entity.
  • Strong soft match: brand name + address/city context aligns to the chosen entity and does not conflict with a competitor.
  • Rejected: generic name mention with no supporting identity, or clear competitor mismatch.
This “evidence gating” is the whole point: avoid counting hallucinated wins.

2) Entity resolution (the “Find my business” step)

We force a single entity selection before we run prompts.

Why this exists

Most local brands fail because identity is fuzzy

If the system can’t distinguish you from a similarly named business, it will default to safer, stronger entities. Entity resolution makes the rest of the run interpretable.

What we store

Canonical entity object

The selected entity is recorded (name, website/domain, location signals). That becomes the reference for evidence gating later.

3) Prompt pack

Standardized, repeatable prompts designed to reflect buying intent.

What we run

Industry + location intents

Prompts are variations of “best {service} in {location}” (plus nearby intent phrasing). Pack size controls how many prompts are tested.

Why packs beat single prompts

We’re measuring patterns, not luck

One prompt can be noisy. A pack shows whether you’re consistently included for the market you care about.
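A pack expansion can be sketched as template cycling. The templates below are assumptions meant to illustrate "best {service} in {location}" plus nearby-intent phrasing; the production set is not shown here.

```python
# Illustrative prompt-pack expansion; the templates and sizing logic are
# assumptions, not the exact production pack.
TEMPLATES = [
    "best {service} in {location}",
    "top rated {service} near {location}",
    "who should I hire for {service} in {location}?",
]

def build_pack(service: str, location: str, pack_size: int) -> list[str]:
    """Cycle through intent templates until the pack is full."""
    prompts = []
    i = 0
    while len(prompts) < pack_size:
        template = TEMPLATES[i % len(TEMPLATES)]
        prompts.append(template.format(service=service, location=location))
        i += 1
    return prompts
```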

4) Surface runs

We run the same prompt pack across multiple generative surfaces.

Surfaces

Why multi-surface matters

Different systems retrieve, rank, and cite differently. Measuring more than one avoids building strategy on one model’s quirks.

Google AI Overviews (if enabled)

AIO is context-dependent

Google’s AI features can appear selectively depending on the query and context, and are intended to include links for deeper exploration.

5) Bucketing (turning outputs into stable signals)

We reduce noise by using rank buckets instead of fragile exact ordering.

Buckets

Top 3 / Top 5 / Top 10 / Not present

For each prompt+surface we assign a bucket. This produces stable aggregates even when outputs shuffle slightly.

  • TOP_3: included in the strongest recommendations
  • TOP_5: included, but not in the top cluster
  • TOP_10: present, but weaker placement
  • NOT_PRESENT: not recommended / not attributable
Evidence gating

We only bucket after attribution

If attribution fails (can’t tie the mention to your entity), the bucket becomes NOT_PRESENT. This prevents inflated scores from ambiguous name matches.

6) Scoring (0–100)

The score is computed from bucket weights, then adjusted by coverage.

Bucket weights

How buckets become numbers

We use consistent weights so the score is explainable:

TOP_3       = 1.00
TOP_5       = 0.70
TOP_10      = 0.40
NOT_PRESENT = 0.00
(If your engine includes an “unranked mention” state, it should be treated as a low weight and still require attribution.)
Per-surface score

Average across prompts

surface_score = average(bucket_weight over prompts)
Overall base score

Weighted blend of surfaces → 0–100

base_0_100 = 100 * weighted_average(surface_score_i)

Surface weights can be equal by default; if you later add AIO as a Pro surface, you can choose a smaller weight to reflect volatility.
Coverage adjustment

Don’t reward a single-surface spike

Coverage = how many selected surfaces include you at least once. We apply a multiplier so “present on 1/3 surfaces” can’t look like a dominant win.

coverage_multiplier:
  1/3 surfaces present -> 0.50
  2/3 surfaces present -> 0.75
  3/3 surfaces present -> 1.00

AVI = round(base_0_100 * coverage_multiplier)
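The scoring formulas above compose into one small function. This sketch assumes three selected surfaces with equal weights; the input shape (surface name mapped to a list of bucket labels, one per prompt) is illustrative.

```python
# End-to-end scoring sketch using the bucket weights, per-surface average,
# and coverage multiplier defined above. Assumes 3 equally weighted surfaces.
BUCKET_WEIGHTS = {"TOP_3": 1.00, "TOP_5": 0.70, "TOP_10": 0.40, "NOT_PRESENT": 0.00}
COVERAGE_MULTIPLIER = {1: 0.50, 2: 0.75, 3: 1.00}  # surfaces present out of 3

def avi(results: dict[str, list[str]]) -> int:
    """results maps surface name -> bucket labels (one per prompt)."""
    surface_scores = {
        surface: sum(BUCKET_WEIGHTS[b] for b in buckets) / len(buckets)
        for surface, buckets in results.items()
    }
    base = 100 * sum(surface_scores.values()) / len(surface_scores)  # equal weights
    present = sum(1 for score in surface_scores.values() if score > 0)
    return round(base * COVERAGE_MULTIPLIER.get(present, 0.0))
```

So strong placement on one surface and absence on the other two gets cut in half by coverage, which is the intended "no single-surface spike" behavior.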

7) Confidence

Confidence is measurement quality, not a guarantee.

What raises confidence
  • Entity match is unambiguous (domain/website matches consistently)
  • Results are consistent across prompts
  • Citations/links exist (where surfaces provide them)
What lowers confidence
  • Ambiguous brand name (shared names, franchises, duplicates)
  • High variance across prompts
  • Surfaces that are context-dependent / volatile (e.g., AI features that appear selectively)

8) Diagnostics (site checks)

These are trust-signal checks tied to discoverability and attribution.

What we check

Entity clarity + machine readability

  • About/Contact clarity (can a crawler understand who you are?)
  • NAP consistency (name/address/phone) alignment with public listings
  • Structured data where applicable (Organization / LocalBusiness)
  • Basic crawlability signals (indexability, canonical patterns, obvious blockers)
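One of these checks can be sketched as a pure function: does a page carry Organization/LocalBusiness JSON-LD? A real diagnostic would fetch and parse the live page with a proper HTML parser; the regex-based extraction here is a simplification for illustration.

```python
# Minimal sketch of a structured-data diagnostic: PASS if Organization or
# LocalBusiness JSON-LD is present in the HTML, else WARN. Regex extraction
# is a simplification; a real check would use an HTML parser.
import json
import re

def check_structured_data(html: str) -> str:
    pattern = r'<script[^>]*application/ld\+json[^>]*>(.*?)</script>'
    for block in re.findall(pattern, html, re.S):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        if any(isinstance(item, dict)
               and item.get("@type") in ("Organization", "LocalBusiness")
               for item in items):
            return "PASS"
    return "WARN"
```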
Why structured data matters

Disambiguation

Google explicitly frames Organization/LocalBusiness structured data as a way to help systems understand and disambiguate an entity.

Optional

/llms.txt

Some sites publish an /llms.txt file as a helper for LLM/agent consumption. It’s not a guarantee of inclusion, but it can make your canonical docs easier to find.

9) Outputs (PDF + artifact)

PDF is for humans; the artifact is for auditability.

PDF report

Shareable summary

The PDF contains: overall score, surface breakdown, coverage/confidence, and prioritized fixes.

Artifact (downloadable JSON)

What’s in it

The artifact exists so the result isn’t “trust me bro.” It should include:

  • Run metadata (timestamp, channels, pack size, location hints)
  • Chosen entity (name, domain/website, location signals)
  • Prompt pack (exact prompts used)
  • Raw outputs per prompt+surface (or extracted fields where raw storage is restricted)
  • Attribution decisions (why a mention counted / didn’t count)
  • Bucket assignments + scoring inputs
  • Diagnostics results (PASS/WARN/FAIL + evidence)
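The list above maps naturally onto a JSON document. The field names and sample values below are assumptions about a reasonable shape, not the actual artifact format.

```python
# Illustrative artifact shape matching the checklist above; all field names
# and values are hypothetical, not the real export format.
import json

artifact = {
    "run": {"timestamp": "2025-01-01T00:00:00Z", "surfaces": ["surface_a", "surface_b"],
            "pack_size": 10, "location": "Austin, TX"},
    "entity": {"name": "Acme Plumbing", "domain": "acmeplumbing.com"},
    "prompts": ["best plumber in Austin"],
    "outputs": [{"prompt": 0, "surface": "surface_a", "extracted": "Acme Plumbing ..."}],
    "attribution": [{"prompt": 0, "surface": "surface_a", "decision": "HARD",
                     "reason": "cited URL matches entity domain"}],
    "buckets": [{"prompt": 0, "surface": "surface_a", "bucket": "TOP_3"}],
    "diagnostics": [{"check": "structured_data", "status": "PASS",
                     "evidence": "LocalBusiness JSON-LD on homepage"}],
}
print(json.dumps(artifact, indent=2))
```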

Glossary (short, actually useful)

Terms you’ll see in the report.

AVI

AI Visibility Index

A 0–100 measurement of how often your business is included across selected AI surfaces for a standardized prompt pack.

GEO

Generative Engine Optimization

Improving visibility in generative engine responses; the academic framing (Aggarwal et al.) treats visibility metrics and their evaluation as central.

Coverage

Breadth across surfaces

How many selected surfaces include you at least once. Used to avoid overvaluing a single-surface spike.

Confidence

Measurement quality

How stable and attributable the measurement is (not a promise of outcomes).

Evidence-gated

Attribution first, scoring second

We only score a mention after we can tie it to the chosen entity (domain/identity match). No ambiguous wins.

Known limitations

What this can’t do (and why that’s fine).

  • Generative outputs can vary by time, location, and product changes.
  • Some AI features (like AI Overviews) can appear selectively depending on context.
  • This is a measurement harness: it makes improvement testable; it doesn’t control the platforms.
Sources used for terminology and platform framing: GEO paper (Aggarwal et al.), Google Search Central AI features guidance, Google structured data docs, llms.txt proposal.