Failure report

Playwright vs Bot Arena

Plain Playwright tests, one per level, each trying to sign in. Every test fails, for different reasons in each of the two sections. Below: what each test does, the error Playwright surfaces, and either the detection signals that caught it or the structural mismatch that prevented it.

Headless Chromium driven by @playwright/test running against bot-arena.jhero.app. Source: playwright/levels.spec.ts.

Selector-resistance levels (Section 2):
1. Canvas-rendered login · No DOM to query; pixels only
2. Dynamic selectors · Real form, randomised identifiers
3. Closed Shadow DOM · Sealed web component
4. Iframe-embedded form · Form in a child browsing context
5. Slider verification · Drag-to-align CAPTCHA
6. Image-only labels · No DOM text; labels are pixels
7. Cross-origin iframe · Form on a different origin
8. Virtual scrolling · Windowed list; off-screen items are absent from the DOM

Section 1

Bot detection

The site detects automation through fingerprinting, behavioural signals, or a third-party challenge. Five levels of increasing sophistication.

Level 1

The honest tell

· Passive webdriver flags

Playwright: failed · AIVA: also fails

What the test does

playwright/levels.spec.ts:4-10

test('Level 1 sign in', async ({ page }) => {
  await page.goto('/bot-detection/level-1/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: expect(locator).toBeVisible() failed

Locator:  getByText('Access granted')
Expected: visible
Received: hidden
Timeout:  5000ms

Plain-English explanation

The problem

Browsers volunteer a lot about themselves to every site they visit — what version they are, what extensions are loaded, whether they are being controlled by an automation program. When Playwright drives a browser, the browser honestly admits "I am being automated" through a flag called navigator.webdriver that any site can read in a single line of JavaScript. Stock Playwright also has no plugins installed, no notification permissions set, and identifies itself as "HeadlessChrome" in its version string. Each of these is a yes/no question a site can ask in milliseconds.

Why a VNC-driven real browser passes

The browser inside a VNC session is a regular, fully-fledged Chrome that a regular user started. Nothing is automating it from the inside — the automation happens outside the browser, at the operating-system level, by moving a mouse and pressing keys on a remote desktop. The browser does not know it is being driven, so none of these flags get set, and it reports back the same values any real human visitor would.

Playwright context: could this test be fixed in Playwright? · Fixable (one-line init-script override)

Verdict: the canonical check here (navigator.webdriver) is a single-line addInitScript away from passing, and it is the textbook example a stealth tutorial opens with.

Each of the five signals this level checks can, in principle, be spoofed from Playwright:

navigator.webdriver can be hidden via --disable-blink-features=AutomationControlled plus an addInitScript that redefines the property.
The User-Agent can be spoofed with --user-agent="..." to strip the HeadlessChrome token.
navigator.plugins, navigator.languages, and the Notification.permission / permissions.query pair can all be patched via Object.defineProperty in an init script.

Off-the-shelf stealth bundles (playwright-extra + puppeteer-extra-plugin-stealth) ship most of these patches already. The catch: every Chrome release introduces new tells, and commercial bot-detection vendors (Cloudflare, DataDome, PerimeterX, Imperva) maintain fingerprint databases of every known stealth-plugin signature. You spend more time updating your evasions than writing tests, and you only ever win temporarily.

AIVA context: what would need to change in AIVA to pass this · Fixable

Path 1 · Fixable · Practical: init-script patch
Effort: ~30 minutes. Scope: 5-line patch in browser.ts; no architectural change; vibe-codable.

Path 2 · Fixable · Clean: replace Puppeteer / CDP
Effort: multi-week refactor. Scope: rewrite control plane; X11/uinput steering; no CDP attached.

AIVA fails this level because of one signal: navigator.webdriver = true. AIVA launches Chrome via Puppeteer in aiva-node/src/control-server/src/browser.ts:204 (puppeteer.launch({...})), and any browser attached via CDP has this flag set automatically by Chrome itself.

The pragmatic fix is a single init script. Add this to AIVA's page-setup flow (e.g., next to the existing hideCursorScript wiring):

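// Puppeteer: runs before any page script on every new document.
// Defines webdriver as an own property of navigator that reads as undefined.
await page.evaluateOnNewDocument(() => {
  Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined,
    configurable: true,
  });
});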

Bot Arena's L1 webdriver check is literally navigator.webdriver === true → FAIL, so returning undefined makes it pass. This is the same override stealth plugins apply (for example puppeteer-extra-plugin-stealth, which playwright-extra can also load). The original "multi-week refactor" estimate was for the architecturally pure fix: replacing Puppeteer/CDP entirely with a non-CDP control plane. That is the right answer if you need to pass sophisticated bot-detection vendors that fingerprint the shape of navigator.webdriver (own vs prototype descriptor, getter behaviour, etc.). For Bot Arena and most detection layers that rely on a naive equality check, the 5-line patch is sufficient.

Trade-off: the init-script patch is detectable by sites that audit property descriptors. If AIVA's target customers operate sites with enterprise-grade detection, the architectural path becomes the right long-term investment. For this demo and a wide class of real-world cases, the patch is the right answer today.
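Why the patch is detectable: in Chrome, webdriver is an accessor on Navigator.prototype, so an unpatched navigator has no own webdriver property, while Object.defineProperty(navigator, ...) creates one. A minimal sketch of such a descriptor audit, run against a hypothetical MockNavigator class so it executes outside a browser (hideWebdriver and auditWebdriver are also our own illustrative names):

```typescript
// Stand-in for the browser's Navigator: like a WebIDL attribute, `webdriver`
// is a getter on the prototype, not an own property of the instance.
class MockNavigator {
  private readonly automated: boolean;
  constructor(automated: boolean) {
    this.automated = automated;
  }
  get webdriver(): boolean | undefined {
    return this.automated;
  }
}

// The same override as the evaluateOnNewDocument patch, applied to any object.
function hideWebdriver(nav: object): void {
  Object.defineProperty(nav, 'webdriver', { get: () => undefined, configurable: true });
}

type Audit = 'automated' | 'patched' | 'clean';

function auditWebdriver(nav: { readonly webdriver?: boolean | undefined }): Audit {
  // Naive equality check: enough for Bot Arena's L1.
  if (nav.webdriver === true) return 'automated';
  // Descriptor audit: an own `webdriver` property means someone redefined it.
  if (Object.getOwnPropertyDescriptor(nav, 'webdriver') !== undefined) return 'patched';
  return 'clean';
}

const cdpNavigator = new MockNavigator(true);
console.log(auditWebdriver(cdpNavigator)); // automated
hideWebdriver(cdpNavigator);
console.log(auditWebdriver(cdpNavigator)); // patched
console.log(auditWebdriver(new MockNavigator(false))); // clean
```

Redefining the getter on Navigator.prototype instead removes this particular tell, but the replacement's source text (() => undefined) no longer stringifies as native code, which is one of the getter-behaviour checks a descriptor-auditing site can add.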

Why it failed — Detection Log

fail webdriver — navigator.webdriver = true
fail plugins — navigator.plugins.length = 0 (expected > 0)
pass languages — navigator.languages = [en-US]
fail ua-headless — User-Agent contains "HeadlessChrome/148.0.7778.96"
pass notif-permission — Notification.permission and permissions.query agreed
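The five checks in the detection log above are simple yes/no predicates. A minimal sketch, assuming a hypothetical NavigatorSnapshot collected in the page (the field names and exact predicates are illustrative, not Bot Arena's source), reproduces the log's verdicts for stock headless Playwright:

```typescript
// Hypothetical snapshot of values a page can read with one line of JS each.
interface NavigatorSnapshot {
  webdriver: boolean | undefined; // navigator.webdriver
  pluginCount: number; // navigator.plugins.length
  languages: string[]; // navigator.languages
  userAgent: string; // navigator.userAgent
  notificationPermission: 'default' | 'granted' | 'denied'; // Notification.permission
  permissionsQueryState: 'prompt' | 'granted' | 'denied'; // permissions.query({ name: 'notifications' })
}

type Verdict = 'pass' | 'fail';

// Each entry is a yes/no question; keys mirror the detection log.
function level1Checks(s: NavigatorSnapshot): Record<string, Verdict> {
  // The two notification APIs name the "not asked yet" state differently.
  const expectedQueryState =
    s.notificationPermission === 'default' ? 'prompt' : s.notificationPermission;
  return {
    webdriver: s.webdriver === true ? 'fail' : 'pass',
    plugins: s.pluginCount > 0 ? 'pass' : 'fail',
    languages: s.languages.length > 0 ? 'pass' : 'fail',
    'ua-headless': s.userAgent.includes('HeadlessChrome') ? 'fail' : 'pass',
    'notif-permission': s.permissionsQueryState === expectedQueryState ? 'pass' : 'fail',
  };
}

// Illustrative values for stock headless Playwright, modelled on the log.
const stockHeadless: NavigatorSnapshot = {
  webdriver: true,
  pluginCount: 0,
  languages: ['en-US'],
  userAgent:
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/148.0.7778.96 Safari/537.36',
  notificationPermission: 'default',
  permissionsQueryState: 'prompt',
};

// webdriver, plugins and ua-headless fail; languages and notif-permission pass.
console.log(level1Checks(stockHeadless));
```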

Level 2

CDP attached

· Headless / CDP-only tells

Playwright: failed · AIVA: also fails

What the test does

playwright/levels.spec.ts:12-18

test('Level 2 sign in', async ({ page }) => {
  await page.goto('/bot-detection/level-2/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: expect(locator).toBeVisible() failed

Locator:  getByText('Access granted')
Expected: visible
Received: hidden
Timeout:  5000ms

Plain-English explanation

The problem

What "CDP" means: CDP stands for Chrome DevTools Protocol — the low-level remote-control interface Chrome exposes for tools like Chrome's own DevTools panel, Puppeteer, and Playwright. It is how those tools "drive" Chrome from outside the browser: clicking buttons, typing text, reading the DOM, taking screenshots — all without using a real keyboard and mouse. When CDP is attached, Chrome behaves slightly differently in measurable ways, and many of those differences also coincide with "I am running headless."

A real laptop has visible chrome around every browser window (toolbars at the top, tabs, a Windows taskbar at the bottom, a macOS menu bar), and this chrome takes up real pixels. The browser can ask the screen "how much of you is mine, versus the OS's?" and the answer comes back in pixels. A headless automated browser has no chrome and no visible window at all, so the honest answer is zero: it cannot truthfully report toolbars that do not exist.

Why a VNC-driven real browser passes

A VNC session streams a real, fully visible Chrome window running on a real desktop. There are real toolbars, a real taskbar, real OS chrome. Every measurement the page makes returns the same numbers any human visitor on any laptop would produce. Crucially, the automation happens outside the browser (at the OS level, moving a real cursor) — no CDP is attached, so Chrome behaves like an ordinary Chrome being used by an ordinary person.

Playwright context: could this test be fixed in Playwright? · Fixable (partial: script signals only)

Verdict: partially patchable in script; the pixel measurements require effectively rebuilding what VNC-AIVA already is.

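The JavaScript-level signals are the easy part. chrome.app and chrome.csi can be polyfilled with an addInitScript at page load; the driver-shim and toString-integrity checks already pass for Playwright, and the polyfills only need to keep them passing (each injected function must still stringify as native code).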

The window/screen pixel measurements are different. outerHeight - innerHeight = 0 is true because the headless browser literally has no toolbars. Two ways out, neither great:

Run headed (headless: false) on a server with Xvfb/Xvnc. But then you need a real desktop environment with a window manager and a panel so that screen.availHeight < screen.height, plus Chrome has to actually display its chrome (not --kiosk). At that point, you have rebuilt the AIVA architecture from scratch.
Spoof the values from JS — override window.outerHeight, screen.height, etc. via addInitScript. But the spoofs need to be internally consistent across signals: if you claim a 1080-pixel screen with a 40-pixel taskbar, the browser viewport's actual height needs to plausibly fit inside that. Cross-signal correlation catches these mismatches.
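The two pixel checks, and the kind of cross-signal correlation that catches half-finished spoofs, come down to arithmetic on a handful of numbers. A minimal sketch, assuming our own WindowGeometry shape and a deliberately simple consistency rule (real detectors correlate many more signals); the 1280 px width is an assumed value, since the log only reports the 720 px height:

```typescript
// Values a page reads from window.* and screen.*.
interface WindowGeometry {
  outerWidth: number;
  outerHeight: number;
  innerWidth: number;
  innerHeight: number;
  screenHeight: number; // screen.height
  availHeight: number; // screen.availHeight
}

type Verdict = 'pass' | 'fail';

function level2GeometryChecks(g: WindowGeometry): Record<string, Verdict> {
  return {
    // Toolbars, tabs and the address bar make the window taller than its viewport.
    'browser-chrome-height': g.outerHeight - g.innerHeight > 0 ? 'pass' : 'fail',
    // A taskbar, dock or menu bar reserves part of the screen.
    'screen-taskbar': g.availHeight < g.screenHeight ? 'pass' : 'fail',
    // Cross-signal: the viewport fits in the window, and the window fits on the screen.
    consistency:
      g.innerWidth <= g.outerWidth &&
      g.innerHeight <= g.outerHeight &&
      g.outerHeight <= g.screenHeight
        ? 'pass'
        : 'fail',
  };
}

// Headless Playwright, per the log: no toolbars, no taskbar, but self-consistent.
const headless: WindowGeometry = {
  outerWidth: 1280,
  outerHeight: 720,
  innerWidth: 1280,
  innerHeight: 720,
  screenHeight: 720,
  availHeight: 720,
};
console.log(level2GeometryChecks(headless));

// A half-done spoof: outerHeight patched to fake toolbars, screen.height left alone.
console.log(level2GeometryChecks({ ...headless, outerHeight: 820 }));
```

The second call passes browser-chrome-height but fails consistency, because an 820 px window cannot sit on a 720 px screen.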

In practice: an automation team trying to fix L2 with Playwright ends up reinventing AIVA badly.

AIVA context: what would need to change in AIVA to pass this · Fixable

Fix complexity · Fixable · Easy: drop 2 flags + add desktop env
Effort: half a day. Scope: config (drop 2 flags); image (add desktop env).

AIVA currently fails this level for two reasons:

No visible browser chrome — AIVA's browserArgs.ts passes both --start-fullscreen and --kiosk. Both flags hide the toolbars, tabs, and address bar that any real Chrome window displays. With them dropped, outerHeight - innerHeight jumps from 0 px to the usual 80–120 px. Drop: --start-fullscreen, --kiosk
No taskbar — this one is outside Chrome's launch flags. AIVA's VNC session (Xvfb/Xvnc) has no window manager or desktop panel reserving screen pixels, so the X server reports screen.availHeight === screen.height. Adding a lightweight desktop environment to the AIVA image — XFCE, LXDE, or even just OpenBox + tint2 — with a panel/dock visible at the bottom of the screen would close this gap.
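The Chrome-side half of the fix is a one-line filter over the launch flags. A minimal sketch, assuming browserArgs.ts exports a flat string array; the array below is illustrative (it lists only flags named in this report), and withVisibleBrowserChrome is a hypothetical helper:

```typescript
// Illustrative contents: the flags this report attributes to browserArgs.ts.
const browserArgs: readonly string[] = [
  '--start-fullscreen',
  '--kiosk',
  '--disable-gpu',
  '--disable-webgl',
  '--disable-features=Vulkan,webgpu',
];

// Both flags hide the toolbars, tabs and address bar, zeroing outerHeight - innerHeight.
const HIDES_BROWSER_CHROME: ReadonlySet<string> = new Set(['--start-fullscreen', '--kiosk']);

function withVisibleBrowserChrome(args: readonly string[]): string[] {
  // Returns a new array; the original list is left untouched.
  return args.filter((arg) => !HIDES_BROWSER_CHROME.has(arg));
}

// e.g. puppeteer.launch({ headless: false, args: withVisibleBrowserChrome(browserArgs) })
console.log(withVisibleBrowserChrome(browserArgs));
```

The taskbar half cannot be done from launch flags at all; it lives in the VNC image's desktop environment.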

Why it failed — Detection Log

pass driver-shims — no cdc_* globals (Playwright is not Selenium)
pass tostring-integrity — Function.prototype.toString is native
fail chrome-surface — window.chrome.app and chrome.csi both missing (app=false, csi=false)
fail browser-chrome-height — outerHeight - innerHeight = 0px (no toolbars/tabs visible)
fail screen-taskbar — screen.availHeight = screen.height = 720 (no taskbar reserved)

Level 3

Mouse trajectory

· Behavioural — mouse path and keystroke cadence

Playwright: failed · AIVA: passes

What the test does

playwright/levels.spec.ts:20-26

test('Level 3 sign in', async ({ page }) => {
  await page.goto('/bot-detection/level-3/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: expect(locator).toBeVisible() failed

Locator:  getByText('Access granted')
Expected: visible
Received: hidden
Timeout:  5000ms

Plain-English explanation

The problem

When a human clicks a button on a web page, the mouse pointer travels there — left a bit, up a bit, curving naturally. That path leaves a trail of dozens of "I moved here" events along the way. Playwright does not do that. When you tell Playwright "click this button," the pointer instantly appears at the button's exact pixel and clicks. No travel, no curve. A page that records every mouse event notices that this click came out of nowhere — no human operates a computer like that.

Why a VNC-driven real browser passes

A VNC operator moves a real mouse cursor on a real operating system, generating the same continuous stream of mouse events any human would. Because the path is a physical movement (the cursor is dragged across the screen by a person or by image-recognition automation steering it), it has the same natural variation and curvature as any other user's.

Playwright context: could this test be fixed in Playwright? · Fixable (per-interaction humanization; loses to ML defenders)

Verdict: bypassable for the basic checks the arena performs, but Bezier-curve plugins alone do not defeat Cloudflare Bot Management or DataDome's behavioural model in production.

Playwright does expose lower-level mouse APIs that can generate intermediate moves:

page.mouse.move(x, y, { steps: 30 }) emits 30 intermediate mousemove events along a straight line.
Wrap that in a Bezier-curve helper with randomized jitter and you produce trajectories with the right shape and curvature.
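page.keyboard.type(text, { delay }) dispatches one key at a time, but waits the same delay between every key (a rand(80, 200) passed here is drawn once per call); randomized inter-key delays mean typing one character per call with a fresh delay each time.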
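A minimal sketch of the Bezier-plus-jitter idea, assuming a seedable PRNG so a trajectory can be replayed. mulberry32, humanPath and keyDelays are hypothetical helpers, not Playwright APIs, and the 30% bend and ±1.5 px jitter are arbitrary choices. The code only generates coordinates and delays; the commented loop shows where page.mouse.move and page.keyboard.type would consume them.

```typescript
interface Point {
  x: number;
  y: number;
}

// mulberry32: a tiny seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Cubic Bezier from `from` to `to`: control points at 1/3 and 2/3 of the way,
// pushed sideways by up to 30% of the distance, plus ±1.5 px jitter on interior
// points. Returns steps + 1 points; the first and last are exactly from/to.
function humanPath(from: Point, to: Point, steps: number, rand: () => number): Point[] {
  const n = Math.max(1, Math.floor(steps));
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const dist = Math.hypot(dx, dy);
  const nx = dist === 0 ? 0 : -dy / dist; // unit normal to the straight line
  const ny = dist === 0 ? 0 : dx / dist;
  const b1 = (rand() - 0.5) * 0.6 * dist;
  const b2 = (rand() - 0.5) * 0.6 * dist;
  const c1 = { x: from.x + dx / 3 + nx * b1, y: from.y + dy / 3 + ny * b1 };
  const c2 = { x: from.x + (2 * dx) / 3 + nx * b2, y: from.y + (2 * dy) / 3 + ny * b2 };

  const points: Point[] = [];
  for (let i = 0; i <= n; i++) {
    const t = i / n;
    const u = 1 - t;
    const bx = u ** 3 * from.x + 3 * u * u * t * c1.x + 3 * u * t * t * c2.x + t ** 3 * to.x;
    const by = u ** 3 * from.y + 3 * u * u * t * c1.y + 3 * u * t * t * c2.y + t ** 3 * to.y;
    const interior = i > 0 && i < n;
    points.push({
      x: interior ? bx + (rand() - 0.5) * 3 : bx,
      y: interior ? by + (rand() - 0.5) * 3 : by,
    });
  }
  return points;
}

// One delay per character, drawn independently from [min, max) ms.
function keyDelays(text: string, rand: () => number, min = 80, max = 200): number[] {
  return Array.from(text, () => min + rand() * (max - min));
}

const rand = mulberry32(42);
const path = humanPath({ x: 0, y: 0 }, { x: 300, y: 200 }, 30, rand);
console.log(path.length, path[0], path[path.length - 1]); // 31 points, exact start and end

// Usage inside a Playwright test (not run here):
//   for (const p of humanPath(start, target, 30, rand)) await page.mouse.move(p.x, p.y);
//   await page.mouse.down();
//   await page.mouse.up();
//   const delays = keyDelays(text, rand);
//   for (const [i, ch] of [...text].entries()) {
//     await page.keyboard.type(ch);
//     await page.waitForTimeout(delays[i]);
//   }
```

This reproduces curve shape only; it does not model acceleration curves or overshoot-and-correct, which is where ML-based defenders look.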

The catch: every interaction in the test suite needs this treatment. A one-line page.click() becomes a thirty-line "humanize" helper. And advanced behavioural fingerprinting (used by serious bot-detection vendors) trains ML models on real human mouse telemetry — they pick up on acceleration curves, overshoot-and-correct patterns, pause-before-click latency, and dozens of other features that synthetic Bezier curves don't replicate. So: bypassable here, in this demo. Increasingly hard against production-grade defenders.

AIVA context: why this level already passes for AIVA · ✓ passes natively

✓ No fix needed — passes by construction

AIVA passes this level natively. The mouse cursor in AIVA's VNC session moves continuously across the screen at the OS level — exactly like any human user dragging a real mouse. No code or config change is needed here; this is one of the levels where running on a real machine wins by construction.

Why it failed — Detection Log

info level3-armed — recorder armed at page load
fail mouse-trajectory — only 1 mousemove point recorded between load and click (need ≥5 for a human-shaped curve)
pass keystroke-cadence — 0 keystrokes — page.fill() bypasses key events, so this check abstains
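Both verdicts can be reproduced from a recorded event log. A minimal sketch: the ≥5-point rule comes from the log line above, while the RecordedEvent shape and the cadence cut-off (a standard deviation under 10 ms between keystrokes reads as machine-regular) are our own illustrative assumptions, not the arena's values:

```typescript
interface RecordedEvent {
  kind: 'mousemove' | 'click' | 'keydown';
  t: number; // ms since the recorder was armed at page load
}

type Verdict = 'pass' | 'fail' | 'abstain';

const MIN_MOVES_BEFORE_CLICK = 5; // threshold quoted in the detection log
const MIN_INTERVAL_STDDEV_MS = 10; // illustrative assumption, not the arena's value

function mouseTrajectory(events: readonly RecordedEvent[]): Verdict {
  const clickAt = events.findIndex((e) => e.kind === 'click');
  if (clickAt === -1) return 'abstain';
  const moves = events.slice(0, clickAt).filter((e) => e.kind === 'mousemove').length;
  return moves >= MIN_MOVES_BEFORE_CLICK ? 'pass' : 'fail';
}

function keystrokeCadence(events: readonly RecordedEvent[]): Verdict {
  const times = events.filter((e) => e.kind === 'keydown').map((e) => e.t);
  // page.fill() sends no key events, so there is nothing to judge (the log reports this as pass).
  if (times.length < 3) return 'abstain';
  const gaps: number[] = [];
  for (let i = 1; i < times.length; i++) gaps.push(times[i]! - times[i - 1]!);
  const mean = gaps.reduce((sum, g) => sum + g, 0) / gaps.length;
  const variance = gaps.reduce((sum, g) => sum + (g - mean) ** 2, 0) / gaps.length;
  // Typing with one fixed delay yields identical gaps; human typing varies.
  return Math.sqrt(variance) >= MIN_INTERVAL_STDDEV_MS ? 'pass' : 'fail';
}

// Stock Playwright: one mousemove lands on the button, then the click; no keydowns.
const playwrightRun: RecordedEvent[] = [
  { kind: 'mousemove', t: 812 },
  { kind: 'click', t: 813 },
];
console.log(mouseTrajectory(playwrightRun), keystrokeCadence(playwrightRun)); // fail abstain
```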

Level 4

Fingerprint battery

· Canvas, audio, WebGL renderer, font set

Playwright: failed · AIVA: passes

What the test does

playwright/levels.spec.ts:28-34

test('Level 4 sign in', async ({ page }) => {
  await page.goto('/bot-detection/level-4/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: expect(locator).toBeVisible() failed

Locator:  getByText('Access granted')
Expected: visible
Received: hidden
Timeout:  5000ms

Plain-English explanation

The problem

Real computers have real graphics cards from real manufacturers (Intel, NVIDIA, AMD), and each draws images and text in subtly different but characteristic ways. Real computers also have real font files installed by the operating system. Headless automated browsers have neither — they use a software-only graphics stack called SwiftShader that produces an obviously-different visual fingerprint, and they ship with a stripped-down set of fonts. A page can render a tiny invisible test image and hash the pixels; that single hash is usually enough to tell whether the browser is running on real silicon or a CI runner.

Why a VNC-driven real browser passes

A VNC session runs on a real machine with a real graphics stack and a real set of fonts. The fingerprints it produces match those of millions of other real desktop Chrome installations.

Playwright context: could this test be fixed in Playwright? · Fixable (stealth pack + per-vendor maintenance)

Verdict: solvable via stealth-class plugins, with continuous cat-and-mouse against sophisticated vendors.

Canvas / audio / WebGL renderer / font spoofing is the headline feature of puppeteer-extra-plugin-stealth, playwright-extra's stealth bundle, rebrowser-patches, and Camoufox. Drop one of these into a Playwright setup and the four signals this demo measures are routinely defeated; the maintainers have already solved cross-signal consistency for the common case — a "Chrome on Windows" fingerprint package is internally coherent across canvas hash, audio waveform, WebGL renderer, and font widths.

The friction (4/5, not impossible) lives in the cat-and-mouse:

Detection vendors publish write-ups identifying new tells in stealth packages, and stealth maintainers then patch around them (see DataDome on stealth's iframe-contentWindow leak).
Rebrowser-patches and Camoufox ship as continuously-updated drop-ins; staying current means tracking releases on the same cadence as the detection vendors.
Higher-end defenders (Akamai) have moved primary detection to TLS-level fingerprinting (JA3/JA4), which neither stealth nor patched Playwright addresses on its own — that adds another tooling layer.

For the named vendors and most production-grade detection, a Playwright suite picks a stealth stack and accepts a perpetual maintenance overhead. Not impossible, not trivial.

AIVA context: why this level already passes for AIVA · Fixable

Fix complexity · Fixable · Trivial: hardening (not required)
Effort: a few hours, only if hardening is desired. Scope: config (drop 3 flags); operational (harvest denylist hashes).

AIVA passes this level — but partially by accident. AIVA's browserArgs.ts includes --disable-gpu, --disable-webgl, and --disable-features=Vulkan,webgpu, which make the WebGL renderer query return nothing. Bot Arena reports an empty renderer as INFO rather than FAIL, so AIVA slips past. Canvas, audio, and font fingerprints come from a real Linux Chrome on a real machine and look like any other desktop user.

Latent risk: if Bot Arena's canvas/audio denylists in src/detections/level4.ts were populated with hashes harvested from AIVA's Chrome (the operational follow-up flagged in the implementation plan), this level would fail for AIVA too. Long-term, AIVA should consider whether --disable-gpu/--disable-webgl are still needed; they are a tell to fingerprint-aware sites, because most real Chrome installs run with GPU acceleration and WebGL enabled.

Why it failed — Detection Log

fail webgl-renderer — WebGL renderer = "ANGLE (Google, Vulkan 1.3.0 (SwiftShader Device …))" — software rasteriser, no GPU
pass canvas-fp — sha256 = f66453e0… (not on denylist — denylist is empty in v1)
pass audio-fp — sha256 = 543fb8e0… (not on denylist — denylist is empty in v1)
pass font-probe — Segoe UI Emoji, Arial Black, Comic Sans MS — UA-consistent for the Windows runner
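The log's three outcome types (fail, pass, and the info verdict AIVA slips through on) can be sketched as below. Assumptions: the verdict rules are a simplification, not the contents of src/detections/level4.ts, and hashing uses Node's createHash so the example runs outside a browser; an in-page detector would hash getImageData() bytes with the Web Crypto API instead.

```typescript
import { createHash } from 'node:crypto';

type Verdict = 'pass' | 'fail' | 'info';

// WebGL renderer string; an empty result (WebGL disabled) is reported, not failed.
function webglRendererVerdict(renderer: string | null): Verdict {
  if (!renderer) return 'info';
  // SwiftShader is Chrome's software rasteriser: no real GPU behind it.
  return renderer.includes('SwiftShader') ? 'fail' : 'pass';
}

// Canvas/audio fingerprint: hash the raw bytes, then look the digest up in a denylist.
function fingerprintVerdict(
  bytes: Uint8Array,
  denylist: ReadonlySet<string>,
): { sha256: string; verdict: Verdict } {
  const sha256 = createHash('sha256').update(bytes).digest('hex');
  return { sha256, verdict: denylist.has(sha256) ? 'fail' : 'pass' };
}

console.log(webglRendererVerdict('ANGLE (Google, Vulkan 1.3.0 (SwiftShader Device (Subzero)), SwiftShader driver)')); // fail
console.log(webglRendererVerdict('')); // info (the empty-renderer case AIVA hits)

// Stand-in for canvas pixels (two RGBA pixels).
const pixels = new Uint8Array([255, 0, 0, 255, 0, 255, 0, 255]);
const emptyDenylist = new Set<string>(); // v1: nothing harvested yet
const { sha256 } = fingerprintVerdict(pixels, emptyDenylist);
console.log(fingerprintVerdict(pixels, emptyDenylist).verdict); // pass
console.log(fingerprintVerdict(pixels, new Set([sha256])).verdict); // fail
```

With an empty denylist every hash passes; adding hashes harvested from a specific Chrome build is what flips the verdict, which is the latent risk described for AIVA.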

Level 5

Cloudflare Turnstile

· Real third-party challenge

Playwright: failed · AIVA: also fails

What the test does

playwright/levels.spec.ts:36-42

test('Level 5 sign in', async ({ page }) => {
  await page.goto('/bot-detection/level-5/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: expect(locator).toBeVisible() failed

Locator:  getByText('Access granted')
Expected: visible
Received: hidden
Timeout:  5000ms

Plain-English explanation

The problem

Cloudflare Turnstile is the modern, invisible replacement for "click all the bicycle pictures" CAPTCHAs. When a page asks for it, Turnstile silently runs all the kinds of checks the previous four levels illustrate — plus an additional stack of private signals only Cloudflare knows about — and decides whether the visitor looks human enough to be issued a one-time "yes, this is a human" token. For automated browsers it simply refuses to issue the token. The server-side login check then sees no token and rejects the submission before it ever reaches the application code.

Why a VNC-driven real browser passes

A real Chrome session with a real fingerprint, real interaction history, and real mouse movement looks like any other paying customer to Turnstile. The token gets issued silently, exactly the same way it would for someone working from a coffee-shop laptop.

Playwright context: could this test be fixed in Playwright? · Impossible to fix (impossible without a 3rd-party solver)

Verdict: functionally impossible to bypass from inside Playwright. The only working "fix" is to outsource the problem.

Turnstile's logic is intentionally closed-source. It runs every kind of signal the previous four levels illustrate, plus a stack of private checks Cloudflare keeps to itself, plus IP reputation, plus behavioural analysis trained on the firehose of real human traffic across the Cloudflare network. Even a Playwright author who perfectly fixed levels 1-4, ran from a residential IP, and hand-rolled humanized interactions would still be classified as automated with high confidence — Cloudflare's behavioural model is too good.

The "solution" used in the wild is paid CAPTCHA-solver services (2Captcha, anti-captcha, CapMonster, etc.). They route the challenge through real-browser farms — either real humans or sophisticated stealth setups — and return a valid token in a few seconds, for a few cents each. Wire one of those into your test:

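// `solver` stands for a paid solver's client SDK (hypothetical here; the
// method name and options differ per vendor).
const token = await solver.solveTurnstile({
  sitekey: '0x4AAAAAADOBZMoei4aG9CNO',
  url: 'https://bot-arena.jhero.app/bot-detection/level-5/',
});
// Write the token into the hidden field the Turnstile widget would have filled.
await page.evaluate((t) => {
  const input = document.querySelector<HTMLInputElement>('input[name="cf-turnstile-response"]');
  if (input) input.value = t;
}, token);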

This works — but it has defeated the original purpose of using Playwright. You have paid a third-party service to act as the human in front of the human-detector. Your "automated" tests now have a per-run cost and a human-in-the-loop dependency. This is exactly the kind of corner that VNC-AIVA, by being a real browser session at the OS level, avoids without any third-party dependency.

AIVA context: what would need to change in AIVA to pass this · Fixable

Fix complexity · Fixable · Hard: partially externally-bound
Effort: inherits L1 + infra work; Cloudflare ML remains uncertain. Scope: blocked on L1; blocked on L2; residential IP infrastructure.

AIVA currently fails this level as a cascading consequence of L1 and L2. Cloudflare Turnstile silently runs many of the same signals — navigator.webdriver, browser-chrome dimensions, fingerprint plausibility — plus its own private checks, plus IP reputation. Two contributing causes inside AIVA's control:

Signal leakage from L1 and L2. Fixing the Puppeteer/CDP attachment, dropping --incognito/--disable-extensions, and dropping --kiosk/--start-fullscreen would all reduce Turnstile's confidence that the visitor is automated. Closing L1 + L2 likely moves Turnstile from "refuse / interactive challenge" to "silent pass" for many sites.
IP reputation. If AIVA runs on a datacenter or cloud-region IP, Turnstile downgrades by default. Running through a residential proxy or from end-user infrastructure improves the score meaningfully — and is independent of any AIVA code change.

Turnstile's logic is closed-source, so even a perfectly configured AIVA may occasionally fail. This level is the only one where success isn't fully under AIVA's control.

Why it failed — Detection Log

fail turnstile — no token — widget did not solve. Cloudflare refused to issue a token for the automated browser; server-side siteverify never called.
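The last clause of that log line refers to the server-side half of Turnstile: the backend POSTs the widget's token, together with its secret key, to Cloudflare's siteverify endpoint, and with no token there is nothing to send. A minimal sketch; verifyLogin, the injected post transport and the outcome labels are our own illustrative names, and only the endpoint URL and the secret/response/success/error-codes fields come from Cloudflare's API:

```typescript
// Subset of the JSON body Cloudflare's siteverify endpoint returns.
interface SiteverifyResult {
  success: boolean;
  'error-codes'?: string[];
}

// Injected transport so the flow can run without network access.
type PostForm = (url: string, form: Record<string, string>) => Promise<SiteverifyResult>;

const SITEVERIFY_URL = 'https://challenges.cloudflare.com/turnstile/v0/siteverify';

type LoginOutcome = 'accepted' | 'rejected-no-token' | 'rejected-by-cloudflare';

async function verifyLogin(
  token: string | undefined,
  secret: string,
  post: PostForm,
): Promise<LoginOutcome> {
  // The widget never produced a token, so siteverify is never called (the L5 failure).
  if (!token) return 'rejected-no-token';
  const result = await post(SITEVERIFY_URL, { secret, response: token });
  return result.success ? 'accepted' : 'rejected-by-cloudflare';
}

// A fetch-based transport for a runtime with global fetch (e.g. Node 18+):
// const post: PostForm = async (url, form) =>
//   (await fetch(url, { method: 'POST', body: new URLSearchParams(form) })).json();

verifyLogin(undefined, 'placeholder-secret', async () => ({ success: true })).then(
  (outcome) => console.log(outcome), // rejected-no-token
);
```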

Section 2

Selector resistance

The DOM that selector-based automation depends on is absent, randomised, or out of reach. Playwright fails at the selector step before any signal can fire.

Level 1

Canvas-rendered login

· No DOM to query — pixels only

Playwright: failed · AIVA: passes

What the test does

playwright/levels.spec.ts:46-52

test('Level 1 sign in', async ({ page }) => {
  await page.goto('/selector-resistance/level-1/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: locator.fill: Test timeout of 30000ms exceeded.

Call log:
  - waiting for getByLabel('Email')
  - locator resolved to <no element matched>

Plain-English explanation

The problem

Some web apps draw part or all of their UI inside a <canvas> element: the browser sees one big rectangular pixel buffer, and everything inside (text, buttons, input boxes) is just paint. Most of the canvas surfaces that automation actually runs into (legacy enterprise systems, custom in-house tools, AS/400-to-web emulators, signature pads, custom datepickers, charting widgets with embedded interaction) ship no accessibility tree at all. Against those, Playwright has nothing to query: no <label>, no <input>, no text node. And the cost is binary: one canvas widget anywhere in the flow stops the whole automation, because selectors cannot skip past a step they cannot interact with.

Why a VNC-driven real browser passes

An image-aware automation tool — like the classic AIVA — does not look at the DOM at all. It looks at the rendered pixels, recognises the visible "Email" text and the box right under it, and clicks at those coordinates. It then types using OS-level keystrokes, which the canvas receives as ordinary keyboard events. The DOM's absence (or presence) is irrelevant; the pixels are the contract.

Playwright context: could this test be fixed in Playwright? · Impossible to fix (structurally unreachable without an accessibility tree)

Verdict: structurally unreachable for Playwright when the canvas surface does not expose an accessibility tree — which is the default for legacy and enterprise apps, the apps automation actually targets.

Playwright's locator engine reads the DOM. Against a canvas-only surface there is nothing to read:

page.getByLabel('Email') — no <label> element exists.
page.getByRole('textbox') — no <input> element exists.
page.getByText('Sign in') — the text "Sign in" is painted pixels, not a text node.

The blocking property is what makes this 5/5 rather than a softer 'workaround' number. A canvas-rendered surface anywhere in a workflow blocks the entire automation, because selectors cannot skip past a step they cannot interact with. One signature-pad widget, one canvas-based chart that gates "continue", one canvas-rendered datepicker, and the whole test breaks. Per-canvas workarounds never add up to a small total effort, since any single unhandled canvas means the run cannot complete.

The narrow exception: a handful of consumer-SaaS canvas apps ship a parallel accessibility-tree HTML layer for screen readers, which Playwright's getByRole / page.accessibility.snapshot() can reach. Figma and Google Sheets are the canonical examples, but apps like these are rare among the ones an enterprise automation team actually targets. The bulk of canvas-in-the-wild (Photoshop Web, Photopea, tldraw, Excalidraw, Miro, Unity/Unreal WASM games, legacy AS/400-to-web emulators, custom signature pads, in-house dashboards built on D3-canvas) ships no accessibility tree at all.

The only theoretical Playwright path for the no-a11y case is to screenshot from Playwright, pipe to an external OCR / template-matching pipeline, and use page.mouse.click(x, y) at the resolved coordinates. At that point you have built a worse version of the classic AIVA — and you have moved the actual automation outside Playwright entirely.
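To make the "screenshot, template-match, click" path concrete: a naive sum-of-squared-differences matcher over grayscale pixels. This is an illustration of the technique, not AIVA's matcher. It assumes the screenshot has already been decoded into a row-major grayscale array (decoding the PNG bytes from page.screenshot() needs a separate image library, omitted here), and Gray and matchTemplate are hypothetical names.

```typescript
// Grayscale image, row-major, one byte per pixel.
interface Gray {
  width: number;
  height: number;
  data: Uint8Array;
}

// Exhaustive sum-of-squared-differences search. Returns the top-left corner
// where `tpl` best matches `img` (x = y = -1 if the template does not fit).
function matchTemplate(img: Gray, tpl: Gray): { x: number; y: number; score: number } {
  let best = { x: -1, y: -1, score: Infinity };
  for (let y = 0; y + tpl.height <= img.height; y++) {
    for (let x = 0; x + tpl.width <= img.width; x++) {
      let ssd = 0;
      // Stop early once this position is already worse than the best so far.
      for (let ty = 0; ty < tpl.height && ssd < best.score; ty++) {
        for (let tx = 0; tx < tpl.width; tx++) {
          const d = img.data[(y + ty) * img.width + (x + tx)]! - tpl.data[ty * tpl.width + tx]!;
          ssd += d * d;
        }
      }
      if (ssd < best.score) best = { x, y, score: ssd };
    }
  }
  return best;
}

// Synthetic example: a 2x2 "input box" pasted into a blank 6x5 screenshot at (3, 2).
const shot: Gray = { width: 6, height: 5, data: new Uint8Array(30) };
const box: Gray = { width: 2, height: 2, data: new Uint8Array([10, 20, 30, 40]) };
shot.data.set([10, 20], 2 * 6 + 3);
shot.data.set([30, 40], 3 * 6 + 3);
console.log(matchTemplate(shot, box)); // { x: 3, y: 2, score: 0 }

// In a test, the match would drive a coordinate click (device scale factor 1 assumed):
//   const { x, y } = matchTemplate(screenshotGray, emailBoxGray);
//   await page.mouse.click(x + emailBoxGray.width / 2, y + emailBoxGray.height / 2);
//   await page.keyboard.type('user@example.com');
```

Raw SSD breaks on anti-aliasing, theme changes and scaling, which is part of why this path ends up as a worse re-implementation of an image-based tool.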

AIVA context: why this level already passes for AIVA · ✓ passes natively

✓ No fix needed — passes by construction

AIVA passes this level natively. AIVA's automation model is image-based from the ground up: it screenshots the visible browser surface, identifies UI elements by what they look like, and dispatches OS-level mouse and keyboard events at the right coordinates. The DOM is incidental — AIVA never touched it on the way in, so it does not matter that there is no DOM to touch here.

This level is the strongest single argument for pixel-based automation as a category. Selector-based tools are not just blocked here — they are structurally unable to attempt the task at all.

Why it failed — Detection Log

info no-dom — Only one DOM element exists in the form region: a <canvas>. No <input>, no <button>, no <label>.
fail getByLabel-email — page.getByLabel('Email') — locator resolved to <no element matched>
fail getByLabel-password — page.getByLabel('Password') — locator resolved to <no element matched>
fail getByRole-button — page.getByRole('button', { name: 'Sign in' }) — locator resolved to <no element matched>

Level 2

Dynamic selectors

· Real form, randomised identifiers

Playwright: failed · AIVA: passes

What the test does

playwright/levels.spec.ts:54-60

test('Level 2 sign in', async ({ page }) => {
  await page.goto('/selector-resistance/level-2/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: locator.fill: Test timeout of 30000ms exceeded.

Call log:
  - waiting for getByLabel('Email')
  - locator resolved to <no element matched>

Plain-English explanation

The problem

Modern web apps frequently ship with build-time CSS-in-JS, which produces hashed class names that change from build to build. Some apps go further and randomise every attribute (id, name, class, aria-label) on every page request, and omit <label> elements altogether. A human still reads "Email" off the screen and types in the box below. A test that relies on labels finds nothing to grab, and every attribute-based selector it had hardcoded is stale by the next request.

Why a VNC-driven real browser passes

AIVA reads "Email" from the screen pixels and clicks the input it visually identifies as a text box just below the label text. It does not look at attributes; it looks at the rendered shape of the page. Randomising the DOM has no effect on it — the visual layout is what matters.

Playwright context: could this test be fixed in Playwright? · Fixable (semantic locators if a11y exists; brittle structural ones if not)

Verdict: depends on whether the app ships accessibility metadata. With a11y, Playwright's recommended semantic locators (getByRole, getByLabel, getByText) are immune to id/class rerolling and the row is close to 2/5. Without a11y — the arena's demo deliberately strips it — only brittle structural selectors are left and it's closer to 4/5. 3/5 fits the average.

Real-world ground truth on the named examples:

CSS-in-JS and utility CSS (Tailwind, styled-components, Emotion): Tailwind classes are stable utility strings; Emotion hashes are stable per style definition, not per request. getByRole + accessible name handles all of these cleanly.
Anti-bot WAFs: DataDome, PerimeterX, Kasada rely on TLS+JS fingerprinting, behavioural biometrics, and proof-of-work — per-request DOM-selector randomisation is not a documented production tactic.
Ticketing / sneaker drops: Ticketmaster does have some "moving selectors" but its primary defence is randomised queue position + fingerprinting; Nike SNKRS uses telemetry + IP reputation. Selector churn happens but is rarely the load-bearing defence.

For this demo (no labels, no accessible names, all attributes rerolled), the fallbacks are:

page.locator('input[type="email"]') — works this run; breaks if the input type is also randomised, or another email input is added.
page.locator('input').nth(0) — works this run; breaks the moment the form reorders or grows.
page.locator('div:has-text("Email") + input') — works for this layout; breaks if the DOM structure is rewritten.

For an app that combines randomised attributes with stripped accessibility metadata, every fallback is one revision away from breaking; the maintenance burden grows linearly with the number of forms.

AIVA context: why this level already passes for AIVA ✓ passes natively

✓ No fix needed — passes by construction

AIVA passes this level natively. AIVA does not look at attributes. It looks at the visible rendering: a label that says "Email", an input box beneath it, a similar pair for "Password", a dark button labelled "Sign in". Randomising the DOM attributes changes nothing about that visual layout — the OCR and template matching find the same targets in the same places.

As a category, "selector resistance" is invisible to AIVA by construction. Every visual automation tool — AIVA, image-based RPA platforms, agentic vision models — sits in this same advantage zone.

Why it failed — Detection Log

info form-rendered — A real <form> with real <input> elements — but every id/name/class is randomised per request, and there are no <label> elements.
fail getByLabel-email — page.getByLabel('Email') fails — no <label> element associates with the input.
pass getByRole-button — page.getByRole('button', { name: 'Sign in' }) works (button text is stable).
info fallback-fragility — Even structural fallbacks like input:nth-of-type(1) work this run, but break on the next form revision.


Level 3

Closed Shadow DOM

· Sealed web component

Playwright: failed AIVA: passes

What the test does

playwright/levels.spec.ts:62-68

test('Level 3 sign in', async ({ page }) => {
  await page.goto('/selector-resistance/level-3/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: locator.fill: Test timeout of 30000ms exceeded.

Call log:
  - waiting for getByLabel('Email')
  - locator resolved to <no element matched>

Plain-English explanation

The problem

Many modern web apps build their UI as Web Components: small, self-contained widgets whose internal structure is hidden from outside code. A "closed shadow root" is the strongest form of this. The host element's shadowRoot property returns null, so other JavaScript on the same page cannot reach inside; only the component code that holds the reference returned by attachShadow can. Component authors opt into this as a deliberate encapsulation boundary. To a test, the component is a black box: the element exists, but the inputs, the button, and even the success message are unreachable.

Why a VNC-driven real browser passes

A vision-based automation tool screenshots the rendered page and finds the input visually. The DOM privacy boundary is irrelevant — the rendered pixels are public. AIVA reads "Email", finds the box beneath it, clicks, types — exactly like a human looking at the screen. Closed shadow roots make automation harder only for tools that look at the DOM; they make it no harder for tools that look at the screen.

Playwright context: could this test be fixed in Playwright? Fixable (init-script monkey-patch of attachShadow)

Verdict: solvable in Playwright via a documented workaround, but with timing-sensitive caveats — and most real enterprise apps avoid closed shadow anyway.

The Playwright maintainers' own suggestion (issue #23047) is to monkey-patch Element.prototype.attachShadow in an addInitScript hook so every subsequent attachShadow({mode:'closed'}) call actually returns an open root:

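// Runs in every page and frame of the context before any page script.
await context.addInitScript(() => {
  const orig = Element.prototype.attachShadow;
  // Force every shadow root open so Playwright's CSS engine can pierce it.
  Element.prototype.attachShadow = function (options) {
    return orig.call(this, { ...options, mode: 'open' });
  };
});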

Playwright's normal piercing locators then work as if the app had opted into open shadow.

The caveats are real:

The init script must land before the framework caches a reference to Element.prototype.attachShadow. Some polyfills and bundlers grab the reference at module-load time and defeat the patch.
App code that re-attaches a root via a stashed reference bypasses the patch.
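A CDP fallback exists for read-only inspection, but Playwright does not surface it through Locator. You would open a session with context.newCDPSession(page) (Chromium only) and call DOM.getDocument with pierce: true, which returns shadow roots, closed ones included, as part of the node tree.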
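Here is a sketch of that CDP route. It assumes Chromium and a Playwright `BrowserContext` named `context`. It requests the whole tree with `DOM.getDocument({ depth: -1, pierce: true })`, then walks it for elements that sit under a root whose `shadowRootType` is `"closed"`. `CdpNode` models only the protocol's `DOM.Node` fields the walker reads. `findInClosedShadow` and `attr` are hypothetical helpers.

```typescript
// Subset of the Chrome DevTools Protocol DOM.Node shape used by this sketch.
interface CdpNode {
  nodeId: number;
  nodeName: string;
  localName?: string;
  attributes?: string[]; // flat [name1, value1, name2, value2, ...]
  children?: CdpNode[];
  shadowRoots?: CdpNode[];
  shadowRootType?: "user-agent" | "open" | "closed";
  contentDocument?: CdpNode;
}

// Depth-first walk collecting elements with the given local name that sit
// somewhere beneath a closed shadow root.
function findInClosedShadow(root: CdpNode, localName: string): CdpNode[] {
  const hits: CdpNode[] = [];
  const visit = (node: CdpNode, insideClosed: boolean): void => {
    if (insideClosed && node.localName === localName) hits.push(node);
    for (const shadow of node.shadowRoots ?? []) {
      visit(shadow, insideClosed || shadow.shadowRootType === "closed");
    }
    for (const child of node.children ?? []) visit(child, insideClosed);
    if (node.contentDocument) visit(node.contentDocument, insideClosed);
  };
  visit(root, false);
  return hits;
}

// Reads one attribute from the flat CDP attribute array.
function attr(node: CdpNode, name: string): string | undefined {
  const a = node.attributes ?? [];
  for (let i = 0; i + 1 < a.length; i += 2) {
    if (a[i] === name) return a[i + 1];
  }
  return undefined;
}

// Usage inside a Playwright test (Chromium only), kept as a comment so the
// sketch has no runtime dependency on Playwright:
//
//   const session = await context.newCDPSession(page);
//   const { root } = await session.send('DOM.getDocument', { depth: -1, pierce: true });
//   const inputs = findInClosedShadow(root, 'input');
//   const types = inputs.map((n) => attr(n, 'type'));
```

The result is protocol node data, not Playwright Locators. The usual fill and click calls still cannot target these inputs, which is why this stays an inspection route.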

Real enterprise apps overwhelmingly avoid closed shadow. Salesforce LWC uses synthetic shadow (a polyfill, fully queryable) by default; native mode uses mode: 'open'. SAP UI5 Web Components and ServiceNow Now Experience both use open shadow. Closed-mode shadow is largely a worst-case demo construction; in production, the real friction with web-component-heavy frontends is deep shadow nesting and framework-specific selector conventions, not the shadow seal itself.

AIVA context: why this level already passes for AIVA ✓ passes natively

✓ No fix needed — passes by construction

AIVA passes this level natively. AIVA reads the rendered page through screenshots and OCR; it has no concept of DOM accessibility at all. The shadow boundary is invisible to it because the pixels on the screen do not know they are coming from a sealed component. The form is filled, submitted, and "Access granted" is visible — same as any other login page.

As a category, web-component-heavy frontends (Salesforce Lightning, ServiceNow, SAP UI5, and most enterprise design systems) add steady friction for selector-based testing through deep shadow nesting, and the rare closed root blocks stock locators outright. Vision-based tools are unaffected either way.

Why it failed — Detection Log

info sealed-mounted — <sealed-login> custom element with attachShadow({ mode: "closed" })
fail getByLabel-email — page.getByLabel('Email') — locator cannot pierce a closed shadow root.
fail shadow-piercer — page.locator('sealed-login >>> input') — the >>> combinator works only on OPEN shadow roots.
fail getByText-granted — page.getByText("Access granted") fails too — the success message also lives inside the sealed shadow.


Level 4

Iframe-embedded form

· Form in a child browsing context

Playwright: failed AIVA: passes

What the test does

playwright/levels.spec.ts:70-76

test('Level 4 sign in', async ({ page }) => {
  await page.goto('/selector-resistance/level-4/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: locator.fill: Test timeout of 30000ms exceeded.

Call log:
  - waiting for getByLabel('Email')
  - locator resolved to <no element matched>

Plain-English explanation

The problem

A huge fraction of production websites embed third-party widgets via iframes: Stripe payment forms, Cloudflare challenges, embedded support chats, social login buttons. From the page's perspective the iframe is a single rectangle; the form inside it is a separate document with its own DOM. Playwright's page-level locators (the ones every tutorial teaches) search only the main frame, so they silently miss anything inside a child frame. The test fails the same way as if the form did not exist.

Why a VNC-driven real browser passes

AIVA does not know or care whether a region of the screen comes from the main page or a child frame. The screenshot is one image; the form is one rectangle of pixels; the email box sits below the "Email" text. Vision-based automation traverses frame boundaries for free, because frames are a DOM concept that does not exist in the rendered image.

Playwright context: could this test be fixed in Playwright? Fixable (one extra method call per locator chain)

Verdict: solvable with one extra frameLocator call per chain. FrameLocator supports the full getBy* API and is a first-class chainable locator in Playwright.

The frame-aware version of this test:

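import { test, expect } from '@playwright/test';
test('Level 4 sign in (frame-aware)', async ({ page }) => {
  await page.goto('/selector-resistance/level-4/');
  const frame = page.frameLocator('iframe[title="login-frame"]'); // scope every locator to the child frame
  await frame.getByLabel('Email').fill('user@example.com');
  await frame.getByLabel('Password').fill('hunter2');
  await frame.getByRole('button', { name: 'Sign in' }).click();
  await expect(frame.getByText('Access granted')).toBeVisible();
});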

The cost is one frameLocator() call at the top of each test that touches embedded content. That's modest enough to belong in the 2/5 band.

Real-world same-origin iframes in 2026 are rarer than the page implies. The classic payment examples (Stripe Elements, Adyen Web Drop-in, Braintree Hosted Fields) are cross-origin by design for PCI isolation, which is a different problem covered under the cross-origin iframe row. Hosted login such as Auth0 Universal Login is a top-level redirect rather than an iframe. Genuine same-origin iframe surfaces today are legacy WYSIWYG editors (TinyMCE / CKEditor classic), web-mail composers (Gmail, Outlook Web), and legacy intranet portals served from the same parent domain.

AIVA context: why this level already passes for AIVA ✓ passes natively

✓ No fix needed — passes by construction

AIVA passes this level natively. AIVA's screenshot includes the iframe contents because the browser composites them into the page exactly like any other element. The image-recognition pipeline sees one form, finds the inputs visually, clicks and types. Frame boundaries do not exist at the pixel level.

This is a major real-world advantage. Payment flows (Stripe, Adyen, Braintree) and many embedded-SDK patterns ship as iframes, usually cross-origin. Vision-based automation handles them by construction. Selector-based testing needs every locator chain scoped with frameLocator, plus vendor-supplied stable inner selectors when the frames are cross-origin.

Why it failed — Detection Log

info iframe-mounted — The form is in an <iframe srcdoc="..."> with its own document.
fail getByLabel-email — page.getByLabel('Email') — runs on the main frame only; the form is in a child frame so the locator never resolves.
fail getByText-granted — page.getByText("Access granted") — same problem; the message lives in the child frame.
info requires-frame-locator — Test would have to be rewritten to use page.frameLocator("iframe").getByLabel(...) — every assertion + interaction explicitly frame-scoped.


Level 5

Slider verification

· Drag-to-align CAPTCHA

Playwright: failed AIVA: also fails

What the test does

playwright/levels.spec.ts:78-84

test('Level 5 sign in', async ({ page }) => {
  await page.goto('/selector-resistance/level-5/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: expect(locator).toBeVisible() failed

Locator:  getByText('Access granted')
Expected: visible
Received: hidden
Timeout:  5000ms

Plain-English explanation

The problem

Slider CAPTCHAs are the dominant anti-bot pattern across the Chinese internet (GeeTest, NetEase, Tencent, Alibaba) and increasingly common in Western anti-bot stacks (AWS WAF, ticketing platforms). The user is shown a randomised image with a notched gap and a draggable puzzle piece elsewhere; to pass, the user drags the piece into the gap. The gap's position exists only in the image, with no DOM hint for where it is. A test can drag, but only a tool that can SEE the gap knows where to drag to.

Why a VNC-driven real browser passes

A vision-based automation tool screenshots the slider, finds the highlighted target zone in the image, computes its X coordinate, and dispatches an OS-level mouse drag to that exact position. The drag is real — the browser receives a real sequence of mousemove events from a real cursor. The slider sees a human-shaped gesture and unlocks the form.

Playwright context: could this test be fixed in Playwright? Impossible to fix without external vision

Verdict: impossible from inside Playwright. Playwright can drag (mouse.down/move/up at coordinates) but it cannot SEE the target zone.

The only paths a Playwright author has:

Take a screenshot via Playwright, pass the image to an external OCR / template-matching service, extract the target X coordinate, dispatch page.mouse.down/move/up. At that point you have built half of AIVA inside your test runner.
Use a paid CAPTCHA-solving service (similar to the Turnstile case). The service solves the challenge, with human workers or its own models, and returns a token or the target offset. The cost is a per-run fee plus a third-party dependency.
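The vision step in the first option is the hard part. A minimal sketch of the idea uses a sum-of-absolute-differences template match over a grayscale screenshot row band to locate the gap's X coordinate. It is not a solver for production CAPTCHAs, which add noise, rotation and scaling. The `Gray` type and `findGapX` are hypothetical. Decoding the PNG from `page.screenshot()` or `locator.screenshot()` into grayscale pixels is assumed and not shown.

```typescript
// Row-major grayscale image: values 0-255, length = width * height.
interface Gray {
  width: number;
  height: number;
  data: Uint8Array | number[];
}

// Slides `tpl` horizontally across `img` at vertical offset `y0` and returns
// the x with the smallest sum of absolute differences (the best match).
function findGapX(img: Gray, tpl: Gray, y0: number): number {
  if (tpl.width > img.width || y0 < 0 || y0 + tpl.height > img.height) {
    throw new RangeError("template does not fit at this row");
  }
  let bestX = 0;
  let bestScore = Infinity;
  for (let x = 0; x + tpl.width <= img.width; x++) {
    let score = 0;
    // Stop early once this position is already worse than the best so far.
    for (let ty = 0; ty < tpl.height && score < bestScore; ty++) {
      const imgRow = (y0 + ty) * img.width + x;
      const tplRow = ty * tpl.width;
      for (let tx = 0; tx < tpl.width; tx++) {
        score += Math.abs(img.data[imgRow + tx] - tpl.data[tplRow + tx]);
      }
    }
    if (score < bestScore) {
      bestScore = score;
      bestX = x;
    }
  }
  return bestX;
}
```

The returned x is in screenshot pixels relative to the captured area. If the capture covered only the slider, add the origin from the slider's `locator.boundingBox()`. Divide by the device scale factor (or capture with `scale: 'css'`) before dispatching `page.mouse.down/move/up`.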

For real production slider CAPTCHAs (GeeTest, Alibaba, AWS WAF), the gap position is also rotated, scaled, and obfuscated with noise — generic OCR fails. Vendor-specific solver services are the only working option, and they cost ~$1-3 per 1000 solves.

AIVA context: what would need to change in AIVA to pass this? Fixable

Fix complexity: Moderate (days of work). Add a drag-and-drop primitive to AIVA; a new interaction primitive that requires a code change.

AIVA currently fails this level because it does not yet have a drag-and-drop interaction primitive. Vision recognition of the target zone is already covered by the existing screenshot pipeline; what is missing is the ability to dispatch a sustained mouse-down → mousemove sequence → mouse-up gesture as a single action.

Adding the primitive is a moderate-sized piece of work. It touches the input-dispatch layer of the VNC control plane and needs a small UX vocabulary for "drag from X to Y at speed Z" in the recorder. Once it lands, slider CAPTCHAs (GeeTest, Alibaba, AWS WAF) and other drag-shaped interactions (sortable lists, signature pads, file pickers with drag-in) all become reachable at the gesture level.
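The "drag from X to Y at speed Z" vocabulary could be modelled as one recorder action that expands into a timed list of low-level pointer steps for the input layer to replay. The sketch is hypothetical throughout (`DragAction`, `PointerStep`, `expandDrag`, the 16 ms default interval). The cosine ease-in-out is one arbitrary way to make the motion accelerate and decelerate instead of moving at a constant rate.

```typescript
interface Point { x: number; y: number; }

// Hypothetical recorder-level action: drag from `from` to `to` at `speed` px/s.
interface DragAction {
  from: Point;
  to: Point;
  speed: number;       // pixels per second, must be > 0
  intervalMs?: number; // spacing between move events (default 16 ms, ~60 Hz)
}

// One low-level step for the input-dispatch layer to replay at time `t` (ms).
type PointerStep =
  | { kind: "down"; t: number; at: Point }
  | { kind: "move"; t: number; at: Point }
  | { kind: "up"; t: number; at: Point };

// Smooth ease-in-out on [0, 1]: slow start, faster middle, slow finish.
const easeInOut = (u: number): number => (1 - Math.cos(Math.PI * u)) / 2;

function expandDrag(action: DragAction): PointerStep[] {
  const { from, to, speed } = action;
  const interval = action.intervalMs ?? 16;
  if (speed <= 0 || interval <= 0) throw new RangeError("speed and interval must be positive");
  const distance = Math.hypot(to.x - from.x, to.y - from.y);
  const durationMs = (distance / speed) * 1000;
  const moves = Math.max(1, Math.ceil(durationMs / interval));
  const steps: PointerStep[] = [{ kind: "down", t: 0, at: { ...from } }];
  for (let i = 1; i <= moves; i++) {
    const k = easeInOut(i / moves); // fraction of the path covered so far
    steps.push({
      kind: "move",
      t: Math.round((i / moves) * durationMs),
      at: { x: from.x + (to.x - from.x) * k, y: from.y + (to.y - from.y) * k },
    });
  }
  steps.push({ kind: "up", t: Math.round(durationMs), at: { ...to } });
  return steps;
}
```

A 200 px horizontal drag at 400 px/s yields one down, 32 moves over 500 ms, and one up at the target. A Playwright test could replay the same list with `page.mouse.down()`, `page.mouse.move(x, y)` and `page.mouse.up()`. In AIVA the list would presumably go to the VNC input-dispatch layer instead.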

Why it failed — Detection Log

pass inputs-fillable — Email and Password are reachable via getByLabel — those parts work.
fail slider-not-solved — The slider knob was never dragged into the target zone — verified flag stays false on submit.
fail access-granted-not-shown — expect(getByText("Access granted")).toBeVisible() times out because the form refused submission with "Blocked — verification required".


Level 6

Image-only labels

· No DOM text — labels are pixels

Playwright: failed AIVA: passes

What the test does

playwright/levels.spec.ts:86-92

test('Level 6 sign in', async ({ page }) => {
  await page.goto('/selector-resistance/level-6/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: locator.fill: Test timeout of 30000ms exceeded.

Call log:
  - waiting for getByLabel('Email')
  - locator resolved to <no element matched>

Plain-English explanation

The problem

Some sites — historically many banks, brokerages, anti-scrape news sites, and some CAPTCHA prompts — render every visible label as an image, deliberately to defeat scrapers and automated tools. From a human standpoint the form looks perfectly normal: "Email" written above an empty field, "Password" written above another, "Sign in" on the button. From a test's standpoint there is no text anywhere — every "label" is a graphic with empty alt text. Accessibility-based selectors find nothing.

Why a VNC-driven real browser passes

Vision-based automation reads the rendered image with OCR exactly the way a human reads it. It sees "Email" written above a text-input-shaped rectangle and clicks. The fact that the text is an image rather than a DOM text node is invisible to OCR — they're both pixels.

Playwright context: could this test be fixed in Playwright? Fixable (no semantic anchors; only brittle structural selectors)

Verdict: only brittle structural fallbacks remain. The promise of accessibility-driven testing is gone here.

Every label is an SVG/PNG image with empty alt. Playwright's accessibility-based locators return empty:

page.getByLabel('Email') — no <label> element exists.
page.getByRole('textbox', { name: 'Email' }) — no accessible name on the input.
page.getByPlaceholder('Email') — no placeholder.
page.getByText('Email') — text is in image pixels, not a DOM text node.
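The label-, role- and placeholder-based locators come back empty for the same reason, which a deliberately simplified accessible-name check for a text input can illustrate. It checks the sources HTML-AAM lists for text inputs, in that order: aria-labelledby, aria-label, an associated label, title, placeholder. The real computation in the W3C accname specification has more steps than this. `InputInfo` and `accessibleName` are hypothetical names, and the input is described as plain data rather than a live DOM node.

```typescript
// Plain-data description of a text input and its surroundings (hypothetical).
interface InputInfo {
  ariaLabelledByText?: string; // text content of the aria-labelledby targets
  ariaLabel?: string;
  labelText?: string;          // text of an associated <label>, if any
  title?: string;
  placeholder?: string;
}

// Simplified accessible-name lookup: the first non-blank source wins.
// An <img> with alt="" contributes no text, so image-only labels add nothing.
function accessibleName(input: InputInfo): string {
  const sources = [
    input.ariaLabelledByText,
    input.ariaLabel,
    input.labelText,
    input.title,
    input.placeholder,
  ];
  for (const s of sources) {
    const trimmed = s?.trim();
    if (trimmed) return trimmed;
  }
  return "";
}
```

For Level 6 every field is absent, so the name is the empty string and page.getByRole('textbox', { name: 'Email' }) has nothing to match. The "Email" pixels inside an image with empty alt never enter the computation.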

Possible fallbacks:

page.locator('input').nth(0) — works for this layout; breaks on the slightest reorder.
Click at hard-coded pixel coordinates via page.mouse.click(x, y) — exactly the kind of brittle, screen-resolution-dependent code that motivates moving away from selector tests in the first place.
Integrate an OCR library, OCR the screenshot, find the label position, derive coordinates — at which point your test suite has reimplemented visual automation badly.

In real production deployments (bank login keypads with shuffled-position digit images), each session also changes the layout — so even nth-child fallbacks decay across runs.
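On a shuffled keypad the only stable thing is the glyph on each key. A vision-based flow therefore re-reads positions every session and maps the PIN through them. This minimal sketch assumes a recognition step has already returned one `Key` (digit plus tile centre) per tile. `Key` and `pinToClicks` are hypothetical.

```typescript
interface Key {
  digit: string; // recognised glyph on the tile, e.g. "7"
  x: number;     // tile centre, CSS pixels
  y: number;
}

// Maps each PIN digit to the current on-screen centre of its tile.
// Throws if recognition missed a digit or reported the same digit twice.
function pinToClicks(pin: string, keys: Key[]): Array<{ x: number; y: number }> {
  const byDigit = new Map<string, Key>();
  for (const k of keys) {
    if (byDigit.has(k.digit)) throw new Error(`digit ${k.digit} recognised twice`);
    byDigit.set(k.digit, k);
  }
  return Array.from(pin, (d) => {
    const key = byDigit.get(d);
    if (!key) throw new Error(`digit ${d} not found on keypad`);
    return { x: key.x, y: key.y };
  });
}
```

Each pair would then go to a real click, for example `page.mouse.click(x, y)` in Playwright. The mapping must be rebuilt from fresh recognition every session, which positional fallbacks cannot do.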

AIVA context: why this level already passes for AIVA ✓ passes natively

✓ No fix needed — passes by construction

AIVA passes this level natively. AIVA's primary input is the rendered screenshot, processed through OCR for text recognition. "Email" is the same to it whether it came from a DOM text node, an inline SVG, a PNG, or pixel-by-pixel canvas painting. The label-and-input-below visual pattern is recognised the same way regardless of how the page was built.

As a category, image-rendered text is everywhere in legacy financial and government software (and increasingly in anti-scraping CAPTCHAs that render even their prompt text as images). For DOM-based testing it is structurally impossible to do reliably. For vision-based automation it is no different from any other login page.

Why it failed — Detection Log

info inputs-present — Real <input> elements exist in the DOM, but they have no <label>, no aria-label, no placeholder, no title.
fail getByLabel-email — page.getByLabel('Email') — no <label> element associates with anything.
fail getByText-email — page.getByText("Email") — the text "Email" is inside an <img> as SVG, not as a text node.
fail getByRole-textbox — page.getByRole("textbox", { name: "Email" }) — no accessible name on the input.


Level 7

Cross-origin iframe

· Form on a different origin

Playwright: failed AIVA: also fails

What the test does

playwright/levels.spec.ts:94-100

test('Level 7 sign in', async ({ page }) => {
  await page.goto('/selector-resistance/level-7/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: locator.fill: Test timeout of 30000ms exceeded.

Call log:
  - waiting for getByLabel('Email')
  - locator resolved to <no element matched>

Plain-English explanation

The problem

When a form is embedded in an iframe — like a Stripe payment field — the canonical Playwright pattern page.getByLabel('Email').fill(...) fails because it is scoped to the main frame, and the form is in a child frame. Same-origin policy prevents scripts running in the host page from reaching the widget's internals — but Playwright is not such a script. It operates outside the page's JS sandbox and has a dedicated frameLocator API for descending into iframes (including cross-origin ones). The arena's demo amplifies the trap by using a data: URI, which has an opaque origin that defeats URL-based frame matching; selector-based frameLocator('iframe') still works in principle.
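The opaque-origin point can be checked with the WHATWG URL API that Node and browsers share. A data: URL's origin serialises as the string "null", so it never equals the parent page's origin. The `sameOrigin` helper below is a hypothetical illustration. It treats any opaque origin as cross-origin to everything, matching how browsers treat a data: iframe: two opaque origins are never the same origin, even though both serialise as "null".

```typescript
// Compares two URLs' origins the way a same-origin check does, treating any
// opaque origin (serialised as the string "null") as matching nothing,
// not even another opaque origin.
function sameOrigin(a: string, b: string): boolean {
  const originA = new URL(a).origin;
  const originB = new URL(b).origin;
  if (originA === "null" || originB === "null") return false;
  return originA === originB;
}

const parentUrl = "https://bot-arena.jhero.app/selector-resistance/level-7/";
const frameUrl = "data:text/html,<form></form>";
console.log(new URL(frameUrl).origin);        // prints null
console.log(sameOrigin(parentUrl, frameUrl)); // prints false
```

This check binds scripts running in the parent page. Playwright's driver sits outside that sandbox, as described above.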

Why a VNC-driven real browser passes

AIVA does not look at the DOM. Its input is the composited screenshot, where the browser draws cross-origin content into the same image as everything else. The Stripe card field, the iframe contents — all visible as pixels. AIVA sees the "Email" label and clicks the input below it, with no special-case code for frames.

Playwright context: could this test be fixed in Playwright? Fixable (frameLocator plus stable inner selectors)

Verdict: solvable in Playwright via frameLocator. Same-origin policy restricts scripts running on the parent page, not the automation driver itself.

Playwright drives Chromium over the Chrome DevTools Protocol and attaches to each out-of-process iframe as it appears. As a result, page.frameLocator(...), page.frame(...), and locator.contentFrame() all work across origins. Stripe's card iframes expose stable inner attributes, so filling a card field looks like this:

await page
  .frameLocator('iframe[name^="__privateStripeFrame"]')
  .locator('[data-elements-stable-field-name="cardNumber"]')
  .fill('4242424242424242');

The friction is real but not categorical:

Test authors must know which iframe holds the target field, either through an attribute-prefix selector such as iframe[name^="__privateStripeFrame"] or through a positional index.
Stable inner selectors are vendor-supplied — Stripe gives data-elements-stable-field-name; not every widget does.
The arena's data: URI variant is harder than typical cross-origin iframes because the opaque origin defeats URL-based frame matching, but selector-based matching still works.
Network-response interception across out-of-process iframes has known limitations (see #20809) — relevant if the test needs to inspect the iframe's traffic.

Common real-world examples need qualifying. Auth0 Universal Login is not embedded as an iframe in production: Auth0 sets X-Frame-Options: deny, so the login flow is a top-level navigation to *.auth0.com that Playwright fills directly. Cloudflare Turnstile is a cross-origin iframe, but its friction comes from fingerprinting, behavioural scoring and server-side token verification rather than the iframe boundary, so it belongs in Bot Detection level 5. Stripe Elements is the genuine cross-origin iframe case among these, and Playwright handles it routinely.

AIVA context: what would need to change in AIVA to pass this? Fixable

Fix complexity: Trivial (minutes). Enable cross-origin iframes in AIVA's launch config; a browser-config change, no code change.

AIVA currently fails this level — but only because its embedded Chrome blocks cross-origin iframes via its launch configuration. Once cross-origin iframes are allowed in the browser config, the iframe renders normally and AIVA reads the form pixels just like any other page region. The architectural advantage is intact; only a launch-time flag stands in the way.

After the flag flips, this is a major real-world advantage. Payment forms, CAPTCHA challenges and embedded SDKs, which conventionally use cross-origin iframes, render normally for AIVA. Selector-based testing handles them through frameLocator plus stable inner selectors, which only some vendors supply.

Why it failed — Detection Log

info iframe-cross-origin — iframe src is a data: URI with an opaque origin — cross-origin to the parent page.
fail getByLabel-email — page.getByLabel('Email') — main-frame-scoped, finds nothing.
fail frameLocator-blocked — page.frameLocator("iframe").getByLabel("Email") — Playwright refuses to script into a cross-origin frame; browser same-origin policy.
fail getByText-granted — expect(getByText("Access granted")).toBeVisible() — the message lives inside the cross-origin frame and is invisible to the parent.


Level 8

Virtual scrolling

· Windowed list — off-screen items are absent from DOM

Playwright: failed AIVA: passes

What the test does

playwright/levels.spec.ts:102-108

test('Level 8 sign in', async ({ page }) => {
  await page.goto('/selector-resistance/level-8/');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('hunter2');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByText('Access granted')).toBeVisible();
});

What Playwright sees

Error: locator.fill: Test timeout of 30000ms exceeded.

Call log:
  - waiting for getByLabel('Email')
  - locator resolved to <no element matched>

Plain-English explanation

The problem

Performant lists in modern web apps render only the rows currently inside the visible viewport, a technique called virtual scrolling or windowing. Slack's message history, Notion's database views and Gmail's thread list use it, as do data-grid and list libraries such as AG Grid, MUI X and TanStack Virtual. When a test wants to click an item 500 rows down, that row does not exist in the DOM until the list has been scrolled to it. Standard test idioms ("find the row, click it") return nothing.

Why a VNC-driven real browser passes

A vision-based automation tool already has a scroll-and-recognise loop built into its pipeline. It scrolls the visible viewport, takes a new screenshot, looks for the target, and repeats until the target appears, exactly as a human would. AIVA does not need to know that the list is virtualised.

Playwright context: could this test be fixed in Playwright? Fixable (per-list scroll helper; AG Grid has an official Playwright guide)

Verdict: solvable with a per-list scroll-and-wait helper. Documented in vendor docs (AG Grid publishes an official Playwright E2E guide; LSEG maintains an open-source ag-grid-playwright bridge) and in Playwright issue threads.

The canonical pattern for each virtualised list a test interacts with:

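Scroll the inner container with container.evaluate((el, y) => el.scrollTo(0, y), y), passing y as the evaluate argument because the callback runs in the browser. Alternatively, use locator.scrollIntoViewIfNeeded() on a row that is already mounted.
Call locator.waitFor() on the target row until it mounts, then act on it.
If the app wires keyboard navigation onto the list (react-window and TanStack Virtual leave that to the app), page.keyboard.press('ArrowDown') often works as a portable alternative.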
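The scroll-and-wait recipe reduces to a bounded loop: check whether the target row is mounted, and if not, scroll further and check again. The sketch is written against a hypothetical `VirtualList` interface so it runs without a browser. In a Playwright test, `scrollTo` might wrap `container.evaluate((el, y) => el.scrollTo(0, y), y)`, and `isMounted` might wrap a wait on the target row's locator. Those wrappers, and `scrollUntilMounted` itself, are assumptions for illustration.

```typescript
// Minimal, hypothetical view of a virtualised list; adapt one per library.
interface VirtualList {
  scrollHeight(): Promise<number>;   // full logical height in px
  viewportHeight(): Promise<number>; // visible height in px
  scrollTo(y: number): Promise<void>;
  isMounted(key: string): Promise<boolean>; // is the target row in the DOM now?
}

// Scrolls from the top in overlapping, viewport-sized steps until the row
// identified by `key` is mounted. Returns the scroll offset where it appeared,
// or null once the bottom of the list has been checked without finding it.
async function scrollUntilMounted(
  list: VirtualList,
  key: string,
  overlap = 0.5, // fraction of a viewport re-shown between steps
): Promise<number | null> {
  const viewport = await list.viewportHeight();
  const step = Math.max(1, viewport * (1 - overlap));
  const maxTop = Math.max(0, (await list.scrollHeight()) - viewport);
  for (let y = 0; ; y = Math.min(y + step, maxTop)) {
    await list.scrollTo(y);
    if (await list.isMounted(key)) return y;
    if (y >= maxTop) return null; // bottom reached: the row does not exist
  }
}
```

Consider a fake list with 40 px rows, a 400 px viewport and no overscan: the row for user-371 first mounts at an offset of 14,600 px. Real libraries render overscan rows and may mount asynchronously. A Playwright-backed `isMounted` should therefore wait briefly (for example `locator.waitFor({ timeout })`, catching the timeout) rather than check once.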

The friction (3/5, not 4/5) lives in the rough edges. locator.count() reports only mounted rows, so size assertions need a workaround (#17042). Virtualisation libraries (react-window, TanStack Virtual, AG Grid) expose different scroll APIs, so the helper is per-library. Some apps virtualise rows AND columns. But the recipes are well documented, and the official AG Grid guide ships a setupAgTestIds helper that makes AG Grid in particular close to a solved problem.

AIVA context: why this level already passes for AIVA ✓ passes natively

✓ No fix needed — passes by construction

AIVA passes this level natively. Visual automation systems are built around a perception loop: screenshot, look, decide what to do, act, screenshot again. Scrolling a virtualised list is exactly the same as scrolling any other long list — AIVA scrolls a screen, looks for the target, scrolls again if needed. Whether the rows are virtualised, all-DOM, or paginated does not matter — they are all just visible rows on screen at the moment of the screenshot.

As a category, every data-heavy SaaS app (Slack, Notion, Linear, Salesforce, ServiceNow, every CRM and ERP) uses virtualisation. Selector-based testing builds an ever-growing pile of per-list scrollers; vision-based automation does not.

Why it failed — Detection Log

info list-virtualised — 1,000 logical accounts; only ~10 visible rows are mounted in the DOM at any moment.
fail getByLabel-email — page.getByLabel('Email') — no Email field exists on this page; the email is selected by clicking a row.
fail getByText-target — page.getByText("user-371@example.com") — the row for user-371 is not currently mounted; locator returns empty.
info scroll-required — A working test would have to detect virtualisation, compute the row's scroll position, scrollTop into view, then click. Requires bespoke per-list logic.


What changes with VNC AIVA

Point the same tests at the classic AIVA: a real headed Chrome on a Linux host, clicked through VNC at the OS level, with image-based recognition instead of selectors. The five detection levels close down to one or two trivial fixes. Most selector-resistant levels are unblocked by construction (Levels 2, 3, 4, 6 and 8 pass natively), because pixel-aware automation does not care whether the DOM is empty or randomised. Level 5 needs a drag primitive and Level 7 a launch-flag change.

The difference is not patches, plugins, or stealth tricks. It is that VNC-AIVA is a real browser session driven by real OS-level input, with image recognition that reads the screen instead of the DOM.

Static report generated on 2026-05-12 against bot-arena commit c22e2ac. Re-run with npx playwright test.