Bot Arena

Other use-cases

Where pure Playwright still falls over

Bot detection is one well-known wall. There are several more — patterns where a human user finishes the task in seconds, the natural Playwright assertion is the same one any author would write, and the test still fails 10 out of 10 runs.

Cases tested: 8 · Failed 10/10: 8 · Distinct capability gaps: 4

All tests ran on headless Chromium driven by @playwright/test v1.49. Each spec was rerun ten times to distinguish stable failure from flake. Every case below fails 10/10 — these are capability gaps, not flakes.

1. Canvas & WebGL — what the user sees is not in the DOM

When a library renders to <canvas> or WebGL, the rendered text, shapes and colors live as pixels, not as DOM nodes. Playwright's getByText sees nothing.

Case 1 (failed 10/10) — Canvas charts (Chart.js, ECharts): bar chart value reading

What the test does

// Renders a Chart.js bar chart with values [40, 45, 60, 50, 35]
// for Mon..Fri. The user expects to see "Wednesday" and "60".
await expect(page.getByText('Wednesday', { exact: true })).toBeVisible();
await expect(page.getByText('60', { exact: true })).toBeVisible();

What Playwright sees

Error: expect(locator).toBeVisible() failed

Locator: getByText('Wednesday', { exact: true })
Expected: visible
Error: element(s) not found

Page snapshot:
  - heading "Weekly hours" [level=1]

Plain-English explanation

The problem

Both Chart.js and Apache ECharts render bar charts by drawing pixels into an HTML <canvas> element. The axis labels ("Monday", "Wednesday") and data labels ("60") are part of that picture — they are not text nodes anywhere in the DOM. A human reading the chart sees them; Playwright querying the DOM does not. The page snapshot shown to Playwright contains only the <h1>, because that is everything the DOM actually has.

Why a visual / AIVA approach helps

A visual agent looks at the rendered screenshot the same way a human does. Optical character recognition (or a multimodal model) extracts "Wednesday" and "60" from pixels. The assertion can be expressed in user terms — "the Wednesday bar shows 60 hours" — and verified against what is on screen.
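
As a sketch of that idea (not the AIVA implementation): plain Playwright plus an off-the-shelf OCR library can make the same assertion from pixels. Everything below is illustrative — the URL is a placeholder and tesseract.js is assumed to be installed.

// Sketch: screenshot the canvas and OCR it — assumes tesseract.js is installed.
import { test, expect } from '@playwright/test';
import Tesseract from 'tesseract.js';

test('Wednesday bar shows 60 (visual check)', async ({ page }) => {
  await page.goto('https://example.test/weekly-hours-chart'); // placeholder URL
  const shot = await page.locator('canvas').screenshot();
  const { data } = await Tesseract.recognize(shot, 'eng');
  expect(data.text).toContain('Wednesday');
  expect(data.text).toContain('60');
});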

SVG counter-example

Switching the same chart to Highcharts (which renders to SVG) makes Playwright work normally — SVG <text> nodes are real DOM. The split is canvas vs SVG, not "charts in general".

Reproduce with another automation tool

Target https://echarts.apache.org/examples/en/editor.html?c=bar-simple

Goal — Read the value of one bar (e.g., "Wed") from the rendered chart, by selector only — no screenshot, no OCR.

  1. Open the URL. The right pane shows a live ECharts bar chart with weekday labels and seven bars.
  2. Wait until the chart finishes rendering.
  3. Without taking a screenshot, locate the DOM node that contains the visible text "Wed" or any of the bar values (120, 200, 150, 80, 70, 110, 130).
  4. Read the value associated with the "Wed" bar from the DOM.

Expected — No DOM node contains the label "Wed" or the numeric values — they are all painted into <canvas>. A scripted tool fails the lookup; a visual tool reads them straight from the rendered image.

Case 2 (failed 10/10) — WebGL 3D scene (Three.js): cube visibility check

What the test does

// threejs.org example renders a rotating textured cube.
// The user can see it clearly; Playwright reads the center pixel.
const canvas = page.locator('canvas');
const px = await canvas.evaluate(c => {
  const gl = c.getContext('webgl2') || c.getContext('webgl');
  const out = new Uint8Array(4);
  gl.readPixels(c.width/2, c.height/2, 1, 1,
                gl.RGBA, gl.UNSIGNED_BYTE, out);
  return [out[0], out[1], out[2]];
});
expect(px[0] + px[1] + px[2]).toBeGreaterThan(0);

What Playwright sees

Error: the center pixel of the canvas
should be the visible cube, not blank black

expect(received).toBeGreaterThan(expected)
Expected: > 0
Received:   0

Plain-English explanation

The problem

The natural way to confirm a 3D viewer is rendering is to read the canvas pixels. Three.js — and almost every other WebGL library — uses preserveDrawingBuffer: false by default. The browser is allowed to wipe the buffer between frames, so gl.readPixels() from outside the render loop returns all zeros even when the scene is visibly drawn on screen. The cube is there for a human; from Playwright's vantage point the canvas is black.
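
For contrast, the application-side opt-in is a one-line flag in standard Three.js — shown here only to make the default concrete, since a tester of a third-party app cannot change it:

// Application code, not test code: opt in to a readable buffer.
import * as THREE from 'three';

const renderer = new THREE.WebGLRenderer({
  preserveDrawingBuffer: true, // default is false — the buffer may be wiped per frame
});
// Only with this flag does gl.readPixels() outside the render loop return real pixels.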

Why a visual / AIVA approach helps

A screenshot of the actual rendered page does not go through the WebGL context — it captures the compositor output, the same thing your monitor displays. A visual agent has the rendered cube to work with directly, in colour.
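
A minimal sketch of that compositor-path check, assuming pngjs is installed to decode the screenshot buffer:

// Sketch: read the centre pixel from a real screenshot, not from the GL context.
import { PNG } from 'pngjs';

const buf = await page.locator('canvas').screenshot();
const png = PNG.sync.read(buf);
const mid = (Math.floor(png.height / 2) * png.width + Math.floor(png.width / 2)) * 4;
expect(png.data[mid] + png.data[mid + 1] + png.data[mid + 2]).toBeGreaterThan(0);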

Reproduce with another automation tool

Target https://threejs.org/examples/webgl_geometry_cube.html

Goal — Confirm that a textured 3D cube is rendered and visible on the page (not a blank or error scene).

  1. Open the URL. A standalone Three.js example loads — the page is mostly empty except for a single full-window <canvas> element.
  2. Wait 1–2 seconds for the first frame to render.
  3. Without taking a screenshot of the page, ask the canvas via gl.readPixels() what colour the center pixel is.
  4. Alternatively, look for any DOM text that describes the rendered geometry (a label, badge, or status pill saying "cube" or showing its colour).

Expected — The cube is plainly visible to a human; gl.readPixels() returns 0,0,0 because Three.js does not preserve the drawing buffer. No DOM text describes the rendered scene. A scripted tool cannot confirm what is on screen; a visual tool sees the cube directly.

Case 3 (failed 10/10) — WebGL maps (Mapbox GL): "the map shows Brno"

What the test does

// Centers a Mapbox GL map on Brno at zoom 12.
// A user looking at the map sees the city name and roads.
await expect(page.getByText('Brno', { exact: false })).toBeVisible();

What Playwright sees

Error: expect(locator).toBeVisible() failed

Locator: getByText('Brno', { exact: false })
Expected: visible
Error: element(s) not found

Plain-English explanation

The problem

Mapbox GL JS (and similar libraries like Cesium and Deck.gl) paint streets, place names and POI labels directly into a WebGL canvas from vector-tile data. None of the human-readable text on the map exists in the DOM. The same flow on a non-WebGL library — say, Leaflet with raster tiles and HTML markers — would work fine, because labels there are real DOM nodes.

Why a visual / AIVA approach helps

Vision sees the map the way the user does. "Verify the city name 'Brno' appears near the centre" becomes a screenshot crop plus text recognition — exactly how a tester would describe the check verbally.
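
A sketch of that crop-plus-recognition flow, reusing the tesseract.js assumption from Case 1; the '#map' selector and crop size are assumptions about the page under test:

// Sketch: crop the map centre and recognise the label in the pixels.
const box = await page.locator('#map').boundingBox(); // '#map' is an assumption
if (!box) throw new Error('map container not found');
const crop = await page.screenshot({
  clip: { x: box.x + box.width / 2 - 200, y: box.y + box.height / 2 - 100,
          width: 400, height: 200 },
});
const { data } = await Tesseract.recognize(crop, 'eng');
expect(data.text).toMatch(/Brno/);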

Reproduce with another automation tool

Target https://docs.mapbox.com/mapbox-gl-js/example/simple-map/

Goal — Read any place name that is visible on the rendered map (city, street, country label).

  1. Open the URL. A live Mapbox GL map is embedded in the docs page, centered on Helsinki by default.
  2. Wait for the tiles to load.
  3. Look at the map: human-readable labels for cities, water bodies and streets are clearly visible.
  4. Try to find any of these labels — "Helsinki", "Vantaa", "Baltic Sea", a street name — via DOM text search.

Expected — The surrounding documentation text ("Display a map on a webpage", code snippets, side nav) is in the DOM. Every label drawn on the map itself is not — it is rasterised into the WebGL canvas from vector tiles. A scripted tool can verify the page chrome but never the map content.

2. Automation detection — the browser admits it is a robot

The same family of detections Bot Arena demonstrates also blocks Playwright from interacting with any portal that is fronted by a serious bot manager.

Case 4 (failed 10/10) — Public fingerprint pages (sannysoft, CreepJS): no automation tells

What the test does

// bot.sannysoft.com prints a pass/fail table.
// A real user has zero "failed" rows.
await page.goto('https://bot.sannysoft.com/');
const failingCells = page.locator('td.failed');
await expect(failingCells).toHaveCount(0);

What Playwright sees

Error: no bot-detection check should
report failed for a real user

expect(locator).toHaveCount(expected)
Expected: 0
Received: 13

Plain-English explanation

The problem

This is the same family of detections Bot Arena demonstrates — navigator.webdriver, missing chrome.runtime, HeadlessChrome in the user agent, software-only WebGL renderer, and so on. Public test pages like sannysoft and CreepJS list them out openly: thirteen of them fail for plain Playwright. The same arithmetic plays out silently inside Cloudflare, Akamai or PerimeterX in front of real customer portals.
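
Two of those tells are a single evaluate() away — a sketch of what any page script can read:

// Sketch: page-visible signals that give automation away.
const tells = await page.evaluate(() => ({
  webdriver: navigator.webdriver,                        // true under Playwright
  headlessUA: navigator.userAgent.includes('HeadlessChrome'),
}));
// Both come back positive for plain headless Chromium.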

Why a visual / AIVA approach helps

A VNC-driven real Chrome is a real Chrome — none of the thirteen flags fire because the browser is not under automation control from the inside. The Bot Arena failure report shows this on five layered scenarios.

Live snapshot — what Playwright actually faces

[Screenshot: bot.sannysoft.com under Playwright-driven Chromium — twelve detection cells highlighted red ("failed") and one yellow ("warn") across the two test tables.]

Captured against headless Chromium driven by @playwright/test: 12 failed + 1 warn = 13 detection cells lit up. The same browser opened by a human on a desktop shows none of these.


Reproduce with another automation tool

Target https://bot.sannysoft.com/

Goal — Pass the public bot-detection battery — every check should report "passed", as it does for a regular human visitor.

  1. Open the URL.
  2. The page renders two tables ("Intoli.com tests + additions" and "Fingerprint Scanner tests") where each row is one check.
  3. Wait ~2 seconds for all checks to settle.
  4. Count the table cells styled red ("failed"). For a real human Chrome this is zero or close to it.

Expected — A real headed Chrome session passes virtually all checks. A scripted browser (Playwright, Puppeteer, Selenium) lights up roughly thirteen red cells — navigator.webdriver, HeadlessChrome in the UA, missing chrome.runtime, software-only WebGL renderer, and more. These are the same signals real bot-management services use.

Case 5 (failed 10/10) — Google reCAPTCHA v2 demo: submit a protected form

What the test does

// The official reCAPTCHA demo form.
// A human ticks the checkbox and submits in 1 second.
await page.goto('https://www.google.com/recaptcha/api2/demo');
const captcha = page.frameLocator('iframe[title="reCAPTCHA"]');
await captcha.locator('#recaptcha-anchor').click();
await page.locator('#recaptcha-demo-submit').click();
await expect(page.getByText(/Verification Success/i)).toBeVisible();

What Playwright sees

Error: expect(locator).toBeVisible() failed

Locator: getByText(/Verification Success/i)
Expected: visible
Error: element(s) not found
(image-challenge presented instead of pass)

Plain-English explanation

The problem

reCAPTCHA's silent path is exactly the same trust check as Cloudflare Turnstile (Bot Arena Level 5). When the checkbox click comes from a browser that has tripped the automation tells, Google falls back to an image challenge — pick the bicycles, etc. — that Playwright cannot solve. The token is never issued and the demo form never reports success.

Why a visual / AIVA approach helps

Same reason as Turnstile: a real Chrome session driven via VNC has a real fingerprint, real mouse trajectory, and real history. reCAPTCHA hands it the silent pass and the form submits without an image challenge ever appearing.

Reproduce with another automation tool

Target https://www.google.com/recaptcha/api2/demo

Goal — Submit the demo form and reach the "Verification Success" page — the same outcome a human gets in two clicks.

  1. Open the URL.
  2. Click the "I'm not a robot" checkbox (it lives inside an iframe titled "reCAPTCHA").
  3. Wait for the spinner to resolve.
  4. Click the "Submit" button at the bottom of the form.
  5. Verify the next page displays the text "Verification Success".

Expected — A real human passes silently — the checkbox turns into a green tick and Submit goes through. An automated browser gets an image challenge ("select all squares containing bicycles"), which is unsolvable without external vision/captcha-solver services; either the submission is never accepted or the result page never shows "Verification Success".

3. Media content — what is in this video?

Case 6 (failed 10/10) — Video content recognition (YouTube embed): "is the right video playing?"

What the test does

// Embedded corporate-training video. The user verifies it
// shows the correct speaker / scene.
await page.goto('https://www.youtube.com/embed/dQw4w9WgXcQ?autoplay=1&mute=1');
// Best Playwright can do is "is it playing?":
const state = await page.locator('video').evaluate(v =>
  v.currentTime > 0 ? 'playing' : null
);
expect(state).toMatch(/Rick Astley|singer|musician/i);

What Playwright sees

Error: we should be able to describe
what is shown in the video

Expected pattern: /Rick Astley|singer|musician/i
Received string:  "playing"

Plain-English explanation

The problem

The HTML <video> element exposes currentTime, duration and paused — nothing about what is being decoded into the visible frame. A test author who wants to verify "the right training video is playing", "the speaker has appeared", or "the closing logo is shown" cannot get that from the DOM. The same applies to live-streaming dashboards, video-conferencing tiles, screen-share previews and CCTV grids.
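
In concrete terms, this is everything the DOM volunteers about a playing video — a sketch:

// Sketch: the complete DOM-side answer to "what is on screen?".
const state = await page.locator('video').evaluate(v => ({
  currentTime: v.currentTime, // seconds elapsed
  duration: v.duration,       // total length
  paused: v.paused,           // play/pause flag
  // Nothing exposes the visual content of the decoded frame.
}));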

Why a visual / AIVA approach helps

A frame screenshot plus a vision model answers the question directly: "describe what is on screen". The assertion can be the same plain English the customer used in the bug report.
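
A sketch of that shape — describeFrame below is a hypothetical vision-model call, not a real API:

// Sketch only: `describeFrame` stands in for a hypothetical multimodal-model request.
const frame = await page.locator('video').screenshot();
const description = await describeFrame(frame); // hypothetical helper
expect(description).toMatch(/person|singing|performing/i);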

Reproduce with another automation tool

Target https://www.youtube.com/embed/dQw4w9WgXcQ?autoplay=1&mute=1

Goal — Verify that the playing video shows a person performing/singing (not, say, a still title card or an error frame).

  1. Open the URL — the YouTube embed autoplays muted.
  2. Wait 3 seconds so the player is past any title card and into the music video.
  3. Without taking a screenshot, ask the page what is visible in the video frame — a person, an outdoor scene, a logo, an error overlay?

Expected — The <video> DOM element only exposes currentTime, duration and paused. The visible content of the decoded frame is not accessible. A scripted tool can only confirm "the video is playing"; it cannot confirm "the right video is playing".

4. UI timing & selector fragility

Two failures where the rendered page is fully DOM-accessible — but the natural Playwright flow still breaks, because synthetic input and synthetic selectors do not match what the application actually responds to.

Case 7 (failed 10/10) — Keystroke race against bpmn-js label commit: demo.bpmn.io workflow modeler

What the test does

// Create a task on the BPMN canvas, name it "Process order",
// save and verify the exported XML.
await page.locator('.djs-palette [title="Create task"]').click();
await page.mouse.click(centerX, centerY);  // centerX/centerY: canvas centre
await page.keyboard.type('Process order');
await page.keyboard.press('Escape');
// Ctrl+S triggers the BPMN file download:
await page.keyboard.press('Control+s');
const xml = await readDownload(); // test helper that reads the downloaded .bpmn
expect(xml).toContain('Process order');

What Playwright sees

Error: expect(received).toContain(expected)

Expected substring: "Process order"
Received string: ...
  <bpmn:task id="Activity_00g..."
             name="Proces" />

Plain-English explanation

The problem

The Ctrl+S download actually succeeds — bpmn-js exports valid XML. The failure is more surprising: only "Proces" was recorded as the task name, not "Process order". page.keyboard.type fires synthetic events as fast as Chromium will accept them, with no inter-key delay. bpmn-js's text-input overlay debounces its label commit against the next blur/Escape — and Playwright's Escape arrives before the last few characters of "Process order" have made it through the debounce. A human typing at human speed never has this race.
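
Playwright's own knob here is the delay option — shown as a sketch because, as the reproduction note below also says, pacing the keys masks the race rather than fixing it:

// The failing call: synthetic keys with zero inter-key delay.
await page.keyboard.type('Process order');
// Common workaround: pace the keys. Usually beats the debounce — and hides the bug.
await page.keyboard.type('Process order', { delay: 120 });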

Why a visual / AIVA approach helps

OS-level key events are paced by the real input pipeline. The same flow driven through VNC arrives at the application at human cadence, so debounce-style commits get the full string and the XML reflects what the user typed.

Reproduce with another automation tool

Target https://demo.bpmn.io/new

Goal — Create a BPMN task on the empty canvas, name it exactly "Process order", and export valid BPMN XML containing that name.

  1. Open the URL. An empty BPMN canvas appears with a vertical palette on the left edge.
  2. Dismiss the Camunda cookie consent banner if it appears.
  3. In the left palette, click the "Create task" entry (rectangle with a plus icon).
  4. Click once on the canvas centre to drop the task.
  5. A label input is auto-focused on the new task. Type "Process order" (13 characters including the space).
  6. Press Escape to commit the label.
  7. Click anywhere on the canvas, then press Ctrl+S — bpmn-js intercepts this and triggers a .bpmn file download.
  8. Open the downloaded XML and search for the <bpmn:task> element. Verify its name attribute equals "Process order".

Expected — The XML download itself works. A human typing at human cadence sees the task labelled "Process order". A scripted tool firing synthetic keypresses with no inter-key delay races bpmn-js's debounced commit — the exported XML shows name="Proces" (truncated). Adding artificial typing delay masks the bug rather than fixing it.

Case 8 (failed 10/10) — Odoo invoice form (full flow): customer + draft + post

What the test does

// Fresh Odoo demo tenant. Create a customer, then a Customer
// Invoice with one line, post it. Assert status = "Posted".
await page.locator('div[name="partner_id"] input').click();
await page.locator('div[name="partner_id"] input').fill(customerName);
await page.getByRole('menuitem',
  { name: new RegExp(customerName) }).first().click();
await page.getByText('Add a line').click();
await page.keyboard.type('Consulting services');
// ... save, confirm, expect Posted

What Playwright sees

TimeoutError: locator.click: Timeout 15000ms

Call log:
  waiting for getByRole('menuitem',
    { name: /ACME\s+Corp\s+1778611142987/ }).first()
  (autocomplete never surfaced the matching entry)

Plain-English explanation

The problem

Odoo's web client is built on Owl, a reactive framework that mutates the DOM around generated IDs and per-customer module configuration. Selectors that work today break on the next release; selectors that work on demo tenant A do not work on customer tenant B; and the customer autocomplete in the invoice form is fed asynchronously, so a synthetic fill() often fails to surface the matching menuitem in time. The trained accountant who finishes this flow in thirty seconds never hits any of this.
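
The usual scripted mitigation is paced keystrokes plus a longer wait — sketched below because it leaves both problems in place:

// Sketch of the common mitigation: pace the typing, stretch the timeout.
const partner = page.locator('div[name="partner_id"] input');
await partner.click();
await partner.pressSequentially(customerName, { delay: 80 });
await page.getByRole('menuitem', { name: new RegExp(customerName) })
  .first().click({ timeout: 30000 });
// Still racing Owl's asynchronous suggestion fetch, and still tied to
// selectors that change per release and per tenant.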

Why a visual / AIVA approach helps

A visual driver targets what the user sees — the field labelled "Customer", the dropdown row whose visible text is "ACME Corp", the button captioned "Confirm". Those labels stay constant across Odoo releases and across customer customisations. The same recording transfers to a different tenant whose underlying DOM looks completely different.

Reproduce with another automation tool

Target https://demo.odoo.com

Goal — Create a customer, draft a one-line invoice for them, post it, and confirm the resulting status is "Posted" with a real invoice number (INV/YYYY/MM/NNNN).

  1. Open the URL. After two redirects you land on a fresh Odoo demo tenant logged in as admin/admin.
  2. Open the app drawer and pick "Contacts".
  3. Click the "New" button.
  4. In the company-name field at the top of the form, type a unique customer name (e.g., "ACME Corp 1234").
  5. In the Email field, type any address (e.g., "billing@acme.example").
  6. Save with Ctrl+S or the cloud icon.
  7. Open the app drawer again, pick "Invoicing", then navigate Customers → Invoices.
  8. Click "New".
  9. In the Customer field, click the input, type the customer name, and pick the matching entry from the autocomplete dropdown that surfaces.
  10. Click "Add a line".
  11. Type "Consulting services" in the description, Tab, then type 1000 in the Price field.
  12. Save, then click the "Confirm" button at the top.
  13. Verify the status bar shows "Posted" and the breadcrumb / record header shows an invoice number matching INV/YYYY/MM/NNNN.

Expected — A trained accountant finishes this in under thirty seconds. A scripted tool gets through Contacts cleanly but stalls at the Customer autocomplete inside the invoice form — the partner dropdown is driven by an asynchronous suggestion fetch that synthetic fill() does not reliably trigger. Selector strings that work today break on the next Odoo release or on a customer-customised tenant.

Counter-example: where pure Playwright works fine

The split is not "rich UIs are unscriptable". It's about whether the rendered content lives in the DOM. The same bar-chart reading test that fails on Chart.js and ECharts passes 10/10 on Highcharts and on a Mermaid flowchart — because both render to SVG with real <text> nodes.

// Highcharts SVG bar chart — same assertion, passes:
await expect(page.getByText('Wednesday').first()).toBeVisible();  // ✓
await expect(page.getByText('60').first()).toBeVisible();         // ✓

Visual / AIVA is the right tool for the canvas / WebGL / bot-protected / media-content / timing-fragile cases above — and the wrong tool when a fast, deterministic, DOM-based assertion already works.

Where this leaves a visual-first / AIVA approach

The eight cases above sort cleanly into four root causes — canvas/WebGL rendering, automation fingerprinting, opaque media content, and input-timing/selector fragility — each of which has a natural answer in a visual-driven session running a real headed Chrome through VNC.

Playwright remains the right tool when the contract is DOM-on-DOM. The cases above are where its contract ends.