Playwright reporter that outputs structured JSON for LLM agents. Minimal console output, flat schema, easy to filter to failures.
npm install @clipboard-health/playwright-reporter-llm
// playwright.config.ts
import { defineConfig } from "@playwright/test";
export default defineConfig({
reporter: [
["@clipboard-health/playwright-reporter-llm", { outputFile: "test-results/llm-report.json" }],
],
});
| Option | Default | Description |
|---|---|---|
outputFile |
test-results/llm-report.json |
Path to JSON output |
.......F..........F.S....
26 tests | 23 passed | 2 failed | 1 skipped (4.2s)
Report: test-results/llm-report.json
| Task | Fields |
|---|---|
| Pass/fail overview | summary |
| Triage failures | Filter tests[] by status === "failed", then read errors[0].{message,diff,location,snippet} |
| Identify flakes | Filter tests[] by flaky === true; compare attempts[] statuses |
| Reconstruct failure timeline | Pick an attempt where status !== "passed" (for flakes this is NOT the last attempt), then read .timeline[] (steps + network + console, sorted by offsetMs) |
| Inspect failing requests | tests[].attempts[].network[] — filter by status >= 400, failureText, or wasAborted |
| Correlate with backend trace | tests[].attempts[].network[].traceId and .spanId (parsed from W3C traceparent — works with OpenTelemetry, Datadog, Jaeger, Tempo, Honeycomb, etc.) |
| Debug uncaught page errors | tests[].attempts[].consoleMessages[] — filter by type in "error" | "pageerror" | "page-crashed" |
| Visual debugging | tests[].attempts[].failureArtifacts.{screenshotBase64,videoPath} |
Walk attempts[].timeline[] (steps + network + console, sorted by offsetMs) to reconstruct the failure:
{
"schemaVersion": 2,
"summary": {
"total": 10,
"passed": 9,
"failed": 1,
"flaky": 0,
"skipped": 0,
"timedOut": 0,
"interrupted": 0
},
"tests": [
{
"id": "abc123",
"title": "Checkout > applies discount code",
"status": "failed",
"flaky": false,
"location": { "file": "tests/checkout.spec.ts", "line": 42, "column": 5 },
"errors": [
{
"message": "Expected: 90\nReceived: 100",
"diff": { "expected": "90", "actual": "100" },
"location": { "file": "tests/checkout.spec.ts", "line": 58, "column": 7 }
}
],
"attempts": [
{
"attempt": 1,
"status": "failed",
"failureArtifacts": { "screenshotBase64": "iVBORw0KGgo...", "videoPath": "video.webm" },
"timeline": [
{
"kind": "step",
"offsetMs": 1050,
"title": "click Apply",
"category": "test.step",
"durationMs": 40,
"depth": 0
},
{
"kind": "network",
"offsetMs": 1100,
"method": "POST",
"url": "https://api.example.com/discount",
"status": 500,
"traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
"spanId": "00f067aa0ba902b7"
},
{
"kind": "console",
"offsetMs": 1240,
"type": "pageerror",
"text": "Uncaught TypeError: discount is undefined"
},
{
"kind": "step",
"offsetMs": 1300,
"title": "expect total to equal 90",
"category": "pw:api",
"durationMs": 5,
"depth": 0,
"error": "Expected: 90\nReceived: 100"
}
]
}
]
}
]
}
Reads as: click Apply → backend returned 500 → page threw → assertion failed. traceId lets you correlate the 500 with your tracing backend (Datadog, Jaeger, Tempo, Honeycomb, etc.).
For a test that fails on attempt 1 and passes on attempt 2, the divergence between the two timelines is the diagnosis. Find the first entry that differs:
{
"tests": [
{
"title": "Checkout > shows confirmation",
"status": "passed",
"flaky": true,
"attempts": [
{
"attempt": 1,
"status": "failed",
"timeline": [
{
"kind": "step",
"offsetMs": 800,
"title": "goto /checkout",
"category": "test.step",
"durationMs": 120,
"depth": 0
},
{
"kind": "network",
"offsetMs": 950,
"method": "GET",
"url": "https://api.example.com/cart",
"status": 200
},
{
"kind": "step",
"offsetMs": 1400,
"title": "expect banner visible",
"category": "pw:api",
"durationMs": 5000,
"depth": 0,
"error": "Timeout 5000ms exceeded"
}
]
},
{
"attempt": 2,
"status": "passed",
"timeline": [
{
"kind": "step",
"offsetMs": 800,
"title": "goto /checkout",
"category": "test.step",
"durationMs": 120,
"depth": 0
},
{
"kind": "network",
"offsetMs": 950,
"method": "GET",
"url": "https://api.example.com/cart",
"status": 200
},
{
"kind": "network",
"offsetMs": 1100,
"method": "GET",
"url": "https://api.example.com/inventory",
"status": 200
},
{
"kind": "step",
"offsetMs": 1350,
"title": "expect banner visible",
"category": "pw:api",
"durationMs": 40,
"depth": 0
}
]
}
]
}
]
}
Divergence: the passing attempt made an extra /inventory call that the failing attempt didn't. That's either a stale frontend cache or a race — not a flaky test, a real bug.
See docs/example-report.json for a complete report with representative optional fields populated (network timings, redirect chains, headers, attachments, step nesting, multi-attempt retries).
summary -- quick pass/fail countstests[].errors[].message -- ANSI-stripped, clean error texttests[].errors[].diff -- extracted expected/actual from assertion errorstests[].errors[].location -- exact file and line of failuretests[].flaky -- true if test passed after retrytests[].attempts[] -- full retry history with per-attempt status, timing, stdio, attachments, steps, and networktests[].attempts[].consoleMessages[] -- warning/error/pageerror/page-closed/page-crashed trace entries only (2KB text cap with [truncated] marker, max 50 per attempt, high-signal entries prioritized over low-signal)tests[].steps / tests[].network / tests[].timeline -- convenience aliases from the final attempttests[].attempts[].timeline[] -- unified, sorted-by-offsetMs array of all retained events (kind: "step" | "network" | "console"). Slimmed-down entries for quick temporal scanning; full details remain in the source arraysoffsetMs -- milliseconds since the attempt's startTime. Always present on steps (from TestStep.startTime). Optional on network entries (from trace _monotonicTime or startedDateTime, converted via the trace's context-options anchor) and console entries (from trace monotonic time field + anchor). Absent when the trace lacks a context-options event. Entries without offsetMs are excluded from the timelinetests[].attempts[].network[].traceId / spanId -- parsed from the W3C Trace Context traceparent header (preferring response over request). traceId is 32 hex chars (128-bit), spanId is 16 hex chars (64-bit). Compatible with OpenTelemetry and Datadog (dd-trace ≥ 2.0 emits traceparent by default). Malformed or all-zero values are rejected.tests[].attempts[].network[] -- max 200 per attempt, priority-based: fetch/xhr requests, error responses (status >= 400), failed, and aborted requests are retained over static assets (script, stylesheet, image, font, media). Includes failure details (failureText, wasAborted), redirect chain (redirectToUrl, redirectFromUrl, redirectChain), timing breakdown (timings), durationMs derived from available timing components, allowlisted headers (requestHeaders, responseHeaders), and JSON/text requestBody / responseBody pulled from the trace archive (capped at 2KB with [truncated] marker)tests[].attempts[].network[].requestHeaders / responseHeaders -- allowlisted only: content-type, location (response), x-request-id, x-correlation-id, traceparent, tracestate. Values capped at 256 chars.tests[].attempts[].failureArtifacts -- for failing/timed-out/interrupted attempts: screenshotBase64 (base64-encoded screenshot, max 512KB), videoPath (first video attachment path). Omitted entirely when neither screenshot nor video is availabletests[].attachments[].path -- relative to Playwright outputDirtests[].stdout / tests[].stderr -- capped at 4KB with [truncated] markerSee package.json scripts for a list of commands.