Playwright reporter that outputs structured JSON for LLM agents. Minimal console output, flat schema, easy to filter to failures.
```bash
npm install @clipboard-health/playwright-reporter-llm
```
```ts
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  reporter: [
    ["@clipboard-health/playwright-reporter-llm", { outputFile: "test-results/llm-report.json" }],
  ],
});
```
| Option | Default | Description |
|---|---|---|
| `outputFile` | `test-results/llm-report.json` | Path to JSON output |
Console output is kept minimal:

```
.......F..........F.S....
26 tests | 23 passed | 2 failed | 1 skipped (4.2s)
Report: test-results/llm-report.json
```
| Task | Fields |
|---|---|
| Pass/fail overview | `summary` |
| Triage failures | Filter `tests[]` by `status === "failed"`, then read `errors[0].{message,diff,location,snippet}` |
| Diagnose locator and detached-DOM failures | `tests[].errors[0].{apiName,selector,actionLog}`, or `tests[].attempts[].error.{apiName,selector,actionLog}` on failed attempts, for the failing action's resolved-locator logs |
| Identify flakes | Filter `tests[]` by `flaky === true`; compare `attempts[]` statuses |
| Reconstruct failure timeline | Pick an attempt where `status !== "passed"` (for flakes this is NOT the last attempt), then read its `.timeline[]` (steps + network + console, sorted by `offsetMs`) |
| Inspect failing requests | `tests[].attempts[].network.instances[]`: filter by `status >= 400`, or join `groups[groupId].{failureText,wasAborted}` |
| Look up request or response body | `tests[].attempts[].network.bodies[instance.requestBodyRef]` or `tests[].attempts[].network.bodies[instance.responseBodyRef]` |
| Correlate with backend trace | `tests[].attempts[].network.instances[].traceId` / `.spanId` / `.requestId` / `.correlationId` |
| Debug uncaught page errors | `tests[].attempts[].consoleMessages[]`: filter by `type` in `"error"`, `"pageerror"`, or `"page-crashed"` |
| Visual debugging | `tests[].attempts[].failureArtifacts.{screenshotBase64,videoPath}` |
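Most rows in the table above reduce to plain property access once the report is parsed. Below is a minimal TypeScript sketch of the "Triage failures" and "Identify flakes" rows, assuming you have already loaded the file (for example with `JSON.parse(readFileSync("test-results/llm-report.json", "utf8"))`). The interfaces and the `triageFailures` / `listFlaky` helpers are hypothetical and cover only the fields used here; they are not exported by the package.

```typescript
// Hypothetical minimal subset of the report shape; only fields used below are declared.
interface ReportLocation {
  file: string;
  line: number;
  column: number;
}

interface ReportError {
  message: string;
  diff?: { expected: string; actual: string };
  location?: ReportLocation;
}

interface ReportTest {
  title: string;
  status: string;
  flaky: boolean;
  errors?: ReportError[];
}

interface LlmReport {
  schemaVersion: number;
  tests: ReportTest[];
}

// One line per failed test: "title (file:line): first line of the first error message".
function triageFailures(report: LlmReport): string[] {
  return report.tests
    .filter((test) => test.status === "failed")
    .map((test) => {
      const firstError = test.errors?.[0];
      const where = firstError?.location
        ? `${firstError.location.file}:${firstError.location.line}`
        : "unknown location";
      const message = firstError?.message.split("\n")[0] ?? "(no error recorded)";
      return `${test.title} (${where}): ${message}`;
    });
}

// Flaky tests end as "passed" but are marked with flaky === true.
function listFlaky(report: LlmReport): string[] {
  return report.tests.filter((test) => test.flaky).map((test) => test.title);
}
```

Like the table, the sketch only matches `status === "failed"`. If your report also marks tests as timed out or interrupted (both are counted separately in `summary`), add those statuses to the filter when you want them triaged too.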
Walk `attempts[].timeline[]` (steps + network + console, sorted by `offsetMs`) to reconstruct the failure. Each timeline network entry carries a `networkId` that you can resolve against `attempts[].network.instances[]` for full per-request detail, and against `attempts[].network.groups[]` / `bodies[]` for shared shape and payload context:
```json
{
  "schemaVersion": 3,
  "summary": {
    "total": 10,
    "passed": 9,
    "failed": 1,
    "flaky": 0,
    "skipped": 0,
    "timedOut": 0,
    "interrupted": 0
  },
  "tests": [
    {
      "id": "abc123",
      "title": "Checkout > applies discount code",
      "status": "failed",
      "flaky": false,
      "location": { "file": "tests/checkout.spec.ts", "line": 42, "column": 5 },
      "errors": [
        {
          "message": "Expected: 90\nReceived: 100",
          "diff": { "expected": "90", "actual": "100" },
          "location": { "file": "tests/checkout.spec.ts", "line": 58, "column": 7 }
        }
      ],
      "attempts": [
        {
          "attempt": 1,
          "status": "failed",
          "failureArtifacts": { "screenshotBase64": "iVBORw0KGgo...", "videoPath": "video.webm" },
          "timeline": [
            {
              "kind": "step",
              "offsetMs": 1050,
              "title": "click Apply",
              "category": "test.step",
              "durationMs": 40,
              "depth": 0
            },
            {
              "kind": "network",
              "offsetMs": 1100,
              "networkId": "n3",
              "method": "POST",
              "url": "https://api.example.com/discount",
              "status": 500
            },
            {
              "kind": "console",
              "offsetMs": 1240,
              "type": "pageerror",
              "text": "Uncaught TypeError: discount is undefined"
            },
            {
              "kind": "step",
              "offsetMs": 1300,
              "title": "expect total to equal 90",
              "category": "pw:api",
              "durationMs": 5,
              "depth": 0,
              "error": "Expected: 90\nReceived: 100"
            }
          ]
        }
      ]
    }
  ]
}
```
Reads as: click Apply → backend returned 500 → page threw → assertion failed. Resolve `networkId: "n3"` against `attempts[0].network.instances` for per-request `traceId` / `spanId` / `requestId` / body refs; `groups[instance.groupId]` holds the shape and aggregate counts (occurrences, first/last offset).
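The resolution step can be sketched in TypeScript as below. The sketch assumes two things this README does not spell out: that each instance exposes an `id` equal to the timeline's `networkId`, and that `groups` and `bodies` are objects keyed by `groupId` and body ref (as the `groups[groupId]` notation suggests). The types and `resolveNetworkEntry` are hypothetical; check `docs/example-report.json` for the real shape before relying on them.

```typescript
// Hypothetical minimal types. `NetworkInstance.id` and the Record shapes of
// `groups` / `bodies` are assumptions, not documented fields.
interface TimelineEntry {
  kind: "step" | "network" | "console";
  offsetMs: number;
  networkId?: string;
}

interface NetworkInstance {
  id: string; // assumed to equal TimelineEntry.networkId
  groupId: string;
  status?: number;
  traceId?: string;
  spanId?: string;
  requestId?: string;
  requestBodyRef?: string;
  responseBodyRef?: string;
}

interface NetworkGroup {
  occurrenceCount: number;
  firstOffsetMs?: number;
  lastOffsetMs?: number;
  failureText?: string;
  wasAborted?: boolean;
}

interface NetworkReport {
  instances: NetworkInstance[];
  groups: Record<string, NetworkGroup>; // assumed keyed by groupId
  bodies: Record<string, unknown>; // assumed keyed by body ref; payload shape left open
}

interface ResolvedRequest {
  instance: NetworkInstance;
  group?: NetworkGroup;
  requestBody?: unknown;
  responseBody?: unknown;
}

// Follows networkId -> instance -> group and body refs. Returns undefined for
// non-network entries or when the instance was not retained.
function resolveNetworkEntry(entry: TimelineEntry, network: NetworkReport): ResolvedRequest | undefined {
  if (entry.kind !== "network" || entry.networkId === undefined) return undefined;
  const instance = network.instances.find((candidate) => candidate.id === entry.networkId);
  if (!instance) return undefined;
  return {
    instance,
    group: network.groups[instance.groupId],
    requestBody: instance.requestBodyRef ? network.bodies[instance.requestBodyRef] : undefined,
    responseBody: instance.responseBodyRef ? network.bodies[instance.responseBodyRef] : undefined,
  };
}
```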
For a test that fails on attempt 1 and passes on attempt 2, the divergence between the two timelines is usually the diagnosis. Find the first entry that differs:
```json
{
  "tests": [
    {
      "title": "Checkout > shows confirmation",
      "status": "passed",
      "flaky": true,
      "attempts": [
        {
          "attempt": 1,
          "status": "failed",
          "timeline": [
            {
              "kind": "step",
              "offsetMs": 800,
              "title": "goto /checkout",
              "category": "test.step",
              "durationMs": 120,
              "depth": 0
            },
            {
              "kind": "network",
              "offsetMs": 950,
              "networkId": "n0",
              "method": "GET",
              "url": "https://api.example.com/cart",
              "status": 200
            },
            {
              "kind": "step",
              "offsetMs": 1400,
              "title": "expect banner visible",
              "category": "pw:api",
              "durationMs": 5000,
              "depth": 0,
              "error": "Timeout 5000ms exceeded"
            }
          ]
        },
        {
          "attempt": 2,
          "status": "passed",
          "timeline": [
            {
              "kind": "step",
              "offsetMs": 800,
              "title": "goto /checkout",
              "category": "test.step",
              "durationMs": 120,
              "depth": 0
            },
            {
              "kind": "network",
              "offsetMs": 950,
              "networkId": "n0",
              "method": "GET",
              "url": "https://api.example.com/cart",
              "status": 200
            },
            {
              "kind": "network",
              "offsetMs": 1100,
              "networkId": "n1",
              "method": "GET",
              "url": "https://api.example.com/inventory",
              "status": 200
            },
            {
              "kind": "step",
              "offsetMs": 1350,
              "title": "expect banner visible",
              "category": "pw:api",
              "durationMs": 40,
              "depth": 0
            }
          ]
        }
      ]
    }
  ]
}
```
Divergence: the passing attempt made an extra `/inventory` call that the failing attempt didn't. That points to either a stale frontend cache or a race: the test is not merely flaky, it is exposing a real bug.
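Finding that first differing entry is mechanical. Here is a hedged TypeScript sketch that compares two timelines position by position and matches entries on *what* happened (step title; method, URL, and status; console type and text) while ignoring `offsetMs` and `durationMs`, which usually differ between attempts. `entryKey` and `firstDivergence` are hypothetical helpers, not part of the reporter.

```typescript
// Slimmed timeline entry: only fields that appear in the examples above.
interface TimelineEntry {
  kind: "step" | "network" | "console";
  offsetMs: number;
  title?: string;
  method?: string;
  url?: string;
  status?: number;
  type?: string;
  text?: string;
  error?: string;
}

// Identity key that ignores timing and step outcome, so attempts are compared
// by what happened rather than when it happened.
function entryKey(entry: TimelineEntry): string {
  switch (entry.kind) {
    case "step":
      return `step:${entry.title ?? ""}`;
    case "network":
      return `network:${entry.method ?? ""} ${entry.url ?? ""} -> ${entry.status ?? "?"}`;
    case "console":
      return `console:${entry.type ?? ""}:${entry.text ?? ""}`;
  }
}

interface Divergence {
  index: number;
  failing?: TimelineEntry;
  passing?: TimelineEntry;
}

// Returns the first position where the timelines differ, or undefined if they match.
function firstDivergence(failing: TimelineEntry[], passing: TimelineEntry[]): Divergence | undefined {
  const length = Math.max(failing.length, passing.length);
  for (let index = 0; index < length; index++) {
    const a = failing[index];
    const b = passing[index];
    if (!a || !b || entryKey(a) !== entryKey(b)) {
      return { index, failing: a, passing: b };
    }
  }
  return undefined;
}
```

Positional comparison is enough to locate the *first* divergence. Every entry after an inserted or missing event will also mismatch, so only the first result is meaningful. On the flaky example above, it reports index 2, where the passing attempt has the extra `/inventory` request.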
See `docs/example-report.json` for a complete report with representative optional fields populated (network timings, redirect chains, headers, attachments, step nesting, multi-attempt retries).
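Redirect chains are stored as separate network instances linked through `instance.redirectFromId` / `instance.redirectToId`, which reference sibling instance ids. Here is a minimal sketch of walking such a chain from any hop, assuming each instance exposes its own id as `id` and its URL as `url` (both field names are assumptions). `redirectChain` is a hypothetical helper.

```typescript
// Hypothetical minimal instance shape: only redirectFromId/redirectToId are documented;
// `id` and `url` are assumed field names.
interface RedirectInstance {
  id: string;
  url: string;
  redirectFromId?: string;
  redirectToId?: string;
}

// Returns the redirect chain containing `start`, ordered from first request to final hop.
function redirectChain(start: RedirectInstance, instances: RedirectInstance[]): RedirectInstance[] {
  const byId = new Map<string, RedirectInstance>(
    instances.map((instance): [string, RedirectInstance] => [instance.id, instance]),
  );

  // Walk backwards via redirectFromId to the origin; `seen` stops malformed cycles.
  const seen = new Set<string>();
  let head = start;
  while (head.redirectFromId !== undefined && !seen.has(head.id)) {
    const previous = byId.get(head.redirectFromId);
    if (!previous) break; // earlier hop is not present in `instances`
    seen.add(head.id);
    head = previous;
  }

  // Walk forwards via redirectToId, again guarding against cycles.
  const chain: RedirectInstance[] = [];
  const visited = new Set<string>();
  let current: RedirectInstance | undefined = head;
  while (current !== undefined && !visited.has(current.id)) {
    visited.add(current.id);
    chain.push(current);
    current = current.redirectToId !== undefined ? byId.get(current.redirectToId) : undefined;
  }
  return chain;
}
```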
- `summary` -- quick pass/fail counts
- `tests[].errors[].message` -- ANSI-stripped, clean error text
- `tests[].errors[].diff` -- extracted expected/actual from assertion errors
- `tests[].errors[].location` -- exact file and line of failure
- `tests[].errors[].{apiName,selector,actionLog}` / `tests[].attempts[].error.{apiName,selector,actionLog}` -- failing Playwright action context from trace logs when available, including failed attempts in flaky tests. Useful for locator and detached-DOM failures; action log messages are capped at 512 characters, with at most 20 entries per error and an omission marker
- `tests[].flaky` -- `true` if the test passed after a retry
- `tests[].attempts[]` -- full retry history with per-attempt status, timing, stdio, attachments, steps, and network
- `tests[].attempts[].consoleMessages[]` -- `warning`/`error`/`pageerror`/`page-closed`/`page-crashed` trace entries only (2KB text cap with `[truncated]` marker, max 50 per attempt, high-signal entries prioritized over low-signal ones)
- `tests[].steps` / `tests[].network` / `tests[].timeline` -- convenience aliases for the final attempt's steps, network, and timeline
- `tests[].attempts[].timeline[]` -- unified array of all retained events, sorted by `offsetMs` (`kind: "step" | "network" | "console"`). Entries are slimmed down for quick temporal scanning; full details remain in the source arrays
- `offsetMs` -- milliseconds since the attempt's `startTime`. Always present on steps (from `TestStep.startTime`). Optional on network entries (from the trace's `_monotonicTime` or `startedDateTime`, converted via the trace's context-options anchor) and console entries (from the trace's monotonic time field plus the anchor). Absent when the trace lacks a context-options event. Entries without `offsetMs` are excluded from the timeline
- `tests[].attempts[].network` -- a three-layer `NetworkReport` that separates instances (what happened), groups (shared shape), and bodies (payloads). Access patterns:
  - Filter `network.instances[]` by `status`, `redirectFromId`, `traceId`, etc. to find specific occurrences
  - Join `network.groups[groupId]` for aggregate shape info (`resourceType`, `failureText`, `wasAborted`, `occurrenceCount`, `retainedInstanceCount`, `suppressedInstanceCount`, `evictedInstanceCount`, `firstOffsetMs`, `lastOffsetMs`)
  - Look up `network.bodies[instance.requestBodyRef]` / `[instance.responseBodyRef]` for JSON/text payloads (2KB cap with `[truncated]` marker, `canonicalized: false` in v3.0)
  - `network.summary` gives end-to-end accounting: `observedInstances === retainedInstances + instancesDroppedByFilter + instancesDroppedByGroupCap + instancesDroppedByInstanceCap + instancesSuppressedAsDuplicate + instancesEvictedAfterAdmission`. For every group, `occurrenceCount === retainedInstanceCount + suppressedInstanceCount + evictedInstanceCount`. … Ties reject rather than churn.
- `tests[].attempts[].network.instances[].{traceId,spanId,requestId,correlationId}` -- `traceId` and `spanId` are parsed from the W3C Trace Context `traceparent` header (preferring the response header over the request header); `requestId` and `correlationId` come from `x-request-id` / `x-correlation-id`. Compatible with OpenTelemetry and Datadog (dd-trace ≥ 2.0 emits `traceparent` by default). Malformed or all-zero `traceparent` values are rejected.
- `instance.redirectFromId` / `instance.redirectToId` reference sibling instance ids; walk the graph to reconstruct redirect chains.
- `tests[].attempts[].failureArtifacts` -- for failing/timed-out/interrupted attempts: `screenshotBase64` (base64-encoded screenshot, max 512KB) and `videoPath` (first video attachment path). Omitted entirely when neither a screenshot nor a video is available
- `tests[].attachments[].path` -- relative to Playwright `outputDir`
- `tests[].stdout` / `tests[].stderr` -- capped at 4KB with `[truncated]` marker

This library is specialized for agents:
- Agents can `jq` or index into exactly the fields they need.
- A `timeline[]` of steps, network, and console events on every attempt. The divergence between the failing and passing attempts is usually the diagnosis (see the flaky test example).
- `traceId` and `spanId` are parsed from W3C `traceparent` headers on any network request that carries one, so an agent can jump straight from a failing test to the backend trace in Datadog, Jaeger, Tempo, Honeycomb, or any OpenTelemetry-compatible backend.

See `package.json` scripts for a list of commands.
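For reference, the `traceparent` handling described in the field list can be sketched as follows. This is not the reporter's code. It is a strict parser for the version-00 layout from the W3C Trace Context specification (`version-traceid-parentid-flags`, lowercase hex) that rejects the forbidden `ff` version and all-zero ids. Falling back to the request header when the response header is missing or rejected is an assumption, and both function names are hypothetical.

```typescript
interface TraceContext {
  traceId: string; // 32 lowercase hex chars
  spanId: string; // 16 lowercase hex chars (the traceparent "parent-id" field)
}

// version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - trace-flags (2 hex)
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(value: string | undefined): TraceContext | undefined {
  if (value === undefined) return undefined;
  const match = TRACEPARENT.exec(value.trim());
  if (!match) return undefined;
  const [, version, traceId, spanId] = match;
  // "ff" is a forbidden version; all-zero trace or parent ids are invalid.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { traceId, spanId };
}

// Prefers the response header, then falls back to the request header (assumed fallback).
// Header keys are assumed to be lowercase.
function traceContextFor(
  requestHeaders: Record<string, string>,
  responseHeaders: Record<string, string>,
): TraceContext | undefined {
  return parseTraceparent(responseHeaders["traceparent"]) ?? parseTraceparent(requestHeaders["traceparent"]);
}
```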