CMP Scanner — Pre/Post Consent Diff
Last updated: 2025-09-14 (Consolidated under docs/cmp)
cmp-scanner is a lightweight Playwright-based tool that scans a URL in two phases (pre‑consent and post‑consent), then reports third‑party host and cookie diffs. It optionally classifies new hosts via the CMP Registry. A local Scan API is also available and is used by the Scanner Tool UI.
Install Playwright browser
npx playwright install chromium
Usage
# Basic
pnpm --filter cmp-scanner scan https://example.com tmp/report.json
# With classification via registry (point to API base `/api`)
CMP_REGISTRY_URL=http://localhost:3318 pnpm --filter cmp-scanner scan https://example.com tmp/report.json
# Pass site_key (for site-aware overrides)
pnpm --filter cmp-scanner scan https://example.com tmp/report.json --site-key=DEV_SITE_KEY
# Simulate Global Privacy Control (GPC)
pnpm --filter cmp-scanner scan https://example.com tmp/report.json --gpc=1
CMP_REGISTRY_URL should point to your running Registry API base (e.g., http://localhost:3318). When set, the scanner calls GET /v1/classify?host=…&site_key=… for newly observed hosts after consent.
You can pass the site key via --site-key=… or SCAN_SITE_KEY.
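As a rough illustration of the classify request described above, the sketch below builds the GET /v1/classify URL from a registry base, a host, and an optional site key. The helper name buildClassifyUrl is hypothetical, and omitting site_key when no key is supplied is an assumption rather than confirmed scanner behavior.

```typescript
// Sketch only: buildClassifyUrl is a hypothetical helper, and dropping
// site_key when it is not provided is an assumption about the scanner.
function buildClassifyUrl(registryBase: string, host: string, siteKey?: string): string {
  // Trim trailing slashes so the base works with or without a final "/".
  const url = new URL(registryBase.replace(/\/+$/, "") + "/v1/classify");
  url.searchParams.set("host", host);
  if (siteKey) {
    url.searchParams.set("site_key", siteKey);
  }
  return url.toString();
}

const classifyExample = buildClassifyUrl(
  "http://localhost:3318",
  "www.google-analytics.com",
  "DEV_SITE_KEY",
);
console.log(classifyExample);
// http://localhost:3318/v1/classify?host=www.google-analytics.com&site_key=DEV_SITE_KEY
```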
What it does
- Phase 1 — Pre‑consent
  - Loads the page and waits for network idle
  - Records document.cookie, response Set-Cookie flags, and all requested hostnames
- Grants consent
  - Tries window.DWConsent.setAll(true)
  - Fallback: heuristically clicks a visible button that looks like “Accept all”
- Phase 2 — Post‑consent
  - Waits briefly, then records the same data again
- Diff and classification
  - Computes newly observed third‑party hosts and cookie names
  - Classifies new hosts via the Registry (or basic regex fallback when CMP_REGISTRY_URL is unset)
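The diff step can be pictured as a set difference between the two phase snapshots. The following is a minimal sketch under that assumption, not the scanner's actual code: the PhaseSnapshot type and the function names are hypothetical, and the fields mirror the pre/post objects in the report.

```typescript
// Hypothetical snapshot type mirroring the report's pre/post fields.
interface PhaseSnapshot {
  jsCookies: string;              // raw document.cookie string
  setCookies: { name: string }[]; // cookies seen in Set-Cookie responses
  thirdParty: string[];           // third-party hostnames requested
}

// Collect cookie names from both document.cookie and Set-Cookie records.
function cookieNames(snapshot: PhaseSnapshot): Set<string> {
  const names = new Set<string>();
  for (const pair of snapshot.jsCookies.split(";")) {
    const name = (pair.split("=")[0] ?? "").trim();
    if (name) names.add(name);
  }
  for (const cookie of snapshot.setCookies) names.add(cookie.name);
  return names;
}

// Anything present post-consent but absent pre-consent counts as "new".
function diffPhases(pre: PhaseSnapshot, post: PhaseSnapshot) {
  const preHosts = new Set(pre.thirdParty);
  const preCookies = cookieNames(pre);
  return {
    newThirdPartyHosts: Array.from(new Set(post.thirdParty)).filter((h) => !preHosts.has(h)),
    newCookieNames: Array.from(cookieNames(post)).filter((n) => !preCookies.has(n)),
  };
}

const phaseDiff = diffPhases(
  { jsCookies: "session=abc", setCookies: [], thirdParty: ["www.googletagmanager.com"] },
  {
    jsCookies: "session=abc; _ga=GA1.2; _gid=GA1.3",
    setCookies: [],
    thirdParty: ["www.googletagmanager.com", "www.google-analytics.com"],
  },
);
console.log(phaseDiff);
// { newThirdPartyHosts: ["www.google-analytics.com"], newCookieNames: ["_ga", "_gid"] }
```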
Report shape
{
  "scannedAt": "2025-09-12T12:34:56.789Z",
  "url": "https://example.com",
  "finalUrl": "https://consent.google.com/...",
  "redirectChain": [{ "from": "…", "to": "…", "status": 302 }],
  "pre": {
    "jsCookies": "...",
    "setCookies": [
      { "url": "...", "name": "...", "secure": true, "httponly": false, "samesite": "Lax" }
    ],
    "thirdParty": ["www.googletagmanager.com", "..."]
  },
  "post": {
    /* same fields */
  },
  "diff": {
    "newThirdPartyHosts": ["www.google-analytics.com"],
    "newThirdPartyClassified": {
      "www.google-analytics.com": {
        "host": "…",
        "category": "analytics",
        "reasons": ["dataset", "override?"]
      }
    },
    "newCookieNames": ["_ga", "_gid"]
  }
}
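To make the diff section easier to consume, a report reader might group new hosts by their classified category. The sketch below assumes field types inferred from the example report above; the interface names, the hostsByCategory helper, and the "unclassified" bucket are hypothetical.

```typescript
// Types inferred from the example report; treat them as assumptions.
interface ClassifiedHost {
  host: string;
  category: string;
  reasons: string[];
}

interface ScanDiff {
  newThirdPartyHosts: string[];
  newThirdPartyClassified: Record<string, ClassifiedHost>;
  newCookieNames: string[];
}

// Group new hosts by category; hosts with no classification entry fall
// into a hypothetical "unclassified" bucket.
function hostsByCategory(diff: ScanDiff): Record<string, string[]> {
  const grouped: Record<string, string[]> = {};
  for (const host of diff.newThirdPartyHosts) {
    const category = diff.newThirdPartyClassified[host]?.category ?? "unclassified";
    if (!grouped[category]) grouped[category] = [];
    grouped[category].push(host);
  }
  return grouped;
}

const groupedExample = hostsByCategory({
  newThirdPartyHosts: ["www.google-analytics.com", "cdn.example.net"],
  newThirdPartyClassified: {
    "www.google-analytics.com": {
      host: "www.google-analytics.com",
      category: "analytics",
      reasons: ["dataset"],
    },
  },
  newCookieNames: ["_ga", "_gid"],
});
console.log(groupedExample);
// { analytics: ["www.google-analytics.com"], unclassified: ["cdn.example.net"] }
```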
See docs/cmp/scan-api.md for the HTTP API variant and curl examples. The UI at /scanner-tool displays the finalUrl and any redirect hops so you can see consent interstitials (e.g., Google under GPC).
Interpreting Results
- GPC redirects to consent pages
  - When gpc=true, some sites (e.g., Google) may redirect to consent.google.com. This is expected; check finalUrl and the redirectChain to confirm.
- Client‑side cookies appear as not HttpOnly
  - Cookies only observed via document.cookie are client‑set and therefore not HttpOnly. This is highlighted as an issue for awareness.
- No third‑party hosts after consent
  - If the banner blocks third‑party loading until explicit accept, you may see zero new third‑party hosts post‑consent. Verify user flow and timing; some sites load after additional navigation or interaction.
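Two of the checks above can be expressed as small helpers when post-processing a report. This is only a sketch: both function names are hypothetical, and the inputs follow the url, finalUrl, jsCookies, and setCookies fields of the report.

```typescript
// True when the scan ended on a different hostname than it started on,
// e.g. a consent interstitial. Note that www.example.com and example.com
// count as different hosts in this simple comparison.
function redirectedToOtherHost(url: string, finalUrl: string): boolean {
  return new URL(url).hostname !== new URL(finalUrl).hostname;
}

// Names present in document.cookie but never seen in a Set-Cookie response:
// these are client-set and therefore cannot be HttpOnly.
function clientOnlyCookieNames(jsCookies: string, setCookies: { name: string }[]): string[] {
  const fromHeaders = new Set(setCookies.map((cookie) => cookie.name));
  const names = jsCookies
    .split(";")
    .map((pair) => (pair.split("=")[0] ?? "").trim())
    .filter((name) => name.length > 0);
  return Array.from(new Set(names)).filter((name) => !fromHeaders.has(name));
}

console.log(redirectedToOtherHost("https://example.com", "https://consent.google.com/"));
// true
console.log(clientOnlyCookieNames("_ga=GA1.2; sid=abc", [{ name: "sid" }]));
// ["_ga"]
```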
Tips
- Prefer real user flows (banner visible) for accurate results.
- Use the Portal’s “Copy verify cmds” to quickly run header/cookie checks and the scanner with classification enabled.
Manual Gate (hosted workflows disabled)
Run the scanner locally against your canary URL and compare the diffs:
pnpm --filter cmp-scanner scan https://example.com tmp/report.json
pnpm --filter cmp-scanner scan https://example.com tmp/report-gpc.json --gpc=1
Compare diff.newThirdPartyHosts and diff.newCookieNames between runs and
adjust CI_SCANNER_ALLOW_NEW_HOSTS / CI_SCANNER_ALLOW_NEW_COOKIES only when
explicitly approved.
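One way to compare the two runs is to list what appears in only one of the diffs. The sketch below assumes each report's diff object has the newThirdPartyHosts and newCookieNames arrays shown in the report shape; the type and function names are hypothetical, and it does not read or interpret the CI_SCANNER_ALLOW_* variables.

```typescript
// Subset of the report's diff object needed for the comparison.
interface DiffSummary {
  newThirdPartyHosts: string[];
  newCookieNames: string[];
}

// Items of `a` that do not appear in `b`.
function onlyIn(a: string[], b: string[]): string[] {
  const other = new Set(b);
  return a.filter((item) => !other.has(item));
}

// Compare the plain run against the GPC run in both directions.
function compareRuns(baseline: DiffSummary, gpc: DiffSummary) {
  return {
    hostsOnlyInBaseline: onlyIn(baseline.newThirdPartyHosts, gpc.newThirdPartyHosts),
    hostsOnlyInGpc: onlyIn(gpc.newThirdPartyHosts, baseline.newThirdPartyHosts),
    cookiesOnlyInBaseline: onlyIn(baseline.newCookieNames, gpc.newCookieNames),
    cookiesOnlyInGpc: onlyIn(gpc.newCookieNames, baseline.newCookieNames),
  };
}

const runComparison = compareRuns(
  { newThirdPartyHosts: ["www.google-analytics.com"], newCookieNames: ["_ga", "_gid"] },
  { newThirdPartyHosts: [], newCookieNames: [] },
);
console.log(runComparison.hostsOnlyInBaseline);
// ["www.google-analytics.com"]
```

In practice the two inputs would be the diff objects parsed from tmp/report.json and tmp/report-gpc.json, for example via JSON.parse on the file contents.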