CMP Scan API — Local Scanner Service
Last updated: 2025-09-14
The Scan API runs a Playwright-based two-phase scan (pre/post consent) and returns a single JSON report with security analysis and third‑party diffs. It is intended for local use by the Scanner Tool and CI.
Endpoints
- GET /health
  - Returns: { ok: true, name: 'cmp-scan-api', version: 1, uptimeMs }
- GET /health/ready
  - Attempts a Chromium launch/close to verify readiness.
  - Returns: { ok: true|false, name, version, details?, uptimeMs }
- POST /scan
  - Body:
    - url: string (required)
    - siteKey: string (optional), forwarded to classification
    - gpc: boolean (optional), simulates Global Privacy Control
    - registry: string (optional), CMP Registry base, e.g. http://localhost:3318/api
    - captureScreenshot: boolean (optional, default false), capture a full-page PNG
    - captureTrace: boolean (optional, default false), capture a Playwright trace.zip
  - Returns: ScanReport (see below)
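The POST /scan body can be checked before it is sent. The following is a minimal sketch, not the service's own validation: the ScanRequest type and parseScanRequest helper are hypothetical names, and the http(s)-only restriction on url is an assumption.

```typescript
// Hypothetical request type mirroring the documented POST /scan body.
interface ScanRequest {
  url: string;
  siteKey?: string;
  gpc?: boolean;
  registry?: string;
  captureScreenshot?: boolean;
  captureTrace?: boolean;
}

// Validates an untyped JSON body and throws on the first problem found.
function parseScanRequest(body: unknown): ScanRequest {
  if (typeof body !== 'object' || body === null) {
    throw new Error('body must be a JSON object');
  }
  const b = body as Record<string, unknown>;
  if (typeof b.url !== 'string' || b.url.length === 0) {
    throw new Error('url is required and must be a string');
  }
  // Assumption: only http(s) targets are meaningful for a browser scan.
  if (!/^https?:\/\/\S+$/i.test(b.url)) {
    throw new Error('url must be an absolute http(s) URL');
  }
  const req: ScanRequest = { url: b.url };
  for (const key of ['siteKey', 'registry'] as const) {
    const v = b[key];
    if (v !== undefined) {
      if (typeof v !== 'string') throw new Error(`${key} must be a string`);
      req[key] = v;
    }
  }
  for (const key of ['gpc', 'captureScreenshot', 'captureTrace'] as const) {
    const v = b[key];
    if (v !== undefined) {
      if (typeof v !== 'boolean') throw new Error(`${key} must be a boolean`);
      req[key] = v;
    }
  }
  return req;
}
```

Optional fields that are omitted are left unset, so the documented defaults (false for captureScreenshot and captureTrace) still apply on the server side.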
Behavior
- Navigation
  - Uses a stable Chrome-like UA and Accept-Language: en-GB,en;q=0.9.
  - When gpc=true, sets the Sec-GPC: 1 header and defines navigator.globalPrivacyControl = true.
  - Tracks the redirect chain and records the first main-document response headers, even when redirected (e.g., to consent.google.com).
  - Exposes finalUrl in the report.
- Third-party detection
  - PSL-aware, schemeful site comparison via tldts (eTLD+1). The top-level site is derived from finalUrl.
- Cookie analysis
  - Parses Set-Cookie flags and reports issues: SameSite=None without Secure, non-Secure on HTTPS, third-party without SameSite=None, and __Secure-/__Host- prefix requirements.
  - Any cookie observed only via document.cookie is considered client-side and thus not HttpOnly.
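The cookie checks listed above can be illustrated with a small sketch. This is not the scanner's code: parseSetCookie, cookieIssues, and the issue strings are hypothetical, and the prefix rules follow the usual browser behaviour (__Secure- needs Secure; __Host- needs Secure, Path=/, and no Domain).

```typescript
interface ParsedSetCookie {
  name: string;
  secure: boolean;
  httpOnly: boolean;
  sameSite?: 'Strict' | 'Lax' | 'None';
  domain?: string;
  path?: string;
}

// Parses one Set-Cookie header value. The cookie value is deliberately discarded.
function parseSetCookie(header: string): ParsedSetCookie {
  const [nameValue, ...attrs] = header.split(';').map((p) => p.trim());
  const eq = nameValue.indexOf('=');
  const cookie: ParsedSetCookie = {
    name: eq === -1 ? nameValue : nameValue.slice(0, eq),
    secure: false,
    httpOnly: false,
  };
  for (const attr of attrs) {
    const i = attr.indexOf('=');
    const key = (i === -1 ? attr : attr.slice(0, i)).toLowerCase();
    const val = i === -1 ? '' : attr.slice(i + 1);
    if (key === 'secure') cookie.secure = true;
    else if (key === 'httponly') cookie.httpOnly = true;
    else if (key === 'domain') cookie.domain = val;
    else if (key === 'path') cookie.path = val;
    else if (key === 'samesite') {
      const v = val.toLowerCase();
      if (v === 'none') cookie.sameSite = 'None';
      else if (v === 'lax') cookie.sameSite = 'Lax';
      else if (v === 'strict') cookie.sameSite = 'Strict';
    }
  }
  return cookie;
}

// Applies the documented checks; ctx describes where the cookie was observed.
function cookieIssues(
  c: ParsedSetCookie,
  ctx: { https: boolean; thirdParty: boolean },
): string[] {
  const issues: string[] = [];
  if (c.sameSite === 'None' && !c.secure) issues.push('SameSite=None without Secure');
  if (ctx.https && !c.secure) issues.push('not Secure on HTTPS');
  if (ctx.thirdParty && c.sameSite !== 'None') issues.push('third-party without SameSite=None');
  if (c.name.startsWith('__Secure-') && !c.secure) {
    issues.push('__Secure- prefix requires Secure');
  }
  if (c.name.startsWith('__Host-') && (!c.secure || c.path !== '/' || c.domain !== undefined)) {
    issues.push('__Host- prefix requires Secure, Path=/, and no Domain');
  }
  return issues;
}
```

For example, cookieIssues(parseSetCookie('sid=abc; Path=/; SameSite=None'), { https: true, thirdParty: false }) reports both the SameSite=None and the non-Secure issue.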
Response shape (excerpt)
{
"scannedAt": "2025-09-14T16:06:55.643Z",
"url": "https://www.google.com",
"finalUrl": "https://consent.google.com/...", // may equal input if no redirect
"redirectChain": [
{ "from": "https://www.google.com", "to": "https://consent.google.com/...", "status": 302, "location": "..." }
],
"pre": {
"jsCookies": "...",
"setCookies": [...], // parsed from Set-Cookie response headers
"storageCookies": [ // from Playwright context.cookies(); never includes values
{ "name":"_ga","domain":".example.com","secure":true,"httpOnly":false,"sameSite":"Lax","expires":1735689600,"firstParty":true }
],
"classifiedCookies": [ // result of POST /api/v1/classify-cookies (DB-backed)
{ "name":"_ga","domain":".example.com","vendor":"Google Analytics","category":"analytics","retention":"~2 years","flags":{"secure":true,"httpOnly":false,"sameSite":"Lax"},"firstParty":true,"confidence":90,"evidence":["pattern:_ga"] }
],
"thirdParty": ["..."]
},
"post": { /* same fields as pre */ },
"status": "ok",
"durationMs": 1534,
"documentHeaders": { "strict-transport-security": "max-age=...", "referrer-policy": "..." },
"artifacts": { "screenshotUrl": "/artifacts/scan/1694690000-abcd12/screenshot.png", "traceUrl": "/artifacts/scan/1694690000-abcd12/trace.zip" },
"diff": {
"newThirdPartyHosts": ["..."],
"newThirdPartyClassified": { "...": { "category": "analytics", "reasons": ["dataset"] } },
"newCookieNames": ["_ga"],
"preConsentViolations": [ // non‑essential cookies detected pre-consent
{ "name":"_fbp","domain":".example.com","category":"advertising","vendor":"Meta" }
]
},
"summary": {
"newThirdPartyCount": 0,
"newThirdPartyByCategory": {},
"cookies": { "preSetCount": 0, "postSetCount": 0, "newNameCount": 0 }
},
"analysis": {
"headers": { "present": ["HSTS"], "missing": ["CSP", "Referrer-Policy", "X-Content-Type-Options"], "issues": ["Missing Content-Security-Policy"] },
"cookies": { "issues": ["Cookie _ga: created client‑side (not HttpOnly)"] }
}
}
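A consumer such as a CI step might reduce the report to a few numbers. The sketch below types only the fields shown in the excerpt; ScanReportExcerpt and summarizeReport are hypothetical names, and the full ScanReport may contain more fields than are modelled here.

```typescript
interface Violation {
  name: string;
  domain: string;
  category: string;
  vendor?: string;
}

// Only the fields used below, taken from the excerpt above.
interface ScanReportExcerpt {
  url: string;
  finalUrl: string;
  status: string;
  redirectChain: { from: string; to: string; status: number }[];
  diff: {
    newThirdPartyHosts: string[];
    newCookieNames: string[];
    preConsentViolations: Violation[];
  };
}

// Groups pre-consent violations by category and counts redirect hops.
function summarizeReport(report: ScanReportExcerpt) {
  const violationsByCategory: Record<string, string[]> = {};
  for (const v of report.diff.preConsentViolations) {
    if (violationsByCategory[v.category] === undefined) {
      violationsByCategory[v.category] = [];
    }
    violationsByCategory[v.category].push(v.name);
  }
  return {
    redirected: report.redirectChain.length > 0,
    hops: report.redirectChain.length,
    violationCount: report.diff.preConsentViolations.length,
    violationsByCategory,
  };
}
```

A CI job could, for instance, fail the build when violationCount is greater than zero; that policy is an illustration and not something the API enforces.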
Run locally
pnpm -w nx run cmp-scan-api:build
PORT=3006 node dist/apps/cmp/scan-api/src/main.js
# Health
curl -sS http://localhost:3006/health | jq
# Scan (baseline)
curl -sS -X POST http://localhost:3006/scan \
-H 'content-type: application/json' \
-d '{"url":"https://www.google.com","siteKey":"DEV_SITE_KEY","gpc":false,"registry":"http://localhost:3318/api","captureScreenshot":true}' \
| jq '{finalUrl, redirectChain, status, durationMs, artifacts, preCookies: .pre.classifiedCookies[0:3], violations: .diff.preConsentViolations}'
# Scan (GPC=true)
curl -sS -X POST http://localhost:3006/scan \
-H 'content-type: application/json' \
-d '{"url":"https://www.google.com","siteKey":"DEV_SITE_KEY","gpc":true,"registry":"http://localhost:3318/api","captureTrace":true}' \
| jq '{finalUrl, redirectChain, status, durationMs, artifacts, preCookies: .pre.classifiedCookies[0:3], violations: .diff.preConsentViolations}'
Notes
- The dev server auto-selects a free port among 3005–3007 if PORT is not set. The Scanner Tool tries these ports by default.
- CORS is permissive for local docs use; do not expose this service publicly without rate limits/ACLs.
- Cookie values are never captured; only names/flags/retention and derived classification are emitted.
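Because the dev server may land on any of the three ports, a client can probe them in order. This is a sketch of one way to do that and not the Scanner Tool's actual logic: candidateBaseUrls, findScanApi, and the HealthFetch type are hypothetical, and the fetch function is injected so the sketch does not depend on a particular runtime.

```typescript
// Minimal fetch-like signature so the probe can be exercised with a stub.
type HealthFetch = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

const DEV_PORTS = [3005, 3006, 3007];

// With an explicit port only that port is tried; otherwise the dev range is used.
function candidateBaseUrls(port?: number): string[] {
  const ports = port !== undefined ? [port] : DEV_PORTS;
  return ports.map((p) => `http://localhost:${p}`);
}

// Returns the first base URL whose /health identifies itself as cmp-scan-api.
async function findScanApi(fetchFn: HealthFetch, port?: number): Promise<string | null> {
  for (const base of candidateBaseUrls(port)) {
    try {
      const res = await fetchFn(`${base}/health`);
      if (!res.ok) continue;
      const body = (await res.json()) as { ok?: boolean; name?: string };
      if (body.ok === true && body.name === 'cmp-scan-api') return base;
    } catch {
      // Connection refused or invalid JSON: try the next port.
    }
  }
  return null;
}
```

On a runtime with a global fetch (for example Node 18 or later), findScanApi(fetch) resolves to the base URL of the running service, or null if none of the ports answers.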
Security header quick checks
Quickly inspect common security headers for any public URL:
URL=https://example.com
curl -sSI "$URL" | grep -Ei "strict-transport|content-security|referrer-policy|x-content-type|x-frame|permissions-policy"
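The same check can be done programmatically on a set of response headers. This is a minimal sketch: the SECURITY_HEADERS map and checkSecurityHeaders are hypothetical, and the short labels (HSTS, CSP) are assumed from the analysis.headers excerpt rather than taken from the scanner's source.

```typescript
// Label-to-header mapping; the labels are an assumption modelled on the report excerpt.
const SECURITY_HEADERS: Record<string, string> = {
  HSTS: 'strict-transport-security',
  CSP: 'content-security-policy',
  'Referrer-Policy': 'referrer-policy',
  'X-Content-Type-Options': 'x-content-type-options',
  'X-Frame-Options': 'x-frame-options',
  'Permissions-Policy': 'permissions-policy',
};

// Splits the known security headers into present and missing, ignoring header-name case.
function checkSecurityHeaders(headers: Record<string, string>) {
  const seen = new Set(Object.keys(headers).map((k) => k.toLowerCase()));
  const present: string[] = [];
  const missing: string[] = [];
  for (const [label, name] of Object.entries(SECURITY_HEADERS)) {
    (seen.has(name) ? present : missing).push(label);
  }
  return { present, missing };
}
```

Passing the documentHeaders object from a scan report is one natural input, since its keys already appear in lower case.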
Troubleshooting
- No finalUrl or empty headers
  - Ensure the page reached a main-document response; the scanner records the first main-frame response headers seen. For highly dynamic sites, try re-running or using a different locale.
- No third-party hosts on Google domains under GPC
  - Under Sec-GPC: 1, Google may serve a consent interstitial; finalUrl/redirectChain will reflect that. This is expected.
See also
- Scanner Tool UI: accessible under /scanner-tool in the docs site. It calls this API directly and displays finalUrl and the redirect chain.
- Registry classification API: GET /v1/classify?host=… (docs/cmp/registry.md)
Deployment (Kubernetes)
- ArgoCD application: kubernetes/cmp/scan-api/cmp-scan-api-argo.yaml
  - Tracks Helm chart at charts/cmp-scan-api on branch main
  - Automated prune/self-heal enabled
- Helm chart: charts/cmp-scan-api
  - values.yaml exposes image, service, ingress, autoscaling, resources
  - Production values: charts/cmp-scan-api/values-prod.yaml
  - Ingress (UAT) is enabled with host cmp-scan-api.uat.digiwedge.com and TLS secret cmp-scan-api-uat-tls
Registry integration
Set the Registry env var CMP_SCAN_API_BASE to the public base of this service so that the Portal’s health check and scan runner can reach it:
# Example (production)
CMP_SCAN_API_BASE=https://cmp-scan-api.uat.digiwedge.com
The Registry public aggregator GET /api/health/scan-api uses this base to check /health and /health/ready on the Scan API.
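An aggregator of this kind could be sketched as follows. This is an illustration only and not the Registry's implementation: checkScanApi and the Probe type are hypothetical, the probe is supplied by the caller and is assumed to return the parsed JSON body, and a failed request is simply treated as not healthy.

```typescript
// Caller-supplied probe that fetches a URL and returns its parsed JSON body.
type Probe = (url: string) => Promise<{ ok?: boolean }>;

// Checks /health and /health/ready under the configured base in parallel.
async function checkScanApi(base: string, probe: Probe) {
  const root = base.replace(/\/+$/, ''); // tolerate a trailing slash in the env var
  const [live, ready] = await Promise.all(
    ['/health', '/health/ready'].map(async (path) => {
      try {
        return (await probe(root + path)).ok === true;
      } catch {
        return false; // network error or invalid response counts as unhealthy
      }
    }),
  );
  return { base: root, live, ready };
}
```

Here live reflects /health and ready reflects /health/ready, so a service that is up but cannot launch Chromium would report live as true and ready as false.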