CMP Registry API — Dev Guide
Last updated: 2026-01-09 (Consolidated under docs/cmp)
The registry serves multi‑tenant CMP configurations, provides host classification, and ingests consent events. Admin endpoints are protected with OIDC (JWKS) and optionally accept HS256‑signed JWTs from our IDP when configured.
Note: All API routes are served under the base path /api.
Endpoints
- GET `/v1/config?site_key=...&v=live`
  - Returns: `{ version, ui, categories, vendors, geoRules, region }`
  - `region` is detected from request headers (e.g., `CF-IPCountry`) and may be used by clients for UI defaults.
  - Caching: weak `ETag` validator plus `Cache-Control: public, max-age=300, stale-while-revalidate=60`.
- POST `/v1/consent`
  - Body: `{ siteKey, categories: {analytics?,marketing?,functional?}, version, region?, source }`
  - Appends an event; no PII is stored.
  - Respects Global Privacy Control (GPC): if the request includes the header `Sec-GPC: 1`, the event is tagged with `gpc: true` and the metric `cmp_gpc_detected_total{route="consent"}` increments.
- GET `/v1/classify?host=example.com`
  - Returns a canonical category for a third‑party host, e.g., `{ host, category: 'analytics'|'advertising'|'functional'|'social'|'uncategorized', reasons: string[] }`.
  - Combines dataset evidence with any manual overrides.
  - See “Datasets & Sync” for how the classifier database is populated and how to run the sync jobs.
  - Guide: docs/cmp/datasets.md
- Admin (requires Bearer token with `cmp.admin`):
  - POST `/admin/sites/by-key/:siteKey/configs` → create version (`Body.publish=true` to publish)
  - POST `/admin/sites/by-key/:siteKey/publish` → `{ version }` marks an existing version as live
  - POST `/admin/sites/by-key/:siteKey/rotate-key` → rotates site key
  - GET `/admin/sites?limit=50&offset=0` → lists sites with their current live version (paginated)
  - GET `/admin/sites/by-key/:siteKey/domains` → `{ items: [{host,id?,legacy?}] }` list site domains (404 if site not found)
  - POST `/admin/sites/by-key/:siteKey/domains` → add/upsert a domain for a site (201 on success; 400 on bad host; 404 if site not found)
  - DELETE `/admin/sites/by-key/:siteKey/domains` → remove a domain (400 on bad host or trying to remove legacy primary; 404 if site not found)
  - POST `/admin/sites/by-key/:siteKey/domains/copy-primary` → copy legacy primary domain into domains list (404 if site not found; 400 if none set)
  - POST `/admin/classifier/override` → `{ host, category, reason? }` upserts a manual classifier override
  - DELETE `/admin/classifier/override?host=...&site_key?=...` → deletes an override (400 on missing host; 404 if site_key provided but unknown)
  - POST `/admin/classifier/overrides:list` → list overrides with filters `{ query?, limit?, offset?, site_key?, scope? }`
  - POST `/admin/classifier/overrides:export` → CSV export using same filters
  - POST `/admin/classifier/overrides:import` → CSV import, requires headers `host,category` (400 if empty or missing headers)
  - GET `/admin/analytics/sites/by-key/:siteKey/consents?range=7d|30d&from=&to=` → JSON summary `{ total, acceptAll, rejectAll, partial, gpc, start, end }`
  - GET `/admin/analytics/sites/by-key/:siteKey/consents/export?format=csv|json|jsonl&range=Nd|Nh&from=ISO&to=ISO&region=US&gpc=0|1` → CSV, JSON, or JSONL export
    - Query validation: `format` in `csv|json|jsonl`; `range` matches `^\d+(d|h)$`; `from`/`to` are ISO 8601; `gpc` in `0|1`.
    - 404 if `siteKey` is unknown; 400 for invalid queries.
- Scans (admin):
  - POST `/admin/scans`: ingest a completed scan run (from a trusted scanner)
  - GET `/admin/scans`: list persisted runs (filters: siteKey, topDomain, url, gpc, from/to, pagination)
  - GET `/admin/scans/:id`: get a single run
  - POST `/admin/scans/:id/baseline`: promote this run as baseline for its site/topDomain/GPC
  - GET `/admin/scans/baseline?siteKey=...&topDomain=...&gpc=true|false`: get baseline (includes scanRun)
  - GET `/admin/scans/summary/daily`: daily rollups (runs, new_third_parties, new_cookie_names, ok_runs)
  - POST `/admin/scans/run`: trigger a scan via an external Scan API (`{ url, siteKey, gpc?, scanApiBase, registryBase? }`) and persist
- Public (no auth):
  - GET `/v1/scans?url=…&topDomain=…&take=…`: list sanitized scans
  - GET `/v1/scans/:id`: sanitized scan detail
  - POST `/v1/scans`: ingest a sanitized result (no cookie values or response bodies)
  - Health: GET `/health/scan-api`: checks the configured Scan API base (see `CMP_SCAN_API_BASE`)
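The query rules listed for the consents export endpoint can be sketched as a small validator. This is an illustration only: the `ExportQuery` type and `validateExportQuery` function are hypothetical names, not the Registry's actual DTOs, and the sketch assumes every parameter is optional and that "ISO 8601" means a full UTC timestamp like the ones used in the curl examples.

```typescript
// Hypothetical shape of the export query string (all fields assumed optional).
type ExportQuery = { format?: string; range?: string; from?: string; to?: string; gpc?: string };

const EXPORT_FORMATS = ["csv", "json", "jsonl"];
const RANGE_PATTERN = /^\d+(d|h)$/; // documented rule, e.g. 7d or 24h
// Assumption: timestamps look like 2025-09-01T00:00:00.000Z.
const ISO_UTC_PATTERN = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

function isIsoTimestamp(value: string): boolean {
  return ISO_UTC_PATTERN.test(value) && !isNaN(Date.parse(value));
}

// Returns a list of problems; an empty list means the query passes the documented rules.
function validateExportQuery(q: ExportQuery): string[] {
  const errors: string[] = [];
  if (q.format !== undefined && EXPORT_FORMATS.indexOf(q.format) === -1) {
    errors.push("format must be one of csv|json|jsonl");
  }
  if (q.range !== undefined && !RANGE_PATTERN.test(q.range)) {
    errors.push("range must match ^\\d+(d|h)$");
  }
  if (q.from !== undefined && !isIsoTimestamp(q.from)) {
    errors.push("from must be an ISO 8601 timestamp");
  }
  if (q.to !== undefined && !isIsoTimestamp(q.to)) {
    errors.push("to must be an ISO 8601 timestamp");
  }
  if (q.gpc !== undefined && q.gpc !== "0" && q.gpc !== "1") {
    errors.push("gpc must be 0 or 1");
  }
  return errors;
}
```

A non-empty result corresponds to the documented `400` response; the `404` for an unknown `siteKey` is a separate lookup that this sketch does not cover.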
Environment
- `PORT` (default 3318 for local dev; container image default is 3410)
- `CMP_REGISTRY_DATABASE_URL` (PostgreSQL)
- Datasets use AdGuard‑maintained WhoTracks.me exports by default; you can override URLs if needed. See docs/cmp/datasets.md
- OIDC:
  - `OIDC_ISSUER` (required)
  - `OIDC_AUDIENCE` (required)
  - `OIDC_JWKS_URI` (optional; overrides discovery)
  - `OIDC_HS_SECRET` (optional): when present, tokens are first verified with HS256 using this shared secret before falling back to JWKS. Use the same value as the IDP `JWT_SECRET` for HMAC tokens.
    - Rotation plan: update the IDP and registry in lockstep, deploy the new secret, keep the previous secret active in the IDP until all producers rotate, then remove the old secret from both sides.
  - Accepted asymmetric JWT algorithms: RS256/RS384/RS512, PS256/PS384/PS512, ES256/ES384/ES512. Align with your IdP signing config.
- Append auth:
  - `APPEND_AUTH_REQUIRED=1`: requires Bearer tokens for `POST /consent/v1/append` and verifies them against the OIDC settings above (rejects missing/invalid/expired tokens with 401).
  - Startup validation fails fast when `APPEND_AUTH_REQUIRED=1` but `OIDC_ISSUER` or `OIDC_AUDIENCE` is missing.
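The fail-fast startup rule for append auth could look roughly like the check below. This is a minimal sketch, not the Registry's bootstrap code: `validateAppendAuthEnv` is a hypothetical name, and the environment is passed in as a plain record so the function stays easy to test.

```typescript
type Env = Record<string, string | undefined>;

// Throws at startup when append auth is required but the OIDC settings
// needed to verify tokens are absent. Does nothing when the flag is off.
function validateAppendAuthEnv(env: Env): void {
  if (env["APPEND_AUTH_REQUIRED"] !== "1") return;
  const missing = ["OIDC_ISSUER", "OIDC_AUDIENCE"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error("APPEND_AUTH_REQUIRED=1 but missing: " + missing.join(", "));
  }
}
```

In a Node service this would typically be called with the process environment before the HTTP server starts listening, so a misconfigured deployment crashes immediately instead of rejecting every append request at runtime.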
CORS & rate limiting (security)
- `/v1/consent` enforces CORS by site: the Origin must match the Site domain configured for the provided `siteKey`.
- `/admin/*` CORS is limited to the Portal origin(s) (configure via `CORS_ADMIN_ALLOWED_ORIGINS`).
- `/v1/config` sets `Access-Control-Allow-Origin` dynamically for the requesting site; strict enforcement can be enabled.
- `/v1/scans` is CORS-enabled for GET/POST; apply rate limits and/or CAPTCHA in production.
- Rate limiting is applied to `/v1/config` and `/v1/consent`.
- Env (wired via Helm):
  - `RATE_LIMIT_WINDOW_MS`, `RATE_LIMIT_MAX`, `RATE_LIMIT_SKIP_SUCCESS`, `RATE_LIMIT_STD_HDRS`, `RATE_LIMIT_LEGACY_HDRS`, `RATE_LIMIT_TRUST_PROXY`
  - `CORS_ADMIN_ALLOWED_ORIGINS`, `CORS_STRICT_CONFIG`
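The per-site Origin check on `/v1/consent` can be pictured as follows. This is a sketch under stated assumptions: the helper names are hypothetical, matching is an exact, case-insensitive hostname comparison (the real service may also handle wildcards or subdomains), and IPv6 origins are not handled.

```typescript
// Extracts the lowercase hostname from an Origin header value, or null if malformed.
function originHost(origin: string): string | null {
  const match = /^https?:\/\/([^\/:]+)(:\d+)?$/i.exec(origin.trim());
  return match ? match[1].toLowerCase() : null;
}

// True when the request Origin matches one of the domains configured for the site.
// Assumption: siteDomains holds bare hostnames such as "www.example.com".
function isConsentOriginAllowed(origin: string | undefined, siteDomains: string[]): boolean {
  if (!origin) return false;
  const host = originHost(origin);
  if (host === null) return false;
  return siteDomains.some((domain) => domain.toLowerCase() === host);
}
```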
Auth semantics
- Admin routes require a Bearer token that conveys `cmp.admin` via any of: `scope` (space‑delimited), `roles` (`CMP_ADMIN` or `cmp.admin`), or `permissions`.
- Consent append (`POST /consent/v1/append`) requires a valid Bearer token when `APPEND_AUTH_REQUIRED=1`. Verification enforces issuer/audience, signature (HS256 or JWKS), and a non‑expired `exp` claim.
- Append auth failures are tracked via `cmp_append_auth_failures_total{reason=...}` (invalid_token, jwks_fetch_failed, missing_token, missing_config, missing_exp).
- Dev helpers (non‑production only):
  - `DISABLE_ADMIN_AUTH=1` or `DEV_BYPASS_ADMIN_AUTH=1`: skip verification entirely for local smoke testing.
  - `DEV_ADMIN_BEARER=<static>`: accept a specific static bearer for local testing.
- Future: if additional services adopt M2M JWT validation, extract a shared `@digiwedge/m2m-auth` helper to centralize guard logic and env conventions.
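The `cmp.admin` rule for admin routes can be illustrated on an already-verified token payload. This is a sketch only: `hasCmpAdmin` is a hypothetical name, and it assumes `roles` and `permissions` arrive as string arrays, which depends on how your IdP shapes its claims.

```typescript
// Assumed claim shapes; signature, issuer, audience, and exp checks happen before this.
type AdminClaims = {
  scope?: string;
  roles?: string[];
  permissions?: string[];
};

// True when the payload conveys cmp.admin through scope, roles, or permissions.
function hasCmpAdmin(claims: AdminClaims): boolean {
  const scopes = (claims.scope ?? "").split(/\s+/).filter((s) => s.length > 0);
  if (scopes.indexOf("cmp.admin") !== -1) return true;
  const roles = claims.roles ?? [];
  if (roles.some((r) => r === "CMP_ADMIN" || r === "cmp.admin")) return true;
  return (claims.permissions ?? []).some((p) => p === "cmp.admin");
}
```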
Endpoints Expected by the Portal
This section documents the minimal endpoints and shapes the CMP Portal binds to. Adjust routes if your Registry uses different paths — the Portal code is isolated so you can update endpoints in one place.
Sites list (admin)
- Route: `GET /admin/sites`
- Query params:
  - `q` (optional): search term for remote Site Select
  - `page` (1‑based) and `pageSize`: for debounced search results
  - Legacy pagination (supported for simple list): `limit`, `offset`
- Recommended response (paginated):
{
"items": [
{ "key": "DEV_SITE_KEY", "name": "Development", "siteKey": "DEV_SITE_KEY" },
{ "key": "HOWTH-YC", "name": "Howth Yacht Club" }
],
"total": 2
}
- Legacy response (still handled by the Portal in some fetchers):
[
{ "siteKey": "DEV_SITE_KEY" },
{ "siteKey": "HOWTH-YC" }
]
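Because the Portal may receive either shape, a fetcher can normalize both into one structure. The sketch below is illustrative: `normalizeSites` and `SiteOption` are hypothetical names, and falling back from `siteKey` to `key` (and from `name` to the key) is an assumption based on the sample payloads above.

```typescript
type SiteItem = { key?: string; name?: string; siteKey?: string };
type SitesResponse = { items: SiteItem[]; total: number } | SiteItem[];
type SiteOption = { siteKey: string; label: string };

// Accepts the paginated or the legacy array response and returns one shape.
function normalizeSites(res: SitesResponse): { items: SiteOption[]; total: number } {
  const raw = Array.isArray(res) ? res : res.items;
  const items: SiteOption[] = [];
  for (const site of raw) {
    const siteKey = site.siteKey ?? site.key;
    if (siteKey === undefined) continue; // skip entries with no usable key
    items.push({ siteKey, label: site.name ?? siteKey });
  }
  // The legacy array carries no total, so the item count stands in for it.
  const total = Array.isArray(res) ? items.length : res.total;
  return { items, total };
}
```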
Config list + upsert (admin)
- List: `GET /admin/config`
  - Query: `page`, `pageSize`, `q`, optional `siteKey`
  - Response:
{
"items": [
{ "key": "consent.banner.enabled", "value": "true", "siteKey": "DEV_SITE_KEY", "updatedAt": "2025-09-15T12:34:56.000Z" },
{ "key": "consent.defaultLocale", "value": "en", "updatedAt": "2025-09-15T11:11:11.000Z" }
],
"total": 2
}
- Create:
POST /admin/config
Body: { "key": "string", "value": "string", "siteKey?": "string" }
200: { "key": "string", "value": "string", "siteKey?": "string", "updatedAt": "ISO" }
- Update:
PUT /admin/config
Body: { "key": "string", "value": "string", "siteKey?": "string" }
200: { "key": "string", "value": "string", "siteKey?": "string", "updatedAt": "ISO" }
- Delete:
DELETE /admin/config?key=...&siteKey?=...
200: { "ok": true }
Notes
- The Portal treats `(siteKey,key)` as a composite identity when present; in edit mode it does not change the key.
- If your Registry uses resourceful routing (e.g., `PUT /admin/config/:key`), adapt the portal API helper paths accordingly.
Analytics list (admin)
- Route: `GET /admin/analytics`
- Query: `page`, `pageSize`, optional `q`, optional `siteKey`
- Response:
{
"items": [
{
"id": "20250915-DEV_SITE_KEY-consents.accepted",
"siteKey": "DEV_SITE_KEY",
"metric": "consents.accepted",
"value": 1234,
"windowStart": "2025-09-15T00:00:00.000Z",
"windowEnd": "2025-09-15T23:59:59.999Z"
}
],
"total": 1
}
Where metric is a stable string key for the aggregated metric (e.g., consents.accepted, consents.rejected, consents.partial).
Validation & error semantics
- Inputs are validated via NestJS ValidationPipe (whitelist + transform). Invalid requests return `400 Bad Request` with a standard error body.
- Admin domain and classifier endpoints return `404 Not Found` for unknown site keys when explicitly provided, and `400 Bad Request` for missing/invalid fields.
- Admin sites domains listing returns `404 Not Found` when the site key does not exist.
Prisma setup
pnpm nx run cmp-registry-data:prisma:generate
pnpm nx run cmp-registry-data:prisma:db-push
Database (Kubernetes)
CMP follows the same external pattern as billing: a Service LoadBalancer with a dedicated port.
- Service: `cmp-database` (type: LoadBalancer)
- External port: `5440` → targetPort `5432`
- DSN example (external):
postgresql://cmp_user:<PASSWORD>@cmp-db.uat.digiwedge.com:5440/cmp?schema=public
- Infisical variable used by Prisma and services:
CMP_REGISTRY_DATABASE_URL=postgresql://cmp_user:<PASSWORD>@cmp-db.uat.digiwedge.com:5440/cmp?schema=public
Notes
- On bare‑metal, ensure a LoadBalancer controller (for example, a cloud‑provider LB or MetalLB) issues a public EXTERNAL‑IP. Update the DNS A record for `cmp-db.uat.digiwedge.com` to that IP (or annotate the Service with `external-dns.alpha.kubernetes.io/hostname: cmp-db.uat.digiwedge.com`).
- You can restrict external access to specific IPv4 ranges at the Service level:
spec:
type: LoadBalancer
ports:
- name: postgresql
port: 5440
targetPort: 5432
loadBalancerSourceRanges:
- '41.203.10.101/32' # example: allow only this IPv4
Troubleshooting
P1000 Authentication failed: reset the database user to match the Secret value.
PW=$(kubectl -n cmp get secret cmp-database-secret -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d)
kubectl -n cmp exec sts/cmp-database -- \
sh -lc "PGPASSWORD='$PW' psql -U cmp_user -d cmp -h 127.0.0.1 -c \"ALTER USER cmp_user WITH PASSWORD '$PW';\""
P1001 Can't reach database server:
- Ensure the Service shows a public `EXTERNAL-IP`.
- Confirm firewall/ACLs are not blocking your source IP (see `loadBalancerSourceRanges`).
- As a fallback, use a port‑forward for admin tasks:
kubectl -n cmp port-forward svc/cmp-database 15440:5432
export CMP_REGISTRY_DATABASE_URL="postgresql://cmp_user:<PASSWORD>@127.0.0.1:15440/cmp?schema=public"
pnpm nx run cmp-registry-data:prisma:db-push
Seed (DEV_SITE_KEY, DEV_SITE_KEY_2)
pnpm run cmp:seed
Seed cookie definitions
Seeds CookieDefinition from apps/cmp/registry/seed/cookies.seed.json.
# Ensure schema is pushed (once per environment)
infisical run -- pnpm -w nx run cmp-registry-data:prisma:db-push
# Seed cookie definitions (requires CMP_REGISTRY_DATABASE_URL)
infisical run -- pnpm -w nx run cmp-registry:seed:cookies
# Optional: verify count via Prisma
infisical run -- node -e "import('@prisma/cmp-registry').then(async (m)=>{const p=new m.PrismaClient();console.log('CookieDefinition count =', await p.cookieDefinition.count());await p.$disconnect();})"
Export Worker
The export worker processes queued ExportJob rows and creates ExportArtifact records that can be downloaded later.
- Start locally:
pnpm -w nx run cmp-registry:jobs:exports-worker
- Environment variables:
  - `EXPORT_WORKER_IDLE_MS` (default 2000)
  - `EXPORT_WORKER_BATCH_LIMIT` (default 1000000)
- Kubernetes (same image as cmp-registry, different command):
apiVersion: apps/v1
kind: Deployment
metadata:
name: cmp-export-worker
namespace: cmp
spec:
replicas: 1
selector:
matchLabels: { app: cmp-export-worker }
template:
metadata:
labels: { app: cmp-export-worker }
spec:
containers:
- name: worker
image: registry.digiwedge.com/digiwedge/cmp-registry:latest
command: ['node', 'dist/apps/cmp/registry/src/jobs/exports-worker.js']
env:
- name: CMP_REGISTRY_DATABASE_URL
valueFrom: { secretKeyRef: { name: cmp-database-secret, key: DATABASE_URL } }
- name: EXPORT_WORKER_IDLE_MS
value: '2000'
- name: EXPORT_WORKER_BATCH_LIMIT
value: '1000000'
- Portal usage:
  - POST `/api/admin/exports/consents:start` starts a job and returns `jobId`.
  - Poll GET `/api/admin/exports/jobs/:id` until `status=done` → `artifactId`.
  - Download via GET `/api/admin/exports/:artifactId` or list with GET `/api/admin/exports?siteKey=...`.
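The start, poll, download flow can be sketched as a polling helper. This is an illustration rather than the Portal's code: `waitForArtifact` is a hypothetical name, the job fetcher and sleep function are injected so no HTTP client is assumed, the `failed` status is an assumption (only `done` is documented), and the attempt limit and interval are arbitrary defaults.

```typescript
type ExportJob = { status: string; artifactId?: string };

// Polls a job until it reports status "done" and returns its artifactId.
// getJob would wrap GET /api/admin/exports/jobs/:id in a real client.
async function waitForArtifact(
  getJob: (jobId: string) => Promise<ExportJob>,
  sleep: (ms: number) => Promise<void>,
  jobId: string,
  maxAttempts = 30,
  intervalMs = 2000,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getJob(jobId);
    if (job.status === "done" && job.artifactId !== undefined) {
      return job.artifactId;
    }
    // Assumption: a terminal failure is reported as status "failed".
    if (job.status === "failed") {
      throw new Error("export job " + jobId + " failed");
    }
    await sleep(intervalMs);
  }
  throw new Error("export job " + jobId + " not done after " + maxAttempts + " polls");
}
```

The returned `artifactId` is what the download route expects. Bounding the number of polls keeps a stuck worker from leaving the caller waiting indefinitely.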
Schema changes
- Added `ScanBaseline` to store baseline per `siteKey` + `topDomain` + `gpc`.
- Added `AdminAuditLog` to record admin actions (actor, action, siteKey, details), with indexes by tenant/site and action.
- After pulling changes, run:
infisical run -- pnpm -w nx run cmp-registry-data:prisma:db-push
Local run
pnpm nx serve cmp-registry
### Environment variables
In addition to DB/OIDC/CORS/rate-limit configuration above:
- `CMP_SCAN_API_BASE` (optional)
- Base URL of the Scan API used by the public health aggregator at `GET /api/health/scan-api`.
- Example (local dev): `http://localhost:3006`
- Example (UAT): `https://cmp-scan-api.uat.digiwedge.com`
- If unset, `/api/health/scan-api` returns `{ ok: false, error: 'CMP_SCAN_API_BASE not set' }`.
# API base is /api, so a local request looks like:
# GET http://localhost:3318/api/v1/config?site_key=DEV_SITE_KEY&v=live
Tenant Scoping
- Sites belong to a `Tenant` (`Site.tenantId`). When `CMP_ENFORCE_TENANT=1`, admin APIs that operate on a site key enforce that the caller belongs to the same tenant.
- Tenant is resolved from the admin token claim (`tenant_id|tenantId|org_id|orgId|tid`). For local testing, a header `x-tenant-id` may be supplied.
- Affected endpoints (non-exhaustive):
  - `GET /api/admin/analytics/sites/by-key/:siteKey/*`
  - `POST /api/admin/cookies/overrides*` (site-scoped overrides)
  - `POST /api/admin/scans/:id/baseline` and `GET /api/admin/scans/baseline`
  - `POST /api/admin/exports/consents:*`, `GET /api/admin/exports` (siteKey), `GET /api/admin/exports/:id` (artifact siteKey check)
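Tenant resolution and the same-tenant check can be sketched as two small functions. The names are hypothetical, and two details are assumptions rather than documented behavior: the claims are tried in the order they are listed above, and the `x-tenant-id` header is only a fallback that the caller explicitly enables for local testing.

```typescript
const TENANT_CLAIMS = ["tenant_id", "tenantId", "org_id", "orgId", "tid"];

// Returns the caller's tenant id from token claims, or from the header when allowed.
function resolveTenantId(
  claims: Record<string, unknown>,
  headers: Record<string, string | undefined>,
  allowHeader: boolean,
): string | null {
  for (const name of TENANT_CLAIMS) {
    const value = claims[name];
    if (typeof value === "string" && value.length > 0) return value;
  }
  if (allowHeader) {
    const fromHeader = headers["x-tenant-id"];
    if (fromHeader) return fromHeader;
  }
  return null;
}

// With enforcement on (CMP_ENFORCE_TENANT=1), the caller must own the site.
function canAccessSite(callerTenant: string | null, siteTenantId: string, enforce: boolean): boolean {
  return !enforce || callerTenant === siteTenantId;
}
```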
Admin Audit Logs
- New table `AdminAuditLog` tracks admin actions with time, tenantId, siteKey, action key, actor identity (iss/sub/name), IP, and JSON details.
- Logged events include:
  - `cookies.definitions.batch`, `cookies.overrides.upsert|delete|import|export`
  - `scans.baseline.promote`
  - `exports.consents.start|job`
  - `sites.domains.add|remove|copy-primary`
- Retention is controlled by `AUDIT_RETENTION_DAYS` (default 180) and applied by the retention job.
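As a rough picture of what the retention setting implies, the helper below computes the timestamp before which audit rows would be eligible for deletion. It is a sketch: `retentionCutoff` is a hypothetical name, and treating unset, non-numeric, or non-positive values as the 180-day default is an assumption about how the job parses the variable.

```typescript
const DEFAULT_RETENTION_DAYS = 180; // documented default
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the cutoff instant; rows with ts earlier than this are past retention.
function retentionCutoff(now: Date, rawDays: string | undefined): Date {
  const parsed = rawDays === undefined ? NaN : Number(rawDays);
  const days = isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_RETENTION_DAYS;
  return new Date(now.getTime() - days * MS_PER_DAY);
}
```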
Endpoints
- List logs (JSON)
  - GET `/api/admin/audit/logs?siteKey=&action=&range=7d|30d|24h&from=&to=&limit=50&offset=0`
  - Returns: `{ items: [{ id, ts, tenantId, siteKey, action, actorSub, actorIss, actorName, ip, details }], total }`
  - Tenant scoping: when `CMP_ENFORCE_TENANT=1`, results are filtered by the caller’s tenant (header `x-tenant-id` or token claim).
- Export logs (CSV)
  - GET `/api/admin/audit/logs/export?siteKey=&action=&range=...&from=&to=`
  - Returns `text/csv` with header: `ts,tenantId,siteKey,action,actorSub,actorIss,actorName,ip,details`
  - Notes: `details` is JSON-serialized; values are quoted and escaped.
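The CSV note (JSON-serialized `details`, quoted and escaped values) can be illustrated with a small serializer. This is a sketch, not the Registry's exporter: the function names are hypothetical, and it assumes RFC 4180 style quoting where every field is wrapped in double quotes and embedded quotes are doubled.

```typescript
type AuditRow = {
  ts: string;
  tenantId: string | null;
  siteKey: string | null;
  action: string;
  actorSub: string | null;
  actorIss: string | null;
  actorName: string | null;
  ip: string | null;
  details: unknown;
};

const AUDIT_CSV_HEADER = "ts,tenantId,siteKey,action,actorSub,actorIss,actorName,ip,details";

// Quotes a field and doubles any embedded double quotes; null becomes an empty field.
function csvField(value: string | null): string {
  return '"' + (value ?? "").replace(/"/g, '""') + '"';
}

// Renders the header plus one line per row, with details serialized as JSON.
function auditRowsToCsv(rows: AuditRow[]): string {
  const lines = [AUDIT_CSV_HEADER];
  for (const r of rows) {
    const fields = [
      r.ts, r.tenantId, r.siteKey, r.action,
      r.actorSub, r.actorIss, r.actorName, r.ip,
      JSON.stringify(r.details ?? null),
    ];
    lines.push(fields.map(csvField).join(","));
  }
  return lines.join("\n");
}
```

Quoting every field keeps the commas and quotes inside the JSON `details` column from being read as column separators.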
Rate Limiting
- Public endpoints already include limits for `/v1/config`, `/v1/consent`, `/v1/scans/run`.
- Admin limits added:
  - `/api/admin/analytics/*`: window `ADMIN_ANALYTICS_WINDOW_MS` (default 60s), max `ADMIN_ANALYTICS_MAX` (default 120)
  - `/api/admin/exports/*`: window `ADMIN_EXPORTS_WINDOW_MS` (default 60s), max `ADMIN_EXPORTS_MAX` (default 20)
- 429s are counted in `cmp_http_rate_limited_total{route}`.
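To make the window/max pairs concrete, here is a minimal fixed-window limiter. It is an assumption-laden sketch: the document does not say which algorithm or library the Registry uses, `FixedWindowLimiter` is a hypothetical class, and state is kept in memory for a single process.

```typescript
// Counts requests per key inside a fixed window; callers answer 429 when allow() is false.
class FixedWindowLimiter {
  private readonly windowMs: number;
  private readonly max: number;
  private counts: { [key: string]: { windowStart: number; count: number } } = {};

  constructor(windowMs: number, max: number) {
    this.windowMs = windowMs;
    this.max = max;
  }

  allow(key: string, nowMs: number): boolean {
    const entry = this.counts[key];
    // Start a fresh window on first sight of the key or once the old window has elapsed.
    if (entry === undefined || nowMs - entry.windowStart >= this.windowMs) {
      this.counts[key] = { windowStart: nowMs, count: 1 };
      return true;
    }
    entry.count += 1;
    return entry.count <= this.max;
  }
}
```

With the documented defaults, the exports limiter would be constructed as `new FixedWindowLimiter(60000, 20)` and keyed by caller, for example by client IP or token subject.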
Curl examples
REG=http://localhost:3318/api
ADMIN=$REG/admin
# Live config with region and caching headers
curl -sSI "$REG/v1/config?site_key=DEV_SITE_KEY&v=live"
# Append consent event
curl -sS -X POST "$REG/v1/consent" \
-H 'content-type: application/json' \
-d '{"siteKey":"DEV_SITE_KEY","categories":{"analytics":true},"version":1,"source":"banner"}'
# Classify a host
curl -sS "$REG/v1/classify?host=googletagmanager.com"
# Classify cookies (batch)
curl -sS -H 'content-type: application/json' \
-X POST "$REG/v1/classify-cookies?site_key=DEV_SITE_KEY" \
-d '[{"name":"_ga","domain":".example.com","host":"www.googletagmanager.com","secure":true,"httpOnly":false,"sameSite":"Lax","maxAgeSec":63072000}]' | jq
# Run datasets now (Infisical) and re‑test classification
infisical run --env=dev -- pnpm -w nx run cmp-registry:jobs:datasets --tui=false --skip-nx-cache
curl -sS "$REG/v1/classify?host=www.google-analytics.com"
# Upsert a manual classifier override (requires Bearer token with cmp.admin)
curl -sS -X POST "$ADMIN/classifier/override" \
-H "authorization: Bearer $TOKEN" \
-H 'content-type: application/json' \
-d '{"host":"example-analytics.com","category":"analytics","reason":"manual-review"}'
# Audit logs (last 7 days for a site)
curl -sS -H "authorization: Bearer $TOKEN" \
"$ADMIN/audit/logs?siteKey=$SITE_KEY&range=7d" | jq '.items[0]'
# Audit logs CSV (filter by action)
curl -sS -H "authorization: Bearer $TOKEN" \
"$ADMIN/audit/logs/export?siteKey=$SITE_KEY&action=sites.domains.add&range=30d" | head
# Export consent events (CSV, last 7 days)
curl -sS -H "authorization: Bearer $TOKEN" \
  "$REG/admin/analytics/sites/by-key/$SITE_KEY/consents/export?format=csv&range=7d" | head
# Export consent events (JSON, explicit window, GPC-only)
curl -sS -H "authorization: Bearer $TOKEN" \
  "$ADMIN/analytics/sites/by-key/$SITE_KEY/consents/export?format=json&from=2025-09-01T00:00:00.000Z&to=2025-09-13T00:00:00.000Z&gpc=1"
# Export consent events (NDJSON)
curl -sS -H "authorization: Bearer $TOKEN" \
  "$REG/admin/analytics/sites/by-key/$SITE_KEY/consents/export?format=jsonl&range=24h" | head -n 2
Notes
- Admin routes are protected with OIDC (JWKS discovery or static JWKS). The guard accepts `cmp.admin` via `scope`, `roles` (`CMP_ADMIN` or `cmp.admin`), or `permissions`.
- `region` is returned in `GET /v1/config` to aid client UX and auditing.
Swagger
- UI: `/api/docs` (relative server base `/api` ensures Try-It-Out targets the current origin)
- JSON: `/api/docs-json`
Swagger 404 troubleshooting
- If `GET /api/docs` returns 404 with JSON `{ "message": "Cannot GET /api/docs" }`:
  - Ensure the Registry app is running and has logged the docs URL at startup.
  - Verify `SWAGGER_PATH` is not set to a non-default value. When set, docs are mounted at that path (e.g., `/docs`). Try `GET /docs` and `GET /api/docs-json`.
  - Check the service port. Locally the Registry listens on `PORT=3318` by default; the container image uses `3410` internally (Ingress usually exposes 80/443).
  - Test the raw spec at `/api/docs-json` to rule out asset/CDN issues.
Metrics (Prometheus)
The registry exposes `/metrics` with Prometheus counters/gauges:
- `cmp_http_rate_limited_total{route}`: number of 429 responses per route
- `cmp_config_requests_total{status,origin_allowed}`: config responses by status and CORS outcome
- `cmp_consent_requests_total{status,cors}`: consent responses by status and CORS allow/deny
- `cmp_dataset_fetch_success_total{source}`: dataset job successes
- `cmp_dataset_fetch_timestamp{source}`: Unix timestamp of last successful dataset fetch
- `cmp_gpc_detected_total{route}`: requests with `Sec-GPC: 1`
- `cmp_cookie_unknown_total{site}`: unknown cookies (no DB definition matched; heuristics used) observed during classification
- `cmp_cookie_pre_consent_violations_total{site}`: non‑essential cookies seen pre‑consent (increment when scan ingestion processes violations)
Use the provided ServiceMonitor and Grafana dashboard JSON under `kubernetes/` and `grafana/dashboards/`.
Cookie classification endpoints
- POST `/v1/classify-cookies?site_key=…`: batch classify cookies (license‑clean)
  - Request items: `[{ name, domain?, host?, firstParty?, secure?, httpOnly?, sameSite?, maxAgeSec? }]`
  - Response items: `{ name, domain, firstParty, vendor?, category, purpose?, retention, flags, confidence, evidence[] }`
  - Notes: No cookie values are accepted or returned. Definitions/overrides are stored internally; evidence rows are recorded for analytics.
- Cookies (admin):
  - POST `/admin/cookies/definitions:batch`: upsert pattern definitions `{ items: [{ namePattern, isRegex?, vendor?, category, purpose?, retentionHint?, firstPartyDefault?, confidence? }] }`
  - POST `/admin/cookies/overrides`: upsert per‑site override `{ site_key?, name, domain?, vendor?, category, purpose?, retention?, reason? }`
  - GET `/admin/analytics/sites/by-key/:siteKey/cookies?range=7d|30d|custom&from&to&firstParty=true|false&category=analytics,advertising&unknownOnly=true&minConfidence=70&search=abc&limit=500`: cookie evidence aggregation for portal analytics (privacy‑safe, no values)