CMP Registry API — Dev Guide

Last updated: 2026-01-09 (Consolidated under docs/cmp)

The registry serves multi‑tenant CMP configurations, provides host classification, and ingests consent events. Admin endpoints are protected with OIDC (JWKS) and optionally accept HS256‑signed JWTs from our IDP when configured.

Note: All API routes are served under the base path /api.

Endpoints

  • GET /v1/config?site_key=...&v=live

    • Returns: { version, ui, categories, vendors, geoRules, region }
    • region is detected from headers (e.g., CF-IPCountry) and may be used by clients for UI defaults.
    • Caching: ETag weak validator + Cache-Control: public, max-age=300, stale-while-revalidate=60.
  • POST /v1/consent

    • Body: { siteKey, categories: {analytics?,marketing?,functional?}, version, region?, source }
    • Appends an event; no PII is stored.
    • Respects Global Privacy Control (GPC): if the request includes header Sec-GPC: 1, the event is tagged with gpc: true and the metric cmp_gpc_detected_total{route="consent"} increments.
  • GET /v1/classify?host=example.com

    • Returns a canonical category for a third‑party host, e.g., { host, category: 'analytics'|'advertising'|'functional'|'social'|'uncategorized', reasons: string[] }.
    • Combines dataset evidence with any manual overrides.
    • See “Datasets & Sync” for how the classifier database is populated and how to run the sync jobs.
      • Guide: docs/cmp/datasets.md
  • Admin (requires Bearer token with cmp.admin):

    • POST /admin/sites/by-key/:siteKey/configs → create version (Body.publish=true to publish)
    • POST /admin/sites/by-key/:siteKey/publish → Body { version }; marks an existing version as live
    • POST /admin/sites/by-key/:siteKey/rotate-key → rotates site key
    • GET /admin/sites?limit=50&offset=0 → lists sites with their current live version (paginated)
    • GET /admin/sites/by-key/:siteKey/domains → { items: [{host,id?,legacy?}] }; lists site domains (404 if site not found)
    • POST /admin/sites/by-key/:siteKey/domains → add/upsert a domain for a site (201 on success; 400 on bad host; 404 if site not found)
    • DELETE /admin/sites/by-key/:siteKey/domains → remove a domain (400 on bad host or trying to remove legacy primary; 404 if site not found)
    • POST /admin/sites/by-key/:siteKey/domains/copy-primary → copy legacy primary domain into domains list (404 if site not found; 400 if none set)
    • POST /admin/classifier/override → Body { host, category, reason? }; upserts a manual classifier override
    • DELETE /admin/classifier/override?host=...&site_key?=... → deletes an override (400 on missing host; 404 if site_key provided but unknown)
    • POST /admin/classifier/overrides:list → list overrides with filters { query?, limit?, offset?, site_key?, scope? }
    • POST /admin/classifier/overrides:export → CSV export using same filters
    • POST /admin/classifier/overrides:import → CSV import, requires headers host,category (400 if empty or missing headers)
    • GET /admin/analytics/sites/by-key/:siteKey/consents?range=7d|30d&from=&to= → JSON summary { total, acceptAll, rejectAll, partial, gpc, start, end }
    • GET /admin/analytics/sites/by-key/:siteKey/consents/export?format=csv|json|jsonl&range=Nd|Nh&from=ISO&to=ISO&region=US&gpc=0|1 → CSV, JSON, or JSONL export
      • Query validation: format in csv|json|jsonl; range matches ^\d+(d|h)$; from/to are ISO 8601; gpc in 0|1.
      • 404 if siteKey is unknown; 400 for invalid queries.
  • Scans (admin):

    • POST /admin/scans — ingest a completed scan run (from a trusted scanner)
    • GET /admin/scans — list persisted runs (filters: siteKey, topDomain, url, gpc, from/to, pagination)
    • GET /admin/scans/:id — get a single run
    • POST /admin/scans/:id/baseline — promote this run as baseline for its site/topDomain/GPC
    • GET /admin/scans/baseline?siteKey=...&topDomain=...&gpc=true|false — get baseline (includes scanRun)
    • GET /admin/scans/summary/daily — daily rollups (runs, new_third_parties, new_cookie_names, ok_runs)
    • POST /admin/scans/run — trigger a scan via an external Scan API ({ url, siteKey, gpc?, scanApiBase, registryBase? }) and persist
  • Public (no auth):

    • GET /v1/scans?url=…&topDomain=…&take=… — list sanitized scans
    • GET /v1/scans/:id — sanitized scan detail
    • POST /v1/scans — ingest a sanitized result (no cookie values or response bodies)
    • Health: GET /health/scan-api — checks the configured Scan API base (see CMP_SCAN_API_BASE)
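The query validation rules listed for the consents export endpoint can be sketched as a small pure function. This is an illustration only, not the registry's implementation: `validateExportQuery` and its error strings are hypothetical, all parameters are treated as optional, and the timestamp pattern accepts only full UTC timestamps, which may be stricter than what the registry accepts as ISO 8601.

```typescript
// Hypothetical helper mirroring the documented rules for
// GET /admin/analytics/sites/by-key/:siteKey/consents/export.
type ExportQuery = { format?: string; range?: string; from?: string; to?: string; gpc?: string };

const EXPORT_FORMATS = ["csv", "json", "jsonl"];
const RANGE_PATTERN = /^\d+(d|h)$/; // pattern taken from the guide
// Assumption: only full UTC timestamps such as 2025-09-01T00:00:00.000Z are accepted here.
const ISO_UTC_PATTERN = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/;

// Returns a list of problems; an empty list means the query passes these checks.
function validateExportQuery(q: ExportQuery): string[] {
  const errors: string[] = [];
  if (q.format !== undefined && EXPORT_FORMATS.indexOf(q.format) === -1) {
    errors.push("format must be csv, json, or jsonl");
  }
  if (q.range !== undefined && !RANGE_PATTERN.test(q.range)) {
    errors.push("range must look like 7d or 24h");
  }
  if (q.from !== undefined && !ISO_UTC_PATTERN.test(q.from)) {
    errors.push("from must be an ISO 8601 timestamp");
  }
  if (q.to !== undefined && !ISO_UTC_PATTERN.test(q.to)) {
    errors.push("to must be an ISO 8601 timestamp");
  }
  if (q.gpc !== undefined && q.gpc !== "0" && q.gpc !== "1") {
    errors.push("gpc must be 0 or 1");
  }
  return errors;
}
```

A client could run a check like this before calling the endpoint, so that malformed queries are caught locally rather than coming back as 400 responses.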

Environment

  • PORT (default 3318 for local dev; container image default is 3410)
  • CMP_REGISTRY_DATABASE_URL (PostgreSQL)
    • Datasets use AdGuard‑maintained WhoTracks.me exports by default; you can override URLs if needed. See docs/cmp/datasets.md
  • OIDC:
    • OIDC_ISSUER (required)
    • OIDC_AUDIENCE (required)
    • OIDC_JWKS_URI (optional; overrides discovery)
    • OIDC_HS_SECRET (optional) — when present, tokens are first verified with HS256 using this shared secret before falling back to JWKS (RS256/RS384/RS512, PS256/PS384/PS512, ES256/ES384/ES512). Use the same value as IDP JWT_SECRET for HMAC tokens.
      • Rotation plan: update the IDP and registry in lockstep, deploy the new secret, keep the previous secret active in the IDP until all producers rotate, then remove the old secret from both sides.
    • Accepted asymmetric JWT algorithms: RS256/RS384/RS512, PS256/PS384/PS512, ES256/ES384/ES512. Align with your IdP signing config.
  • Append auth:
    • APPEND_AUTH_REQUIRED=1 — requires Bearer tokens for POST /consent/v1/append and verifies them against the OIDC settings above (rejects missing/invalid/expired tokens with 401).
    • Startup validation fails fast when APPEND_AUTH_REQUIRED=1 but OIDC_ISSUER or OIDC_AUDIENCE are missing.
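The fail-fast startup rule above might look roughly like the following. This is a minimal sketch under stated assumptions: `validateAppendAuthEnv` is a hypothetical name, the environment is passed in as a plain object, and the error message is illustrative rather than the registry's actual output.

```typescript
// Plain object standing in for the process environment (assumption for testability).
type EnvVars = { [name: string]: string | undefined };

// Throws when append auth is required but the OIDC settings needed to verify tokens are missing.
function validateAppendAuthEnv(env: EnvVars): void {
  if (env["APPEND_AUTH_REQUIRED"] !== "1") return; // nothing to check when append auth is off
  const missing = ["OIDC_ISSUER", "OIDC_AUDIENCE"].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error("APPEND_AUTH_REQUIRED=1 but missing: " + missing.join(", "));
  }
}
```

Calling a check like this once during bootstrap, before the HTTP server starts listening, turns a misconfiguration into an immediate startup failure instead of a stream of 401 responses later.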

CORS & rate limiting (security)

  • /v1/consent enforces CORS by site: the Origin must match the Site domain configured for the provided siteKey.
  • /admin/* CORS is limited to the Portal origin(s) (configure via CORS_ADMIN_ALLOWED_ORIGINS).
  • /v1/config sets Access-Control-Allow-Origin dynamically for the requesting site; strict enforcement can be enabled.
  • /v1/scans is CORS-enabled for GET/POST; apply rate limits and/or CAPTCHA in production.
  • Rate limiting is applied to /v1/config and /v1/consent. Env (wired via Helm):
    • RATE_LIMIT_WINDOW_MS, RATE_LIMIT_MAX, RATE_LIMIT_SKIP_SUCCESS, RATE_LIMIT_STD_HDRS, RATE_LIMIT_LEGACY_HDRS, RATE_LIMIT_TRUST_PROXY
    • CORS_ADMIN_ALLOWED_ORIGINS, CORS_STRICT_CONFIG
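To illustrate the per-site CORS rule for /v1/consent, here is a small sketch of an Origin check. The guide does not spell out the matching rules, so this assumes an exact, case-insensitive hostname match that ignores scheme and port; `originHost` and `isOriginAllowed` are hypothetical names.

```typescript
// Extracts the lower-cased hostname from an Origin header value, or null if it is malformed.
function originHost(origin: string): string | null {
  const m = /^https?:\/\/([^\/:?#]+)(:\d+)?$/i.exec(origin.trim());
  return m && m[1] ? m[1].toLowerCase() : null;
}

// Assumption: the Origin is allowed only when its hostname equals one of the site's domains.
function isOriginAllowed(origin: string | undefined, siteDomains: string[]): boolean {
  if (!origin) return false;
  const host = originHost(origin);
  if (host === null) return false;
  return siteDomains.some((d) => d.toLowerCase() === host);
}
```

Whether subdomains or non-default ports should match is a policy decision the guide leaves open, so treat the exact-match rule here as a placeholder.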

Auth semantics

  • Admin routes require a Bearer token that conveys cmp.admin via any of: scope (space‑delimited), roles (CMP_ADMIN or cmp.admin), or permissions.
  • Consent append (POST /consent/v1/append) requires a valid Bearer token when APPEND_AUTH_REQUIRED=1. Verification enforces issuer/audience, signature (HS256 or JWKS), and a non‑expired exp claim.
  • Append auth failures are tracked via cmp_append_auth_failures_total{reason=...} (invalid_token, jwks_fetch_failed, missing_token, missing_config, missing_exp).
  • Dev helpers (non‑production only):
    • DISABLE_ADMIN_AUTH=1 or DEV_BYPASS_ADMIN_AUTH=1 — skip verification entirely for local smoke testing.
    • DEV_ADMIN_BEARER=<static> — accept a specific static bearer for local testing.
  • Future: if additional services adopt M2M JWT validation, extract a shared @digiwedge/m2m-auth helper to centralize guard logic and env conventions.
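The admin claim rule (cmp.admin via scope, roles, or permissions) can be sketched as a single predicate over already-verified token claims. This is an assumption-laden illustration: `hasCmpAdmin` is a hypothetical name, and `roles` and `permissions` are assumed to be string arrays, which the guide does not state.

```typescript
// Assumed claim shapes; real tokens may carry these differently.
type AdminClaims = { scope?: string; roles?: string[]; permissions?: string[] };

// Returns true when any of the three documented claim locations conveys cmp.admin.
function hasCmpAdmin(claims: AdminClaims): boolean {
  // scope is a space-delimited string, so split it and compare whole entries.
  const scopes = (claims.scope ?? "").split(" ").filter((s) => s.length > 0);
  if (scopes.indexOf("cmp.admin") !== -1) return true;
  const roles = claims.roles ?? [];
  if (roles.indexOf("CMP_ADMIN") !== -1 || roles.indexOf("cmp.admin") !== -1) return true;
  return (claims.permissions ?? []).indexOf("cmp.admin") !== -1;
}
```

Comparing whole scope entries rather than using a substring test avoids accepting look-alike values such as cmp.administrator.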

Endpoints Expected by the Portal

This section documents the minimal endpoints and shapes the CMP Portal binds to. Adjust routes if your Registry uses different paths — the Portal code is isolated so you can update endpoints in one place.

Sites list (admin)

  • Route: GET /admin/sites
  • Query params:
    • q (optional) – search term for remote Site Select
    • page (1‑based) and pageSize – for debounced search results
    • Legacy pagination (supported for simple list): limit, offset
  • Recommended response (paginated):
{
  "items": [
    { "key": "DEV_SITE_KEY", "name": "Development", "siteKey": "DEV_SITE_KEY" },
    { "key": "HOWTH-YC", "name": "Howth Yacht Club" }
  ],
  "total": 2
}
  • Legacy response (still handled by the Portal in some fetchers):
[
  { "siteKey": "DEV_SITE_KEY" },
  { "siteKey": "HOWTH-YC" }
]
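Because two response shapes exist for the sites list, a client may want to normalize both into one. The sketch below is hypothetical (the names `normalizeSites`, `SiteItem`, and `SiteOption` are not from the Portal code) and assumes that `key` falls back to `siteKey`, and `name` falls back to the key, when a field is absent.

```typescript
type SiteItem = { key?: string; name?: string; siteKey?: string };
type SitesResponse = { items: SiteItem[]; total: number } | SiteItem[];
type SiteOption = { key: string; name: string };

// Accepts either the paginated object or the legacy array and returns one shape.
function normalizeSites(res: SitesResponse): { items: SiteOption[]; total: number } {
  const raw: SiteItem[] = Array.isArray(res) ? res : res.items;
  const items = raw
    .map((s) => {
      const key = s.key ?? s.siteKey ?? "";
      return { key, name: s.name ?? key };
    })
    .filter((s) => s.key !== ""); // drop entries with no usable key
  // The legacy array carries no total, so fall back to the number of items received.
  const total = Array.isArray(res) ? items.length : res.total;
  return { items, total };
}
```

With this in place, the rest of a client can work against a single `{ items, total }` shape regardless of which response the Registry returns.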

Config list + upsert (admin)

  • List: GET /admin/config
    • Query: page, pageSize, q, optional siteKey
    • Response:
{
  "items": [
    { "key": "consent.banner.enabled", "value": "true", "siteKey": "DEV_SITE_KEY", "updatedAt": "2025-09-15T12:34:56.000Z" },
    { "key": "consent.defaultLocale", "value": "en", "updatedAt": "2025-09-15T11:11:11.000Z" }
  ],
  "total": 2
}
  • Create: POST /admin/config
Body: { "key": "string", "value": "string", "siteKey?": "string" }
200: { "key": "string", "value": "string", "siteKey?": "string", "updatedAt": "ISO" }
  • Update: PUT /admin/config
Body: { "key": "string", "value": "string", "siteKey?": "string" }
200: { "key": "string", "value": "string", "siteKey?": "string", "updatedAt": "ISO" }
  • Delete: DELETE /admin/config?key=...&siteKey?=...
200: { "ok": true }

Notes

  • The Portal treats (siteKey,key) as a composite identity when present; in edit mode it does not change the key.
  • If your Registry uses resourceful routing (e.g., PUT /admin/config/:key), adapt the portal API helper paths accordingly.

Analytics list (admin)

  • Route: GET /admin/analytics
  • Query: page, pageSize, optional q, optional siteKey
  • Response:
{
  "items": [
    {
      "id": "20250915-DEV_SITE_KEY-consents.accepted",
      "siteKey": "DEV_SITE_KEY",
      "metric": "consents.accepted",
      "value": 1234,
      "windowStart": "2025-09-15T00:00:00.000Z",
      "windowEnd": "2025-09-15T23:59:59.999Z"
    }
  ],
  "total": 1
}

Where metric is a stable string key for the aggregated metric (e.g., consents.accepted, consents.rejected, consents.partial).

Validation & error semantics

  • Inputs are validated via NestJS ValidationPipe (whitelist + transform). Invalid requests return 400 Bad Request with a standard error body.
  • Admin domain and classifier endpoints return 404 Not Found for unknown site keys when explicitly provided, and 400 Bad Request for missing/invalid fields.
  • Admin sites domains listing returns 404 Not Found when the site key does not exist.

Prisma setup

pnpm nx run cmp-registry-data:prisma:generate
pnpm nx run cmp-registry-data:prisma:db-push

Database (Kubernetes)

CMP follows the same external pattern as billing: a Service LoadBalancer with a dedicated port.

  • Service: cmp-database (type: LoadBalancer)
  • External port: 5440 → targetPort 5432
  • DSN example (external):
postgresql://cmp_user:<PASSWORD>@cmp-db.uat.digiwedge.com:5440/cmp?schema=public
  • Infisical variable used by Prisma and services:
CMP_REGISTRY_DATABASE_URL=postgresql://cmp_user:<PASSWORD>@cmp-db.uat.digiwedge.com:5440/cmp?schema=public

Notes

  • On bare‑metal, ensure a LoadBalancer controller (for example, a cloud‑provider LB or MetalLB) issues a public EXTERNAL‑IP. Update the DNS A record for cmp-db.uat.digiwedge.com to that IP (or annotate the Service with external-dns.alpha.kubernetes.io/hostname: cmp-db.uat.digiwedge.com).
  • You can restrict external access to specific IPv4 ranges at the Service level:
spec:
  type: LoadBalancer
  ports:
    - name: postgresql
      port: 5440
      targetPort: 5432
  loadBalancerSourceRanges:
    - '41.203.10.101/32' # example: allow only this IPv4
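To make the CIDR notation in loadBalancerSourceRanges concrete, the following sketch checks whether an IPv4 address falls inside a range. It is unrelated to how Kubernetes or the load balancer enforces the rule; `ipv4ToInt` and `inCidr` are illustrative helpers only.

```typescript
// Converts dotted-quad IPv4 text to a non-negative integer in the range 0..2^32-1.
function ipv4ToInt(ip: string): number {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error("invalid IPv4 address: " + ip);
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) throw new Error("invalid IPv4 address: " + ip);
    n = n * 256 + Number(p); // plain arithmetic avoids 32-bit signed overflow
  }
  return n;
}

// Returns true when ip lies inside the CIDR block, e.g. inCidr("10.0.0.7", "10.0.0.0/24").
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash === -1) throw new Error("invalid CIDR: " + cidr);
  const bits = Number(cidr.slice(slash + 1));
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) throw new Error("invalid CIDR: " + cidr);
  const blockSize = Math.pow(2, 32 - bits); // number of addresses in the block
  const base = ipv4ToInt(cidr.slice(0, slash));
  return Math.floor(ipv4ToInt(ip) / blockSize) === Math.floor(base / blockSize);
}
```

A /32 suffix leaves a block of exactly one address, which is why the example range above admits only 41.203.10.101.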

Troubleshooting

  • P1000 Authentication failed: reset the database user to match the Secret value.
PW=$(kubectl -n cmp get secret cmp-database-secret -o jsonpath='{.data.POSTGRES_PASSWORD}' | base64 -d)
kubectl -n cmp exec sts/cmp-database -- \
sh -lc "PGPASSWORD='$PW' psql -U cmp_user -d cmp -h 127.0.0.1 -c \"ALTER USER cmp_user WITH PASSWORD '$PW';\""
  • P1001 Can't reach database server:
    • Ensure the Service shows a public EXTERNAL-IP
    • Confirm firewall/ACLs are not blocking your source IP (see loadBalancerSourceRanges).
    • As a fallback, use a port‑forward for admin tasks:
kubectl -n cmp port-forward svc/cmp-database 15440:5432
export CMP_REGISTRY_DATABASE_URL="postgresql://cmp_user:<PASSWORD>@127.0.0.1:15440/cmp?schema=public"
pnpm nx run cmp-registry-data:prisma:db-push

Seed (DEV_SITE_KEY, DEV_SITE_KEY_2)

pnpm run cmp:seed

Seeds CookieDefinition from apps/cmp/registry/seed/cookies.seed.json.

# Ensure schema is pushed (once per environment)
infisical run -- pnpm -w nx run cmp-registry-data:prisma:db-push

# Seed cookie definitions (requires CMP_REGISTRY_DATABASE_URL)
infisical run -- pnpm -w nx run cmp-registry:seed:cookies

# Optional: verify count via Prisma
infisical run -- node -e "import('@prisma/cmp-registry').then(async (m)=>{const p=new m.PrismaClient();console.log('CookieDefinition count =', await p.cookieDefinition.count());await p.$disconnect();})"

Export Worker

The export worker processes queued ExportJob rows and creates ExportArtifact records that can be downloaded later.

  • Start locally:
pnpm -w nx run cmp-registry:jobs:exports-worker
  • Environment variables:

    • EXPORT_WORKER_IDLE_MS (default 2000)
    • EXPORT_WORKER_BATCH_LIMIT (default 1000000)
  • Kubernetes (same image as cmp-registry, different command):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cmp-export-worker
  namespace: cmp
spec:
  replicas: 1
  selector:
    matchLabels: { app: cmp-export-worker }
  template:
    metadata:
      labels: { app: cmp-export-worker }
    spec:
      containers:
        - name: worker
          image: registry.digiwedge.com/digiwedge/cmp-registry:latest
          command: ['node', 'dist/apps/cmp/registry/src/jobs/exports-worker.js']
          env:
            - name: CMP_REGISTRY_DATABASE_URL
              valueFrom: { secretKeyRef: { name: cmp-database-secret, key: DATABASE_URL } }
            - name: EXPORT_WORKER_IDLE_MS
              value: '2000'
            - name: EXPORT_WORKER_BATCH_LIMIT
              value: '1000000'
  • Portal usage:
    • POST /api/admin/exports/consents:start starts a job and returns jobId.
    • Poll GET /api/admin/exports/jobs/:id until status=done → artifactId.
    • Download via GET /api/admin/exports/:artifactId or list with GET /api/admin/exports?siteKey=....
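The start, poll, download flow could be driven by a loop like the one below. This is a sketch, not the Portal's code: `waitForArtifact` is a hypothetical name, the job shape `{ status, artifactId }` is inferred from the description, any status other than "done" is treated as still pending (the guide does not list failure statuses), and the HTTP call and the delay are injected so the loop stays independent of any particular client.

```typescript
// Assumed job shape; only status=done and artifactId come from the guide.
type ExportJob = { status: string; artifactId?: string };

// Polls getJob until the job reports status "done", then resolves with its artifactId.
async function waitForArtifact(
  getJob: () => Promise<ExportJob>, // e.g. a wrapper around GET /api/admin/exports/jobs/:id
  sleep: (ms: number) => Promise<void>, // injected delay function
  intervalMs: number = 2000,
  maxAttempts: number = 30,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getJob();
    if (job.status === "done") {
      if (!job.artifactId) throw new Error("job is done but has no artifactId");
      return job.artifactId;
    }
    await sleep(intervalMs); // not done yet: wait before the next poll
  }
  throw new Error("export job not done after " + maxAttempts + " attempts");
}
```

Bounding the number of attempts keeps a stuck job from polling forever. Note that /api/admin/exports/* is rate limited, so the interval should stay well above a few hundred milliseconds.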

Schema changes

  • Added ScanBaseline to store baseline per siteKey + topDomain + gpc.
  • Added AdminAuditLog to record admin actions (actor, action, siteKey, details), with indexes by tenant/site and action.
  • After pulling changes, run:
infisical run -- pnpm -w nx run cmp-registry-data:prisma:db-push

Local run

pnpm nx serve cmp-registry

Environment variables

In addition to the DB/OIDC/CORS/rate-limit configuration above:

  • CMP_SCAN_API_BASE (optional)
    • Base URL of the Scan API used by the public health aggregator at GET /api/health/scan-api.
    • Example (local dev): http://localhost:3006
    • Example (UAT): https://cmp-scan-api.uat.digiwedge.com
    • If unset, /api/health/scan-api returns { ok: false, error: 'CMP_SCAN_API_BASE not set' }.

# API base is /api, so a local config request looks like:
# GET http://localhost:3318/api/v1/config?site_key=DEV_SITE_KEY&v=live

Tenant Scoping

  • Sites belong to a Tenant (Site.tenantId). When CMP_ENFORCE_TENANT=1, admin APIs that operate on a site key enforce that the caller belongs to the same tenant.
  • Tenant is resolved from the admin token claim (tenant_id|tenantId|org_id|orgId|tid). For local testing, a header x-tenant-id may be supplied.
  • Affected endpoints (non-exhaustive):
    • GET /api/admin/analytics/sites/by-key/:siteKey/*
    • POST /api/admin/cookies/overrides* (site-scoped overrides)
    • POST /api/admin/scans/:id/baseline and GET /api/admin/scans/baseline
    • POST /api/admin/exports/consents:*, GET /api/admin/exports (siteKey), GET /api/admin/exports/:id (artifact siteKey check)
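Tenant resolution and the same-tenant check might be sketched as follows. The claim names come from the list above; everything else is an assumption: the claims are tried in the order listed, a token claim takes precedence over the x-tenant-id header, and a mismatch simply throws (the guide does not say which HTTP status the registry returns). `resolveTenant` and `assertSameTenant` are hypothetical names.

```typescript
// Claim names from the guide, tried in this order (the order is an assumption).
const TENANT_CLAIMS = ["tenant_id", "tenantId", "org_id", "orgId", "tid"];

// Returns the caller's tenant from token claims, else from the x-tenant-id header value.
function resolveTenant(claims: { [k: string]: unknown }, headerTenant?: string): string | undefined {
  for (const claimName of TENANT_CLAIMS) {
    const value = claims[claimName];
    if (typeof value === "string" && value !== "") return value;
  }
  return headerTenant || undefined;
}

// Throws when enforcement is on (CMP_ENFORCE_TENANT=1) and the caller is not in the site's tenant.
function assertSameTenant(callerTenant: string | undefined, siteTenantId: string, enforce: boolean): void {
  if (!enforce) return;
  if (callerTenant !== siteTenantId) throw new Error("tenant mismatch");
}
```

A caller with no resolvable tenant is rejected under enforcement in this sketch, which is the conservative choice when the claim is missing.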

Admin Audit Logs

  • New table AdminAuditLog tracks admin actions with time, tenantId, siteKey, action key, actor identity (iss/sub/name), IP, and JSON details.
  • Logged events include:
    • cookies.definitions.batch, cookies.overrides.upsert|delete|import|export
    • scans.baseline.promote
    • exports.consents.start|job
    • sites.domains.add|remove|copy-primary
  • Retention is controlled by AUDIT_RETENTION_DAYS (default 180) processed by the retention job.

Endpoints

  • List logs (JSON)

    • GET /api/admin/audit/logs?siteKey=&action=&range=7d|30d|24h&from=&to=&limit=50&offset=0
    • Returns: { items: [{ id, ts, tenantId, siteKey, action, actorSub, actorIss, actorName, ip, details }], total }
    • Tenant scoping: when CMP_ENFORCE_TENANT=1, results are filtered by the caller’s tenant (header x-tenant-id or token claim).
  • Export logs (CSV)

    • GET /api/admin/audit/logs/export?siteKey=&action=&range=...&from=&to=
    • text/csv with header: ts,tenantId,siteKey,action,actorSub,actorIss,actorName,ip,details
    • Notes: details is JSON-serialized; values are quoted and escaped.
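The quoting and escaping noted for the CSV export can be illustrated with a short serializer. This is a sketch under assumptions: every cell is quoted, embedded double quotes are escaped by doubling them (the common RFC 4180 convention), and non-string values such as details are JSON-serialized first. `csvCell` and `auditRowToCsv` are hypothetical names; the column order comes from the header documented above.

```typescript
// Column order taken from the documented CSV header.
const AUDIT_COLUMNS = ["ts", "tenantId", "siteKey", "action", "actorSub", "actorIss", "actorName", "ip", "details"];

// Quotes one cell; null/undefined become an empty string, objects are JSON-serialized.
function csvCell(value: unknown): string {
  let text: string;
  if (value === null || value === undefined) text = "";
  else if (typeof value === "string") text = value;
  else text = JSON.stringify(value);
  return '"' + text.replace(/"/g, '""') + '"'; // double any embedded quotes
}

// Serializes one audit log item into a CSV line in the documented column order.
function auditRowToCsv(row: { [k: string]: unknown }): string {
  return AUDIT_COLUMNS.map((column) => csvCell(row[column])).join(",");
}
```

Because details is JSON, it nearly always contains double quotes and commas, so unconditional quoting keeps each log entry on a single well-formed CSV row.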

Rate Limiting

  • Public endpoints already include limits for /v1/config, /v1/consent, /v1/scans/run.
  • Admin limits added:
    • /api/admin/analytics/* — window ADMIN_ANALYTICS_WINDOW_MS (default 60s), max ADMIN_ANALYTICS_MAX (default 120)
    • /api/admin/exports/* — window ADMIN_EXPORTS_WINDOW_MS (default 60s), max ADMIN_EXPORTS_MAX (default 20)
  • 429s are counted in cmp_http_rate_limited_total{route}.
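The window/max pairs above describe a counter that resets per time window. The guide does not say which limiter the registry uses, so the following is only a generic fixed-window sketch: `FixedWindowLimiter` is a hypothetical class, and the clock is injected so the behaviour can be exercised without waiting.

```typescript
// Generic fixed-window counter; not the registry's actual middleware.
class FixedWindowLimiter {
  private windowMs: number;
  private max: number;
  private now: () => number;
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(windowMs: number, max: number, now: () => number = Date.now) {
    this.windowMs = windowMs;
    this.max = max;
    this.now = now;
  }

  // Returns true when the request is allowed, false when it would receive a 429.
  allow(key: string): boolean {
    const t = this.now();
    const entry = this.counts.get(key);
    if (!entry || t - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: t, count: 1 }); // start a fresh window
      return true;
    }
    entry.count += 1;
    return entry.count <= this.max;
  }
}
```

With the documented export defaults this would be constructed as `new FixedWindowLimiter(60000, 20)`, keyed by whatever identifies the caller, for example a client IP or token subject.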

Curl examples

REG=http://localhost:3318/api
ADMIN=$REG/admin
# Live config with region and caching headers
curl -sSI "$REG/v1/config?site_key=DEV_SITE_KEY&v=live"

# Append consent event
curl -sS -X POST "$REG/v1/consent" \
-H 'content-type: application/json' \
-d '{"siteKey":"DEV_SITE_KEY","categories":{"analytics":true},"version":1,"source":"banner"}'

# Classify a host
curl -sS "$REG/v1/classify?host=googletagmanager.com"

# Classify cookies (batch)
curl -sS -H 'content-type: application/json' \
-X POST "$REG/v1/classify-cookies?site_key=DEV_SITE_KEY" \
-d '[{"name":"_ga","domain":".example.com","host":"www.googletagmanager.com","secure":true,"httpOnly":false,"sameSite":"Lax","maxAgeSec":63072000}]' | jq

# Run datasets now (Infisical) and re‑test classification
infisical run --env=dev -- pnpm -w nx run cmp-registry:jobs:datasets --tui=false --skip-nx-cache
curl -sS "$REG/v1/classify?host=www.google-analytics.com"

# Upsert a manual classifier override (requires Bearer token with cmp.admin)
curl -sS -X POST "$ADMIN/classifier/override" \
-H "authorization: Bearer $TOKEN" \
-H 'content-type: application/json' \
-d '{"host":"example-analytics.com","category":"analytics","reason":"manual-review"}'

# Audit logs (last 7 days for a site)

curl -sS -H "authorization: Bearer $TOKEN" \
"$ADMIN/audit/logs?siteKey=$SITE_KEY&range=7d" | jq '.items[0]'

# Audit logs CSV (filter by action)

curl -sS -H "authorization: Bearer $TOKEN" \
"$ADMIN/audit/logs/export?siteKey=$SITE_KEY&action=sites.domains.add&range=30d" | head

# Consent export (CSV, last 7 days)
curl -sS -H "authorization: Bearer $TOKEN" \
"$ADMIN/analytics/sites/by-key/$SITE_KEY/consents/export?format=csv&range=7d" | head

# Consent export (JSON, explicit window, gpc=1)
curl -sS -H "authorization: Bearer $TOKEN" \
"$ADMIN/analytics/sites/by-key/$SITE_KEY/consents/export?format=json&from=2025-09-01T00:00:00.000Z&to=2025-09-13T00:00:00.000Z&gpc=1"

# Consent export (JSONL, last 24 hours)
curl -sS -H "authorization: Bearer $TOKEN" \
"$ADMIN/analytics/sites/by-key/$SITE_KEY/consents/export?format=jsonl&range=24h" | head -n 2

Notes

  • Admin routes are protected with OIDC (JWKS discovery or static JWKS). The guard accepts cmp.admin via scope, roles (CMP_ADMIN or cmp.admin), or permissions.
  • region is returned in GET /v1/config to aid client UX and auditing.

Swagger

  • UI: /api/docs (relative server base /api ensures Try-It-Out targets the current origin)
  • JSON: /api/docs-json

Swagger 404 troubleshooting

  • If GET /api/docs returns 404 with JSON { "message": "Cannot GET /api/docs" }:
    • Ensure the Registry app is running and has logged the docs URL at startup.
    • Verify SWAGGER_PATH is not set to a non-default value. When set, docs are mounted at that path (e.g., /docs). Try GET /docs and GET /api/docs-json.
    • Check the service port. Locally the Registry listens on PORT=3318 by default; the container image uses 3410 internally (Ingress usually exposes 80/443).
    • Test the raw spec at /api/docs-json to rule out asset/CDN issues.

Metrics (Prometheus)

The registry exposes /metrics with Prometheus counters/gauges:

  • cmp_http_rate_limited_total{route}: number of 429 responses per route
  • cmp_config_requests_total{status,origin_allowed}: config responses by status and CORS outcome
  • cmp_consent_requests_total{status,cors}: consent responses by status and CORS allow/deny
  • cmp_dataset_fetch_success_total{source}: dataset job successes
  • cmp_dataset_fetch_timestamp{source}: Unix timestamp of last successful dataset fetch
  • cmp_gpc_detected_total{route}: requests with Sec-GPC: 1
  • cmp_cookie_unknown_total{site}: unknown cookies (no DB definition matched; heuristics used) observed during classification
  • cmp_cookie_pre_consent_violations_total{site}: non‑essential cookies seen pre‑consent (incremented when scan ingestion processes violations)

Use the provided ServiceMonitor and Grafana dashboard JSON under kubernetes/ and grafana/dashboards/.

Cookie endpoints

  • POST /v1/classify-cookies?site_key=… → batch classify cookies (license‑clean)
    • Request items: [{ name, domain?, host?, firstParty?, secure?, httpOnly?, sameSite?, maxAgeSec? }]
    • Response items: { name, domain, firstParty, vendor?, category, purpose?, retention, flags, confidence, evidence[] }
    • Notes: No cookie values are accepted or returned. Definitions/overrides are stored internally; evidence rows are recorded for analytics.
  • Cookies (admin):
    • POST /admin/cookies/definitions:batch → upsert pattern definitions { items: [{ namePattern, isRegex?, vendor?, category, purpose?, retentionHint?, firstPartyDefault?, confidence? }] }
    • POST /admin/cookies/overrides → upsert per‑site override { site_key?, name, domain?, vendor?, category, purpose?, retention?, reason? }
    • GET /admin/analytics/sites/by-key/:siteKey/cookies?range=7d|30d|custom&from&to&firstParty=true|false&category=analytics,advertising&unknownOnly=true&minConfidence=70&search=abc&limit=500 → cookie evidence aggregation for portal analytics (privacy‑safe, no values)