
CMP Operator Checklist (Dev → Prod)

Last updated: 2026-01-09

This is a pragmatic checklist for deploying and hardening the Registry, Portal, and Jobs across environments.

0) Prereqs

  • IDP — configure issuer and audience for JWTs, allow the Portal origin via CORS, and ensure admin role/scope cmp.admin exists. Optionally provide a shared secret for HS256 tokens.
  • Database — PostgreSQL URL for the Registry.
  • Gateway & Routing — Cilium Gateway API installed with HTTPRoute support; external-gateway configured for HTTPS. See docs/cmp/cilium-gateway-setup.md for routing details.
  • DNS & TLS — public hostnames for the Registry and Portal; TLS certificates managed by cert-manager or external DNS provider.
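To make the IDP requirements above concrete, here is a minimal Python sketch of the claim checks a token has to pass once a JWT library has verified its signature: issuer, audience, expiry, an algorithm allow-list built from the accepted algorithms in section 1, and the cmp.admin scope for admin calls. The `roles` claim name and the example issuer and audience values are assumptions; where an IDP puts roles varies.

```python
import time
from typing import List, Optional, Set

# Asymmetric algorithms from the accepted list in section 1.
ASYMMETRIC_ALGS = {
    "RS256", "RS384", "RS512",
    "PS256", "PS384", "PS512",
    "ES256", "ES384", "ES512",
}


def allowed_algs(hs_secret_configured: bool) -> Set[str]:
    """HS256 is only usable when a shared secret (OIDC_HS_SECRET) is configured."""
    algs = set(ASYMMETRIC_ALGS)
    if hs_secret_configured:
        algs.add("HS256")
    return algs


def check_claims(header: dict, payload: dict, *, issuer: str, audience: str,
                 hs_secret_configured: bool = False, require_admin: bool = False,
                 now: Optional[float] = None) -> List[str]:
    """Return a list of problems; an empty list means the claims look acceptable.

    Assumes the signature has already been verified by a JWT library.
    """
    now = time.time() if now is None else now
    problems = []
    alg = header.get("alg")
    if alg not in allowed_algs(hs_secret_configured):
        problems.append(f"alg {alg!r} not allowed")  # also rejects "none"
    if payload.get("iss") != issuer:
        problems.append("issuer mismatch")
    aud = payload.get("aud")  # RFC 7519: a string or a list of strings
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if audience not in audiences:
        problems.append("audience mismatch")
    exp = payload.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("expired or missing exp")
    if require_admin:
        scopes = str(payload.get("scope", "")).split()
        roles = payload.get("roles", [])  # hypothetical claim name; IDP-specific
        if "cmp.admin" not in scopes and "cmp.admin" not in roles:
            problems.append("missing cmp.admin")
    return problems
```

Pass the values you configured as OIDC_ISSUER and OIDC_AUDIENCE; rejecting anything outside the allow-list (including `none`) is what prevents algorithm-confusion attacks.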

1) Secrets

Create a secret cmp-registry-secret (per environment) with at least:

  • CMP_REGISTRY_DATABASE_URL
  • OIDC_ISSUER, OIDC_AUDIENCE, (OIDC_JWKS_URI optional), (OIDC_HS_SECRET optional)
  • Accepted JWT algorithms: RS256/RS384/RS512, PS256/PS384/PS512, ES256/ES384/ES512; HS256 only when OIDC_HS_SECRET is set
  • HS256 rotation plan: update IDP + registry together, deploy new OIDC_HS_SECRET, keep the previous secret in the IDP until all producers rotate, then remove the old secret from both sides
  • APPEND_AUTH_REQUIRED=1 in prod to enforce Bearer auth for POST /consent/v1/append
  • Dataset URLs (WTM, AdGuard, IAB GVL) if you use the provided jobs
  • (Optional) METRICS_PASS when exposing /metrics with Basic Auth
  • (Optional) CONSENT_IP_SALT if enabling hashed IP derivation at the edge
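The HS256 rotation plan needs an overlap window in which tokens signed with either the old or the new secret still verify. Whether the Registry itself accepts more than one secret at a time is not stated in this checklist, so the sketch below assumes a hypothetical verifier that tries an ordered list `[current, previous]`, using only the Python standard library. `sign_hs256` exists only to produce test tokens.

```python
import base64
import hashlib
import hmac
import json
from typing import List, Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: str) -> str:
    """Test helper: build an HS256 JWT the way an IDP would."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode("ascii")
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secrets: List[str]) -> Optional[dict]:
    """Return the payload if any secret verifies the token, else None.

    secrets is ordered [current, previous]; drop the previous one once all
    producers have rotated.
    """
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        sig = _b64url_decode(sig_b64)
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    signing_input = f"{header_b64}.{body_b64}".encode("ascii")
    for secret in secrets:
        expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
        if hmac.compare_digest(expected, sig):  # constant-time comparison
            return json.loads(_b64url_decode(body_b64))
    return None
```

Once the old secret is removed from the list, tokens signed with it stop verifying, which is the final step of the rotation plan.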

2) Helm values (Registry)

Set sane defaults for resources, autoscaling, CORS, rate limits, and NetworkPolicy. Example:

image:
  repository: registry.digiwedge.com/digiwedge/cmp-registry
  tag: <pinned>

resources:
  requests: { cpu: 100m, memory: 256Mi }
  limits: { cpu: 1000m, memory: 1Gi }

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 6
  targetCPUUtilizationPercentage: 70

rateLimit:
  enabled: true
  windowMs: 60000
  max: 300
  skipSuccessful: false
  standardHeaders: true
  legacyHeaders: false
  trustProxy: true

cors:
  adminAllowedOrigins:
    - https://cmp-portal.example.com
  strictConfigOrigin: true

metrics:
  enabled: true
  path: /metrics
  basicAuth: { enabled: false }

networkPolicy:
  enabled: true
  ingress:
    allowFromNamespaces: [kube-system, monitoring, cmp]
  egress:
    restricted: true
    # DNS via kube-system (CoreDNS) + DB via CIDR or namespace
    allowNamespaces: [kube-system]
    allowCIDRs: [10.42.0.0/16] # example DB subnet
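A minimal Python sketch of what `windowMs: 60000` and `max: 300` mean per client: at most 300 requests per 60-second window, about 5 requests per second on average. It assumes the chart configures a fixed-window counter keyed by client IP; the option names resemble express-rate-limit's, but that mapping is an assumption. With `trustProxy: true` the key should be the client address forwarded by the Gateway, not the Gateway's own IP. `skipSuccessful` is left out because it is false in these values.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class FixedWindowLimiter:
    window_ms: int = 60_000  # rateLimit.windowMs
    max_hits: int = 300      # rateLimit.max
    clock: Callable[[], float] = time.monotonic
    # key -> (window_start_ms, hits_in_window)
    _windows: Dict[str, Tuple[int, int]] = field(default_factory=dict, init=False, repr=False)

    def hit(self, key: str) -> Tuple[bool, int, int]:
        """Record one request for key; return (allowed, remaining, reset_in_ms)."""
        now_ms = int(self.clock() * 1000)
        start, count = self._windows.get(key, (now_ms, 0))
        if now_ms - start >= self.window_ms:
            start, count = now_ms, 0  # window expired: start a fresh one
        count += 1
        self._windows[key] = (start, count)
        remaining = max(self.max_hits - count, 0)
        reset_in_ms = self.window_ms - (now_ms - start)
        return count <= self.max_hits, remaining, reset_in_ms
```

`remaining` and `reset_in_ms` are the kind of values `standardHeaders: true` would expose as RateLimit-* response headers; a rejected hit is where the 429 counters in /metrics would increment.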

3) Helm values (Portal)

image:
  repository: registry.digiwedge.com/digiwedge/cmp-portal
  tag: <pinned>

resources:
  requests: { cpu: 50m, memory: 128Mi }
  limits: { cpu: 500m, memory: 512Mi }

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 60

networkPolicy:
  enabled: true
  ingress:
    allowFromNamespaces: [kube-system, cmp]
  egress:
    restricted: false # typically open for frontends; tighten if needed
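The `autoscaling` values feed a HorizontalPodAutoscaler. This Python sketch shows only its core replica calculation, `desired = ceil(current * currentUtilization / target)`, with Kubernetes' default 10% tolerance and clamping to minReplicas/maxReplicas. Utilization is measured against the CPU request, so with the Portal's 50m request and a 60% target, the HPA aims for about 30m average usage per pod. Stabilization windows, scaling policies, and missing-metric handling in the real controller are omitted.

```python
import math


def desired_replicas(current_replicas: int, avg_cpu_millicores: float,
                     request_millicores: float, target_percent: int,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA step: scale by the ratio of observed to target utilization."""
    utilization = 100.0 * avg_cpu_millicores / request_millicores
    ratio = utilization / target_percent
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # within tolerance: leave replicas unchanged
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, two Portal pods averaging 45m (90% of the 50m request) against a 60% target scale to three.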

4) Deploy with Argo CD

  • Point Applications to the Helm chart paths: charts/cmp-registry, charts/cmp-portal, charts/cmp-jobs.
  • Provide the environment-specific values and secrets.
  • Enable image updater annotations if you use argocd-image-updater.
  • Apply the HTTPRoute and ReferenceGrant manifests: kubectl apply -f kubernetes/cmp/portal/httproute.yaml -f kubernetes/idp/cmp-to-idp-referencegrant.yaml. See docs/cmp/cilium-gateway-setup.md for routing architecture.

5) Observability

  • Apply a ServiceMonitor for the Registry that scrapes /metrics.
  • Apply PrometheusRule alerts (dataset freshness, 429 spikes, CORS denials, CronJobs).
  • Import Grafana dashboard grafana/dashboards/cmp-overview.json.

6) CI gates

  • Configure repo secrets for scanner CI:
    • CMP_REGISTRY_URL, CMP_SITE_KEY, CMP_CANARY_URL
  • Run manual CMP scanner checks (baseline + GPC) and review the diff report.
  • Run the CMP React SDK a11y harness locally and review any findings.

7) Post‑deploy checks

  • GET /v1/config?site_key=...&v=live returns 200 with ETag and Cache-Control headers.
  • /v1/consent accepts consent and records events; with Sec-GPC: 1, events have gpc=true.
  • /metrics exposes counters (429s, config/consent decisions, dataset freshness, GPC).
  • Portal login works; Sites page shows analytics; export consents (CSV/JSON) downloads correctly.
  • Multi-domain CORS: on the Sites page, the "Allowed origins (CORS)" field shows and updates the host list; adding a second origin allows /v1/consent calls from that origin.
  • Scanner CI gate is green for canary (baseline + GPC).
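  • HTTPRoute validation: each Portal route reaches its backend and returns the expected status:
    • curl -sS -o /dev/null -w '%{http_code}\n' https://cmp-portal.uat.digiwedge.com/api/auth/csrf → 204 (IDP backend)
    • curl -sS -o /dev/null -w '%{http_code}\n' https://cmp-portal.uat.digiwedge.com/api/health → 200 (Registry backend)
    • curl -sS -o /dev/null -w '%{http_code}\n' https://cmp-portal.uat.digiwedge.com/ → 200 (Portal frontend)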
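The status and header checks above can also be scripted. This is a minimal Python sketch using only urllib, assuming the hosts are reachable from where it runs; the route table mirrors the HTTPRoute checks, and `fetch` is injectable so the logic can be tested without a cluster. Note that urlopen follows redirects, so a route that redirects reports the final status.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

# Expected status per Portal route, from the HTTPRoute validation list.
PORTAL_ROUTES = {
    "/api/auth/csrf": 204,  # IDP backend
    "/api/health": 200,     # Registry backend
    "/": 200,               # Portal frontend
}


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, dict]:
    """GET url and return (status, headers); HTTP errors still yield a status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, dict(resp.headers)
    except urllib.error.HTTPError as err:
        return err.code, dict(err.headers)


def check_config_response(status: int, headers: dict) -> List[str]:
    """Problems with a /v1/config response; an empty list means it passed."""
    problems = []
    present = {name.lower() for name in headers}
    if status != 200:
        problems.append(f"expected 200, got {status}")
    for name in ("etag", "cache-control"):
        if name not in present:
            problems.append(f"missing {name} header")
    return problems


def check_routes(base_url: str,
                 fetch: Callable[[str], Tuple[int, dict]] = fetch_status) -> Dict[str, str]:
    """Map each Portal path to "ok" or a description of the mismatch."""
    results = {}
    for path, expected in PORTAL_ROUTES.items():
        status, _ = fetch(base_url.rstrip("/") + path)
        results[path] = "ok" if status == expected else f"got {status}, want {expected}"
    return results
```

For example, `check_routes("https://cmp-portal.uat.digiwedge.com")` should report "ok" for all three paths, and `check_config_response(*fetch_status(url))` can be pointed at the Registry's /v1/config URL with your own host and site key.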

8) Hardening tips

  • Pin images by digest and tag.
  • Tighten CSP for the Registry UI (if ever presented) and Portal domains.
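  • Consider Basic Auth on /metrics (set METRICS_BASIC_AUTH=true and METRICS_PASS). If you enable it, give the ServiceMonitor matching credentials (its basicAuth endpoint setting), otherwise Prometheus scrapes fail with 401.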
  • Enable PDBs and Pod anti‑affinity (already in the charts) to improve availability during maintenance.
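With Basic Auth on /metrics, the endpoint should accept only the configured credentials and the scraper has to send them. This Python sketch shows both sides with the standard library: building the Authorization header and checking it with a constant-time comparison. The username `metrics` in the example is a hypothetical placeholder, and the password stands in for METRICS_PASS.

```python
import base64
import hmac
from typing import Optional


def basic_auth_header(user: str, password: str) -> str:
    """Authorization header value a scraper would send."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def check_basic_auth(header_value: Optional[str], user: str, password: str) -> bool:
    """True only if the header carries exactly the expected user and password."""
    if not header_value or not header_value.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header_value[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # invalid base64 or non-UTF-8 bytes
        return False
    given_user, sep, given_pass = decoded.partition(":")
    if not sep:
        return False
    # compare_digest does not stop at the first differing byte.
    user_ok = hmac.compare_digest(given_user.encode(), user.encode())
    pass_ok = hmac.compare_digest(given_pass.encode(), password.encode())
    return user_ok and pass_ok
```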

9) Privacy & retention runbook

  • IP minimization: Registry does not persist client IPs in ConsentEvent by design. If a downstream team ever requires coarse network correlation, use a one‑way hash with salt at the edge and only propagate ipHash: ipHash = sha256(ip + CONSENT_IP_SALT). Do not store raw IPs. Keep CONSENT_IP_SALT in the cmp-registry-secret.

  • Retention: Default consent retention is 13 months (configurable via CONSENT_RETENTION_DAYS). The CronJob cmp-consent-retention deletes rows whose timestamp field ts is older than the cutoff. To test it by hand:

    1. Dry run: exec into a throwaway job pod and inspect row counts grouped by month; nothing is deleted at this step.
    2. Purge: set CONSENT_RETENTION_DAYS=395 (about 13 months) and run the script once. This step really deletes data older than the cutoff.

    The job filters on the ts timestamp field; no PII columns are touched.
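The IP-minimization formula and the retention cutoff above can be sketched in Python as follows. Hex-encoding the digest is an assumption (the formula does not specify an encoding), and `dry_run` works on an in-memory list of ts values because the table schema is not shown here; the 395-day default mirrors the value used in step 2.

```python
import hashlib
from collections import Counter
from datetime import datetime, timedelta, timezone


def ip_hash(ip: str, salt: str) -> str:
    """ipHash = sha256(ip + CONSENT_IP_SALT), hex-encoded (encoding assumed)."""
    return hashlib.sha256((ip + salt).encode("utf-8")).hexdigest()


def retention_cutoff(now: datetime, retention_days: int = 395) -> datetime:
    """Rows whose ts is strictly older than this instant are eligible for deletion."""
    return now - timedelta(days=retention_days)


def dry_run(timestamps, now: datetime, retention_days: int = 395):
    """Step 1 as code: row counts per month plus how many rows a purge would remove."""
    cutoff = retention_cutoff(now, retention_days)
    per_month = Counter(ts.strftime("%Y-%m") for ts in timestamps)
    would_purge = sum(1 for ts in timestamps if ts < cutoff)
    return dict(sorted(per_month.items())), would_purge
```

With `now` at 2026-01-09 UTC and the 395-day default, the cutoff is 2024-12-10 UTC. Keep the salt secret: IPv4 has few enough values that an unsalted hash could be reversed by brute force.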

10) SLO dashboards

  • Latency SLOs: The Registry exports cmp_http_request_duration_ms{route,method,status}. Build Grafana panels on histogram_quantile(0.95, sum by (le) (rate(cmp_http_request_duration_ms_bucket{route="config"}[5m]))) for p95 and the same expression with 0.99 for p99. Set a p95 target for /v1/config under steady state.

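  • 429 budget: Use the included panel or derive the ratio as sum(rate(cmp_http_rate_limited_total[5m])) / clamp_min(sum(rate(cmp_config_requests_total[5m])) + sum(rate(cmp_consent_requests_total[5m])), 1e-6) and keep it below your agreed budget. Summing each counter separately avoids PromQL label matching, which drops series when the two request counters carry different labels. Alerts are labeled team=cmp for routing.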
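To sanity-check what the latency and 429 panels report, here is a simplified Python re-implementation of the interpolation PromQL's histogram_quantile performs on one series of cumulative buckets, plus the 429 ratio. It handles a single series only (no label grouping), and `max(..., 1e-6)` mirrors `clamp_min`. The bucket boundaries in the example are made up.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Simplified single-series version of PromQL's histogram_quantile.

    buckets: (le, cumulative_count) pairs; the last must be (math.inf, total).
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    b = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank is in the +Inf bucket: highest finite bound
    upper, count = buckets[b]
    if b == 0 and upper <= 0:
        return upper
    lower = 0.0
    if b > 0:
        lower = buckets[b - 1][0]
        count -= buckets[b - 1][1]
        rank -= buckets[b - 1][1]
    if count == 0:
        return math.nan
    # Linear interpolation: observations assumed evenly spread within the bucket.
    return lower + (upper - lower) * (rank / count)


def rate_limited_ratio(limited_rate: float, config_rate: float, consent_rate: float) -> float:
    """Share of requests answered with 429; max() plays the role of clamp_min."""
    return limited_rate / max(config_rate + consent_rate, 1e-6)
```

Because of the interpolation, a reported p95 is only as precise as the bucket boundaries around it: with buckets at 100 ms and 250 ms, any p95 in between is an estimate.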

11) NetworkPolicy verification

  • Ingress: With defaults, only kube-system, monitoring, and cmp namespaces can reach the Registry Service.

    • kubectl run -n default test --rm -it --restart=Never --image=curlimages/curl -- curl -sS --max-time 5 http://cmp-registry.cmp.svc/ → DENY (the request times out)
    • kubectl run -n cmp test --rm -it --restart=Never --image=curlimages/curl -- curl -sS --max-time 5 http://cmp-registry.cmp.svc/ → OK
  • Egress (Registry): Set networkPolicy.egress.restricted=true and populate allowNamespaces: [kube-system] (DNS) and allowCIDRs: [<DB CIDRs>]. Verify that DNS resolves and DB connectivity succeeds, while public internet access from Registry pods is blocked.

  • CronJobs: Dataset and retention jobs run in the cmp-jobs chart and are not subject to the Registry egress policy. They retain open egress to fetch datasets. If you apply a NetPol for jobs, ensure the dataset endpoints remain allowed.
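The egress checks can be scripted as a small Python probe run inside a Registry pod's network namespace, for example from an ephemeral debug container started with kubectl debug and a Python image, since the Registry image may not ship Python. The DB host is a placeholder, 5432 is PostgreSQL's default port, and example.com stands in for any public endpoint.

```python
import socket


def can_resolve(host: str) -> bool:
    """True if the name resolves (exercises the DNS egress rule to kube-system)."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP handshake completes; dropped or refused connections return False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def egress_report(db_host: str, db_port: int = 5432,
                  public_host: str = "example.com", public_port: int = 443) -> dict:
    """Under the restricted policy, all three values should be True."""
    return {
        "dns_ok": can_resolve(db_host),
        "db_ok": can_connect(db_host, db_port),
        "public_blocked": not can_connect(public_host, public_port),
    }
```

A policy drop shows up as a timeout rather than a refusal, so keep the timeout short when probing the public endpoint.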