Automation & Certificate Lifecycle

Celery powers Cirrus CDN’s asynchronous workflows: ACME certificate issuance, renewal scans, and node health checks. This chapter dissects task scheduling, locking strategies, and external integrations driven by src/cirrus/celery_app.py.

Celery Configuration

celery_app.py defines a Celery instance with Redis as both broker and result backend:

  • Broker URL defaults to redis://[:password@]host:port/0.
  • Result backend defaults to database 1.
  • Serializers are JSON-only; timezone defaults to UTC.
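
As a rough illustration, the two URLs and the serializer settings could be assembled as below. The environment variable names (REDIS_HOST, REDIS_PORT, REDIS_PASSWORD) and the redis_url helper are assumptions made for this sketch; celery_app.py may use different names. Standard Redis URLs carry the password after an empty username, as in redis://:password@host:port/db.

```python
import os
from urllib.parse import quote


def redis_url(db: int, env=None) -> str:
    """Build a redis:// URL for the given database number.

    REDIS_HOST, REDIS_PORT and REDIS_PASSWORD are assumed names used for
    illustration only.
    """
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "redis")
    port = int(env.get("REDIS_PORT", "6379"))
    password = env.get("REDIS_PASSWORD", "")
    # An empty username followed by ":password@" is the conventional form.
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"


# Values that would be passed to Celery(broker=..., backend=...).
BROKER_URL = redis_url(0)       # database 0 for the broker
RESULT_BACKEND = redis_url(1)   # database 1 for results

# JSON-only serialization and UTC, as described above.
CELERY_SETTINGS = {
    "task_serializer": "json",
    "result_serializer": "json",
    "accept_content": ["json"],
    "timezone": "UTC",
    "enable_utc": True,
}
```

The dictionary keys are Celery's lowercase setting names and could be applied with app.conf.update(**CELERY_SETTINGS).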

The Celery beat schedule includes:

  • acme_renewal_scan – Cron-scheduled (default hourly at minute 0). Runs cirrus.acme.scan_and_renew.
  • cname_node_health – Interval-scheduled via celery.schedules.schedule; the interval comes from the node health settings (NODE_HEALTH_INTERVAL_SECS). Runs cirrus.cname.health_check.

Queues are configurable via environment variables (ACME_RENEW_QUEUE, CNAME_HEALTH_QUEUE).
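
The schedule and queue wiring described above might look roughly like the following. This is a sketch under assumptions: the helper names (schedule_settings, build_beat_schedule) are hypothetical, and the fallback values are placeholders rather than the project's real defaults ("celery" is simply Celery's own default queue name).

```python
import os
from datetime import timedelta


def schedule_settings(env=None) -> dict:
    """Read queue names and the health interval from the environment.

    The fallback values here are placeholders, not the project's defaults.
    """
    env = os.environ if env is None else env
    return {
        "acme_queue": env.get("ACME_RENEW_QUEUE", "celery"),
        "health_queue": env.get("CNAME_HEALTH_QUEUE", "celery"),
        "health_interval": float(env.get("NODE_HEALTH_INTERVAL_SECS", "30")),
    }


def build_beat_schedule(settings: dict) -> dict:
    """Return a mapping suitable for app.conf.beat_schedule."""
    # Imported lazily so schedule_settings() works without Celery installed.
    from celery.schedules import crontab, schedule

    interval = timedelta(seconds=settings["health_interval"])
    return {
        "acme_renewal_scan": {
            "task": "cirrus.acme.scan_and_renew",
            "schedule": crontab(minute=0),  # every hour, at minute 0
            "options": {"queue": settings["acme_queue"]},
        },
        "cname_node_health": {
            "task": "cirrus.cname.health_check",
            "schedule": schedule(run_every=interval),
            "options": {"queue": settings["health_queue"]},
        },
    }
```

Routing each entry through "options" keeps the queue choice next to the schedule it belongs to, so changing an environment variable moves the periodic task without touching the task definitions.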

ACME Certificate Issuance

Task Flow

acme_issue_task (Celery task name cirrus.acme.issue_certificate) orchestrates issuance:

  1. Generates a per-task token (Celery request.id or random hex).
  2. Calls _acme_issue_task_async(domain, token) inside asyncio.run.
  3. Acquires a Redis lock (cdn:acme:lock:{domain}) with TTL (ACME_LOCK_TTL, default 900 seconds). Skips if lock exists.
  4. Persists task ID in cdn:acme:task:{domain} for operator visibility.
  5. Marks ACME status as "running" in cdn:acme:{domain}.
  6. Ensures ACME registration exists by calling ensure_acme_registered (acme_common.py), which interacts with acme-dns via httpx.AsyncClient.
  7. Optionally enforces _acme-challenge CNAME readiness (ENFORCE_ACME_CNAME_CHECK, WAIT_FOR_CNAME, CNAME_WAIT_SECS).
  8. Loads or generates an ACME account (cdn:acmeacct:global) and a domain private key (cdn:acmecertkey:{domain}).
  9. Issues the certificate using sewer via issue_certificate_with_sewer.
  10. Stores the resulting fullchain PEM and private key in cdn:cert:{domain}, updates ACME status to "issued", and caches issuance timestamp.
  11. Cleans up by deleting cdn:acme:task:{domain} and, if this task still owns it, the lock key.

Errors set status "failed" and log acme_fail with details; locks are cleaned up in finally blocks to prevent deadlocks.
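
The lock, status, and cleanup portions of this flow can be sketched as follows. Everything here is illustrative: FakeRedis is an in-memory stand-in so the example runs without a server, issue, issue_task, and do_issue are hypothetical names, and storing the status as a plain string is an assumption (the real cdn:acme:{domain} value may be structured differently).

```python
import asyncio
import secrets


class FakeRedis:
    """In-memory stand-in for the few async Redis calls used below."""

    def __init__(self):
        self.data = {}

    async def set(self, key, value, nx=False, ex=None):
        if nx and key in self.data:
            return None  # mirrors redis-py: None when NX blocks the write
        self.data[key] = value  # TTL (ex) is ignored by this fake
        return True

    async def get(self, key):
        return self.data.get(key)

    async def delete(self, *keys):
        return sum(1 for k in keys if self.data.pop(k, None) is not None)


async def issue(r, domain, token, do_issue, lock_ttl=900):
    lock_key = f"cdn:acme:lock:{domain}"
    task_key = f"cdn:acme:task:{domain}"
    status_key = f"cdn:acme:{domain}"
    # SET NX EX: only one task per domain gets past this point.
    if not await r.set(lock_key, token, nx=True, ex=lock_ttl):
        return "skipped"
    try:
        await r.set(task_key, token)
        await r.set(status_key, "running")
        await do_issue(domain)  # placeholder for steps 6-10
        await r.set(status_key, "issued")
        return "issued"
    except Exception:
        await r.set(status_key, "failed")
        return "failed"
    finally:
        await r.delete(task_key)
        # Release only a lock we still own. A production version would make
        # this check-and-delete atomic (for example with a Lua script).
        if await r.get(lock_key) == token:
            await r.delete(lock_key)


def issue_task(r, domain, do_issue, request_id=None):
    """Synchronous entry point, shaped like a Celery task body."""
    token = request_id or secrets.token_hex(16)
    return asyncio.run(issue(r, domain, token, do_issue))
```

With a real redis.asyncio.Redis client, set(key, value, nx=True, ex=ttl) has the same shape and returns None when the key already exists, which is what makes the early "skipped" return work.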

External Services

  • acme-dns (container acmedns) – Handles challenge updates. ensure_acme_registered registers new accounts and stores credentials in Redis.
  • Caddy (container caddy) – Provides a local ACME directory (https://caddy:4431/acme/local/directory). Workers trust its root CA by copying the bundle to /app/certs/root-ca.crt (see docker/entrypoint.sh).

Certificate Renewal Scans

acme_scan_and_renew_task (task name cirrus.acme.scan_and_renew) executes:

  1. Acquires a global scan lock (cdn:acme:renew:scan_lock) to prevent overlapping scans.
  2. Iterates over domains from cdn:domains, filtering for those with use_acme_dns01.
  3. Skips domains currently locked or queued for issuance.
  4. Checks whether the certificate is expiring soon using is_cert_expiring_soon (threshold ACME_RENEW_BEFORE_DAYS, default 30 days).
  5. Enqueues issuance tasks (up to ACME_RENEW_MAX_PER_SCAN, default 10), marking Redis status as "queued".
  6. Records skipped domains (locked or not due) and collects per-domain errors.
  7. Releases the scan lock, ensuring cleanup even on exceptions.

The task returns a structured dict summarizing queued renewals and reasons for skips, aiding observability.
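
The selection logic in steps 3 to 6 can be approximated with pure functions. This is a sketch, not the project's code: is_expiring_soon and plan_renewals are hypothetical names, the result shape and the "limit_reached" reason are invented for illustration, and treating a domain without a certificate as due for issuance is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_expiring_soon(not_after, now, renew_before_days=30):
    """True when the certificate expires within the renewal window."""
    return not_after - now <= timedelta(days=renew_before_days)


def plan_renewals(certs, locked, now, renew_before_days=30, max_per_scan=10):
    """Decide which domains to queue in one scan.

    certs maps domain -> notAfter datetime (None when no cert exists yet);
    locked is the set of domains with an issuance lock or a queued task.
    """
    result = {"queued": [], "skipped": {}}
    for domain, not_after in certs.items():
        if domain in locked:
            result["skipped"][domain] = "locked"
        elif not_after is not None and not is_expiring_soon(
            not_after, now, renew_before_days
        ):
            result["skipped"][domain] = "not_due"
        elif len(result["queued"]) >= max_per_scan:
            result["skipped"][domain] = "limit_reached"
        else:
            result["queued"].append(domain)
    return result


# Example: one certificate that expires in 10 days is queued for renewal.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
summary = plan_renewals({"a.example": now + timedelta(days=10)}, locked=set(), now=now)
```

Keeping the decision separate from Redis access makes the per-scan cap and the skip reasons easy to test in isolation.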

Node Health Checks

cname_health_check_task (task name cirrus.cname.health_check) runs on the interval configured in NodeHealthSettings:

  • Calls _cname_health_check_task_async, which executes perform_health_checks from cname/health.py.
  • For each node, attempts HTTP GET to http://<ip>:<port>/healthz (IPv6 addresses bracketed).
  • Increments failure counters and deactivates a node when the fails_to_down threshold is met; reactivates it once the succs_to_up threshold is met.
  • Publishes cdn:cname:dirty when node activation state flips, triggering DNS updates.
  • Returns an array of results with node IDs, statuses (healthy, failed, down, recovered, no-address), and optional error messages.
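
A minimal sketch of the URL construction and the threshold logic follows. The names health_url, NodeState, and record_probe are hypothetical, the default thresholds are placeholders, and two details are assumptions rather than documented behavior: counters are treated as consecutive (a success resets the failure count and vice versa), and the mapping of each transition to a status string is a guess at how the listed statuses are used.

```python
import ipaddress
from dataclasses import dataclass


def health_url(ip: str, port: int) -> str:
    """Build the /healthz URL, bracketing IPv6 literals."""
    addr = ipaddress.ip_address(ip)
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return f"http://{host}:{port}/healthz"


@dataclass
class NodeState:
    active: bool = True
    fails: int = 0
    succs: int = 0


def record_probe(state: NodeState, ok: bool, fails_to_down=3, succs_to_up=2) -> str:
    """Update counters for one probe and return the resulting status."""
    if ok:
        state.fails = 0
        state.succs += 1
        if not state.active and state.succs >= succs_to_up:
            state.active = True
            return "recovered"  # activation flipped: DNS would be marked dirty
        return "healthy" if state.active else "down"
    state.succs = 0
    state.fails += 1
    if state.active and state.fails >= fails_to_down:
        state.active = False
        return "down"  # activation flipped: DNS would be marked dirty
    return "failed" if state.active else "down"
```

Requiring several probes in a row before flipping a node's state damps flapping, so a single dropped request does not trigger a DNS update.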

Redis Utilities

Helper functions use redis.asyncio.Redis created via _create_async_redis():

  • perform_health_checks operates on the Redis client supplied by the caller; helpers such as _cname_health_check_task_async close the connection once the check completes.
  • ACME tasks wrap Redis interactions in try/finally to ensure connections close even on error.

Locking & Concurrency Controls

  • Domain Locks: cdn:acme:lock:{domain} prevents simultaneous issuance tasks per domain.
  • Task Keys: cdn:acme:task:{domain} aids operator visibility and prevents duplicates.
  • Scan Lock: cdn:acme:renew:scan_lock ensures a single renewal sweep across workers.
  • Pub/Sub Events: publish_zone_dirty (in cname/service.py) is invoked whenever domain/node changes require a DNS refresh, ensuring eventual consistency across components.

Error Handling & Retries

  • Tasks rely on Celery's default behavior, which performs no automatic retries. Failures are logged and surfaced via Redis status keys, allowing operators to investigate before re-triggering tasks.
  • Certificate issuance catches all exceptions, updates status to "failed", and ensures locks are released to avoid indefinite blocking.
  • Renewal scans log upstream exceptions and include error messages in the result payload for dashboards or alerting.

Observability Hooks

  • Logging: The API logs queueing via acme_queued, while Celery tasks emit acme_start, acme_done, and acme_fail. Renewal scans log acme_auto_renew_queued and acme_auto_renew_error.
  • Redis Keys: Operators can inspect cdn:acme:{domain} to monitor status transitions (init, registered, queued, running, issued, failed).
  • Metrics: While Celery does not emit Prometheus metrics out of the box, logs and Redis data offer visibility. Chapter 9 covers potential enhancements.

Automation keeps certificates valid and node inventories accurate without manual intervention. The next chapter explains how the DNS and traffic engineering layer consumes this automation data to steer clients toward healthy edge nodes.