Automation & Certificate Lifecycle
Celery powers Cirrus CDN’s asynchronous workflows: ACME certificate issuance, renewal scans, and node health checks. This chapter dissects task scheduling, locking strategies, and external integrations driven by src/cirrus/celery_app.py.
Celery Configuration
celery_app.py defines a Celery instance with Redis as both broker and result backend:
- Broker URL defaults to redis://[password@]host:port/0.
- Result backend defaults to database 1.
- Serializers are JSON-only; timezone defaults to UTC.
The Celery beat schedule includes:
- acme_renewal_scan – Cron-scheduled (default hourly at minute 0). Runs cirrus.acme.scan_and_renew.
- cname_node_health – Periodic task scheduled via celery.schedules.schedule, interval derived from node health settings (NODE_HEALTH_INTERVAL_SECS).
Queues are configurable via environment variables (ACME_RENEW_QUEUE, CNAME_HEALTH_QUEUE).
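The configuration described above can be sketched as a plain settings dictionary. This is an illustrative sketch, not the contents of celery_app.py: the function names, the REDIS_HOST, REDIS_PORT, and REDIS_PASSWORD variables, and every default value are assumptions. The password is placed after a colon with an empty username, which is the form redis-py's URL parser reads as a password. The cron fields are kept as a plain dict so the sketch needs no Celery import; a real configuration would wrap them with celery.schedules.crontab.

```python
import os
from urllib.parse import quote


def redis_url(host, port, db, password=None):
    """Build a Redis URL; the password (if any) is percent-encoded."""
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"redis://{auth}{host}:{port}/{db}"


def build_settings(env=None):
    """Return Celery-style settings derived from environment variables.

    REDIS_HOST, REDIS_PORT, REDIS_PASSWORD and all defaults are hypothetical.
    """
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "redis")
    port = int(env.get("REDIS_PORT", "6379"))
    password = env.get("REDIS_PASSWORD") or None
    return {
        "broker_url": redis_url(host, port, 0, password),      # database 0
        "result_backend": redis_url(host, port, 1, password),  # database 1
        "task_serializer": "json",
        "result_serializer": "json",
        "accept_content": ["json"],
        "timezone": "UTC",
        "beat_schedule": {
            "acme_renewal_scan": {
                "task": "cirrus.acme.scan_and_renew",
                # Cron fields; pass to celery.schedules.crontab(**fields).
                "schedule": {"minute": "0"},
                "options": {"queue": env.get("ACME_RENEW_QUEUE", "celery")},
            },
            "cname_node_health": {
                "task": "cirrus.cname.health_check",
                # A plain number of seconds is accepted as an interval.
                "schedule": float(env.get("NODE_HEALTH_INTERVAL_SECS", "30")),
                "options": {"queue": env.get("CNAME_HEALTH_QUEUE", "celery")},
            },
        },
    }


settings = build_settings({"REDIS_HOST": "cache", "REDIS_PASSWORD": "p@ss"})
print(settings["broker_url"])
```

Passing an explicit mapping instead of reading os.environ directly keeps the function easy to exercise in tests.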
ACME Certificate Issuance
Task Flow
acme_issue_task (Celery task name cirrus.acme.issue_certificate) orchestrates issuance:
- Generates a per-task token (Celery request.id or random hex).
- Calls _acme_issue_task_async(domain, token) inside asyncio.run.
- Acquires a Redis lock (cdn:acme:lock:{domain}) with TTL (ACME_LOCK_TTL, default 900 seconds). Skips if lock exists.
- Persists task ID in cdn:acme:task:{domain} for operator visibility.
- Marks ACME status as "running" in cdn:acme:{domain}.
- Ensures ACME registration exists by calling ensure_acme_registered (acme_common.py), which interacts with acme-dns via httpx.AsyncClient.
- Optionally enforces _acme-challenge CNAME readiness (ENFORCE_ACME_CNAME_CHECK, WAIT_FOR_CNAME, CNAME_WAIT_SECS).
- Loads or generates an ACME account (cdn:acmeacct:global) and a domain private key (cdn:acmecertkey:{domain}).
- Issues the certificate using sewer via issue_certificate_with_sewer.
- Stores the resulting fullchain PEM and private key in cdn:cert:{domain}, updates ACME status to "issued", and caches issuance timestamp.
- Unlocks by deleting cdn:acme:task:{domain} and the lock key (if owned).
Errors set status "failed" and log acme_fail with details; locks are cleaned up in finally blocks to prevent deadlocks.
External Services
- acme-dns (container acmedns) – Handles challenge updates. ensure_acme_registered registers new accounts and stores credentials in Redis.
- Caddy (container caddy) – Provides a local ACME directory (https://caddy:4431/acme/local/directory). Workers trust its root CA by copying the bundle to /app/certs/root-ca.crt (see docker/entrypoint.sh).
Certificate Renewal Scans
acme_scan_and_renew_task (task name cirrus.acme.scan_and_renew) executes:
- Acquires a global scan lock (cdn:acme:renew:scan_lock) to prevent overlapping scans.
- Iterates over domains from cdn:domains, filtering for those with use_acme_dns01.
- Skips domains currently locked or queued for issuance.
- Evaluates if the certificate is expiring soon using is_cert_expiring_soon (threshold ACME_RENEW_BEFORE_DAYS, default 30 days).
- Enqueues issuance tasks (up to ACME_RENEW_MAX_PER_SCAN, default 10), marking Redis status as "queued".
- Records skipped domains (locked or not due) and collects per-domain errors.
- Releases the scan lock, ensuring cleanup even on exceptions.
The task returns a structured dict summarizing queued renewals and reasons for skips, aiding observability.
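The selection logic of a scan can be sketched in pure Python. The names is_expiring_soon and plan_renewals, the per-domain dict layout, and the shape of the returned summary are assumptions made for illustration; the real is_cert_expiring_soon presumably reads the expiry from the stored certificate. Treating a domain with no known expiry as due, and stopping the sweep once the cap is reached, are also assumptions.

```python
from datetime import datetime, timedelta, timezone


def is_expiring_soon(not_after, now, renew_before_days=30):
    """True when the certificate expires within the renewal window."""
    return not_after - now <= timedelta(days=renew_before_days)


def plan_renewals(domains, now, max_per_scan=10, renew_before_days=30):
    """Decide which domains to queue and why the others were skipped."""
    summary = {"queued": [], "skipped": {}}
    for d in domains:
        name = d["name"]
        if not d.get("use_acme_dns01"):
            continue  # domain does not use ACME DNS-01; not part of the scan
        if len(summary["queued"]) >= max_per_scan:
            break  # cap reached; remaining domains wait for the next scan
        if d.get("locked"):
            summary["skipped"][name] = "locked"
            continue
        not_after = d.get("not_after")
        if not_after is not None and not is_expiring_soon(
            not_after, now, renew_before_days
        ):
            summary["skipped"][name] = "not_due"
            continue
        summary["queued"].append(name)
    return summary


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
demo = [
    {"name": "a.example", "use_acme_dns01": True, "not_after": now + timedelta(days=5)},
    {"name": "b.example", "use_acme_dns01": True, "not_after": now + timedelta(days=60)},
    {"name": "c.example", "use_acme_dns01": True, "locked": True,
     "not_after": now + timedelta(days=1)},
    {"name": "d.example", "use_acme_dns01": False, "not_after": now + timedelta(days=1)},
    {"name": "e.example", "use_acme_dns01": True, "not_after": now + timedelta(days=10)},
]
print(plan_renewals(demo, now))
```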
Node Health Checks
cname_health_check_task (task name cirrus.cname.health_check) runs on the interval configured in NodeHealthSettings:
- Calls _cname_health_check_task_async, which executes perform_health_checks from cname/health.py.
- For each node, attempts HTTP GET to http://<ip>:<port>/healthz (IPv6 addresses bracketed).
- Increments failure counters and deactivates nodes when fails_to_down threshold is met; reactivates upon succs_to_up.
- Publishes cdn:cname:dirty when node activation state flips, triggering DNS updates.
- Returns an array of results with node IDs, statuses (healthy, failed, down, recovered, no-address), and optional error messages.
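The URL construction and threshold logic can be sketched as below. This is one plausible reading of the behavior described, not the code in cname/health.py: NodeState, apply_probe, health_url, the default thresholds, and the exact mapping from counter states to status strings are assumptions. The second element of the returned tuple signals an activation flip, the point at which cdn:cname:dirty would be published.

```python
import ipaddress
from dataclasses import dataclass


def health_url(ip, port):
    """Build the probe URL, bracketing IPv6 literals."""
    addr = ipaddress.ip_address(ip)
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return f"http://{host}:{port}/healthz"


@dataclass
class NodeState:
    active: bool = True
    fails: int = 0  # consecutive failures while active
    succs: int = 0  # consecutive successes while inactive


def apply_probe(state, ok, fails_to_down=3, succs_to_up=2):
    """Update counters for one probe; return (status, activation_flipped)."""
    if ok:
        state.fails = 0
        if state.active:
            return "healthy", False
        state.succs += 1
        if state.succs >= succs_to_up:
            state.active, state.succs = True, 0
            return "recovered", True
        return "down", False  # succeeding, but not yet enough to reactivate
    state.succs = 0
    if not state.active:
        return "down", False
    state.fails += 1
    if state.fails >= fails_to_down:
        state.active, state.fails = False, 0
        return "down", True
    return "failed", False


node = NodeState()
for probe_ok in (False, False, False, True, True):
    print(apply_probe(node, probe_ok))
```

Requiring several consecutive results before flipping a node in either direction avoids DNS churn from a single dropped probe.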
Redis Utilities
Helper functions use redis.asyncio.Redis created via _create_async_redis():
- perform_health_checks operates on the Redis client supplied by the caller; helpers such as _cname_health_check_task_async close the connection once the check completes.
- ACME tasks wrap Redis interactions in try/finally to ensure connections close even on error.
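The close-on-exit pattern can be sketched with an async context manager. FakeAsyncRedis and redis_session are hypothetical names used so the example runs without a Redis server; the stand-in mirrors the aclose() method that recent redis-py releases provide on redis.asyncio.Redis (older releases use close()). How _create_async_redis builds its client is not shown in this chapter, so the factory argument is a placeholder.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeAsyncRedis:
    """Hypothetical stand-in that records whether it was closed."""

    def __init__(self):
        self.closed = False

    async def ping(self):
        return True

    async def aclose(self):
        self.closed = True


@asynccontextmanager
async def redis_session(factory):
    """Yield a client from `factory` and close it even if the body raises."""
    client = factory()
    try:
        yield client
    finally:
        await client.aclose()


async def failing_task(factory):
    async with redis_session(factory) as r:
        await r.ping()
        raise RuntimeError("simulated task failure")


created = []


def tracking_factory():
    client = FakeAsyncRedis()
    created.append(client)
    return client


try:
    asyncio.run(failing_task(tracking_factory))
except RuntimeError:
    pass
print(created[0].closed)
```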
Locking & Concurrency Controls
- Domain Locks – cdn:acme:lock:{domain} prevents simultaneous issuance tasks per domain.
- Task Keys – cdn:acme:task:{domain} aids operator visibility and prevents duplicates.
- Scan Lock – cdn:acme:renew:scan_lock ensures single renewal sweep across workers.
- Pub/Sub Events – publish_zone_dirty (in cname/service.py) is invoked whenever domain/node changes require DNS refresh, ensuring eventual consistency across components.
Error Handling & Retries
- Celery tasks use the default retry policy (no automatic retries). Failures are logged and surfaced via Redis status keys, allowing operators to investigate before re-triggering tasks.
- Certificate issuance catches all exceptions, updates status to "failed", and ensures locks are released to avoid indefinite blocking.
- Renewal scans log upstream exceptions and include error messages in the result payload for dashboards or alerting.
Observability Hooks
- Logging: The API logs queueing via acme_queued, while Celery tasks emit acme_start, acme_done, and acme_fail. Renewal scans log acme_auto_renew_queued and acme_auto_renew_error.
- Redis Keys: Operators can inspect cdn:acme:{domain} to monitor status transitions (init, registered, queued, running, issued, failed).
- Metrics: While Celery does not emit Prometheus metrics out of the box, logs and Redis data offer visibility. Chapter 9 covers potential enhancements.
Automation keeps certificates valid and node inventories accurate without manual intervention. The next chapter explains how the DNS and traffic engineering layer consumes this automation data to steer clients toward healthy edge nodes.