diff --git a/docs/decisions/001-webhook-auto-apply.md b/docs/decisions/001-webhook-auto-apply.md
index 795833f..95e5ceb 100644
--- a/docs/decisions/001-webhook-auto-apply.md
+++ b/docs/decisions/001-webhook-auto-apply.md
@@ -72,6 +72,27 @@ Add a `/webhook` path handler alongside the existing CFN handler in a
 Function URL directly). The URL is added to the GHE org or repo webhook
 settings.
 
+### Webhook payload — what GHE sends
+
+The GHE `push` event payload contains everything the Lambda needs to identify
+the repo without any out-of-band mapping:
+
+```json
+{
+  "ref": "refs/heads/main",
+  "after": "abc123def456...",
+  "repository": {
+    "name": "229685449397-csvd-dev-platform-dev-gov",
+    "full_name": "SCT-Engineering/229685449397-csvd-dev-platform-dev-gov",
+    "clone_url": "https://github.e.it.census.gov/SCT-Engineering/229685449397-csvd-dev-platform-dev-gov.git"
+  }
+}
+```
+
+- `repository.name` → `ACCOUNT_REPO` passed to CodeBuild
+- `after` → merge commit SHA used for GHE commit status writeback
+- No repo→callback URL map is needed or maintained
+
 **Request flow:**
 
 ```
@@ -79,15 +100,29 @@ GHE push event (main branch, account repo)
   → Lambda Function URL POST /
   → verify HMAC-SHA256 signature (secret in SM: ghe-runner/webhook-secret)
   → parse X-GitHub-Event: push
-  → filter: ref == refs/heads/main
-  → filter: repo name matches account repo pattern
-  → fetch .sc-automation.yml via GHE API (no clone needed)
+  → filter: ref == refs/heads/main AND repository.name matches account repo pattern
+  → fetch .sc-automation.yml from main via GHE API (no clone — single API call)
+  → if .sc-automation.yml missing: post ❌ commit status "no .sc-automation.yml on main" and exit
+  → post ⏳ "pending" commit status on merge SHA
   → for each entry in apply_on_merge:
-      start_codebuild_build(action="apply", account_repo=..., layer=..., ...)
-      (fire-and-forget — do NOT block for CodeBuild completion)
-  → return 200 OK immediately
+      start_codebuild_build(
+          action="apply",
+          account_repo=payload["repository"]["name"],  # from webhook
+          layer=entry["layer"],  # from .sc-automation.yml
+          region_dir=entry["region_dir"],  # from .sc-automation.yml
+          target_account_id=entry.get("target_account_id", ""),
+          commit_sha=payload["after"]  # for status writeback
+      )
+      (fire-and-forget — do NOT poll CodeBuild)
+  → return HTTP 200 immediately
 ```
 
+**Executor buildspec writeback:**
+The executor CodeBuild build receives `COMMIT_SHA` as an env var. In its
+POST_BUILD phase it calls `gh api` to post a GHE commit status (`success` or
+`failure`) back to the merge commit. Teams see ✅ or ❌ directly on the commit
+in the PR history — no CloudWatch required.
+
 **Key differences from the CFN handler:**
 
 - **No polling.** The webhook handler starts builds and returns immediately.
@@ -111,15 +146,19 @@ GHE push event (main branch, account repo)
 
 ### `.sc-automation.yml` lifecycle
 
-- **Proposer writes it** when it first creates the branch (if the file doesn't
-  exist yet). The Proposer knows `layer`, `region_dir`, and `target_account_id`
-  from its build environment variables. It commits `.sc-automation.yml` alongside
-  the rendered template files.
-- **Platform engineers edit it** directly via PR if they need to add or remove
+- **Proposer writes it** on the first run for a branch, if the file doesn't
+  already exist on `main`. The Proposer knows `layer`, `region_dir`, and
+  `target_account_id` from its CodeBuild env vars. It commits `.sc-automation.yml`
+  alongside the rendered template files so the file is reviewed in the same PR.
+- **Proposer does NOT overwrite it** on subsequent runs — it checks whether the
+  file already exists on `main` and skips writing if so, preserving any manual
+  edits made by platform engineers.
+- **Platform engineers edit it** directly via PR to add, remove, or reorder
   apply targets.
-- **The file is idempotent** — subsequent Proposer runs `--force-with-lease` push
-  won't break it because the Proposer will only write the file if it doesn't
-  already exist (avoiding clobbering manual edits).
+- **Missing `.sc-automation.yml` → blocked.** If the file is not present on
+  `main` when a push webhook fires, the Lambda posts a `failure` commit
+  status and does not start any builds. This surfaces the problem
+  immediately instead of failing as a silent no-op.
 
 ---
 
@@ -129,7 +168,9 @@ GHE push event (main branch, account repo)
 
 - Eliminates the manual "provision executor product" step after PR merge
 - Apply is fully traceable: GHE push event → CloudWatch Logs → CodeBuild build ID
+- GHE commit status writeback gives teams ✅/❌ feedback directly on the merge commit
 - No new infrastructure services (no EventBridge, no SQS, no API Gateway)
+- No repo→callback URL map to maintain — repo identity comes from the webhook payload
 - The executor SC product remains available for manual one-off runs and day-2
   operations (re-run from a specific tag, dry-run, etc.)
 
@@ -146,10 +187,10 @@ GHE push event (main branch, account repo)
 
 ### Out of scope for this ADR
 
-- Result notification (Slack, email) after a webhook-triggered apply — tracked
-  separately
-- Path filtering (only trigger on changes under `{layer}/{region_dir}/`) —
-  tracked separately
+- SNS / Slack / email notification after a webhook-triggered apply — tracked separately
+- Path filtering (only trigger on changes under `{layer}/{region_dir}/`) — tracked separately
+- Idempotency guard against GHE webhook retries firing duplicate builds — `tf-run apply`
+  on an already-converged state is a safe no-op, so this is deferred
 
 ---
 
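
Reviewer sketch (not part of the patch): the first half of the request flow in this ADR (signature check, push filter, and per-entry build parameters) could look roughly like the Python below. Everything named here is an assumption for illustration: the helper names `verify_signature`, `should_apply`, and `build_requests`, the `ACCOUNT_REPO_PATTERN` regex (guessed from the 12-digit prefix in the sample payload), and the use of the `X-Hub-Signature-256` header with a `sha256=` prefix, which is how GitHub webhooks normally deliver the HMAC. It also assumes `apply_on_merge` has already been parsed from `.sc-automation.yml` into a list of dicts; the GHE API fetch and the CodeBuild call are deliberately left out.

```python
import hashlib
import hmac
import json
import re

# Hypothetical account-repo pattern: a 12-digit AWS account ID, a dash, then a name.
ACCOUNT_REPO_PATTERN = re.compile(r"^\d{12}-[A-Za-z0-9._-]+$")


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Compare the signature header with an HMAC-SHA256 of the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time; bytes avoid errors on non-ASCII input.
    return hmac.compare_digest(expected.encode(), (signature_header or "").encode())


def should_apply(event_name: str, payload: dict) -> bool:
    """Accept only push events to main on repos matching the account pattern."""
    if event_name != "push" or payload.get("ref") != "refs/heads/main":
        return False
    name = payload.get("repository", {}).get("name", "")
    return ACCOUNT_REPO_PATTERN.match(name) is not None


def build_requests(payload: dict, apply_on_merge: list) -> list:
    """Return one CodeBuild parameter dict per apply_on_merge entry."""
    return [
        {
            "action": "apply",
            "account_repo": payload["repository"]["name"],  # from webhook
            "layer": entry["layer"],  # from .sc-automation.yml
            "region_dir": entry["region_dir"],  # from .sc-automation.yml
            "target_account_id": entry.get("target_account_id", ""),
            "commit_sha": payload["after"],  # for status writeback
        }
        for entry in apply_on_merge
    ]
```

Keeping these three steps as pure functions means the handler can be unit-tested without AWS or GHE access; only the thin wrapper that reads the secret, fetches the file, and starts builds would need mocks.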