docs: ADR-001 webhook auto-apply on merge to main (proposed)
Dave Arnold committed May 11, 2026
1 parent 97921ff commit c6ac447
Showing 1 changed file with 167 additions and 0 deletions.
167 changes: 167 additions & 0 deletions docs/decisions/001-webhook-auto-apply.md
# ADR-001: Webhook-Triggered Auto-Apply on Merge to Main

**Status:** Proposed
**Date:** 2026-05-11
**Branch:** feature/template-repo-rendering

---

## Context

The current two-product model requires a human to manually provision the
`tf-run-executor` Service Catalog product after a Proposer PR is reviewed and
merged. This adds unnecessary friction to the apply step:

1. Platform engineer reviews and merges the PR opened by the Proposer
2. Platform engineer opens Service Catalog, finds the executor product, fills in
the same parameters they already specified during the Propose step, and
clicks Launch

Step 2 is pure operational overhead. The information needed to start the executor
build (account repo, layer, region dir, target account) is already known at merge
time and could be stored in the repo itself.

---

## Decision

Add a **GitHub Enterprise webhook handler** to the Lambda that automatically
starts an executor CodeBuild build whenever a push event lands on `main` in a
watched account repo.

Target apply configuration is stored in a `.sc-automation.yml` file committed to
the root of each account repo by the Proposer (or manually by a platform engineer).

---

## Proposed Design

### `.sc-automation.yml` — committed to the account repo root

```yaml
# Written by the Proposer CodeBuild build or manually by a platform engineer.
# Each entry triggers one executor CodeBuild build when changes land on main.
apply_on_merge:
  - layer: infrastructure
    region_dir: west
    target_account_id: "229685449397"
  - layer: infrastructure
    region_dir: east
    target_account_id: "229685449397"
  - layer: vpc
    region_dir: west
    target_account_id: "229685449397"
```

Fields per entry:

| Field | Required | Description |
|---|---|---|
| `layer` | yes | `common`, `infrastructure`, or `vpc` |
| `region_dir` | yes | `east`, `west`, or `global` |
| `target_account_id` | no | 12-digit AWS account ID; omit to run in csvd-dev |
| `tf_run_start_tag` | no | tf-run TAG label to start from |
| `dry_run` | no | `true` to plan only (default: `false`) |
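
The field rules in the table above could be enforced before any build is started. The following is a minimal sketch, assuming the YAML has already been parsed into plain dictionaries; `ApplyTarget` and `parse_entry` are hypothetical names, not part of the existing Lambda.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Allowed values taken from the field table; tuples avoid hashing odd inputs.
ALLOWED_LAYERS = ("common", "infrastructure", "vpc")
ALLOWED_REGION_DIRS = ("east", "west", "global")
ACCOUNT_ID_RE = re.compile(r"[0-9]{12}")


@dataclass(frozen=True)
class ApplyTarget:
    layer: str
    region_dir: str
    target_account_id: Optional[str] = None  # None means "run in csvd-dev"
    tf_run_start_tag: Optional[str] = None
    dry_run: bool = False


def parse_entry(entry: Dict[str, Any]) -> ApplyTarget:
    """Validate one apply_on_merge entry and apply the documented defaults."""
    layer = entry.get("layer")
    if layer not in ALLOWED_LAYERS:
        raise ValueError(f"layer must be one of {ALLOWED_LAYERS}, got {layer!r}")
    region_dir = entry.get("region_dir")
    if region_dir not in ALLOWED_REGION_DIRS:
        raise ValueError(f"region_dir must be one of {ALLOWED_REGION_DIRS}, got {region_dir!r}")
    account_id = entry.get("target_account_id")
    if account_id is not None:
        # An unquoted ID parses as a YAML integer, so require a 12-digit string.
        if not isinstance(account_id, str) or not ACCOUNT_ID_RE.fullmatch(account_id):
            raise ValueError(f"target_account_id must be a quoted 12-digit string, got {account_id!r}")
    dry_run = entry.get("dry_run", False)
    if not isinstance(dry_run, bool):
        raise ValueError(f"dry_run must be true or false, got {dry_run!r}")
    return ApplyTarget(layer, region_dir, account_id, entry.get("tf_run_start_tag"), dry_run)
```

Rejecting a non-string `target_account_id` is a design choice in this sketch: it catches entries where the quotes were dropped and the ID was parsed as a number.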

### Lambda changes

Add a `/webhook` path handler alongside the existing CFN handler in
`lambda/app.py`.

**Invocation:** Lambda Function URL (no API Gateway needed — GHE can POST to
a Function URL directly). The URL is added to the GHE org or repo webhook
settings.
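
Since one Lambda would serve both entry points, the handler needs a way to tell a Function URL request from a CFN custom-resource request. A minimal sketch follows. It assumes Function URL events use payload format 2.0 (a `requestContext.http` object, with `body` optionally base64-encoded) and that CFN events carry a `RequestType` key. The helper names `route`, `raw_body`, and `get_header` are hypothetical.

```python
import base64
from typing import Any, Dict, Optional


def route(event: Dict[str, Any]) -> str:
    """Return which handler should process this invocation."""
    context = event.get("requestContext")
    if isinstance(context, dict) and "http" in context:
        return "webhook"  # Function URL request
    if "RequestType" in event:
        return "cfn"  # CloudFormation custom-resource request
    return "unknown"


def raw_body(event: Dict[str, Any]) -> bytes:
    """Return the request body as the exact bytes that were signed by the sender."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)
    return body.encode("utf-8")


def get_header(event: Dict[str, Any], name: str) -> Optional[str]:
    """Look up a header without relying on the casing of the header name."""
    wanted = name.lower()
    for key, value in (event.get("headers") or {}).items():
        if key.lower() == wanted:
            return value
    return None
```

Keeping the body as bytes matters for the signature check later in the flow, because the HMAC is computed over the raw payload rather than the parsed JSON.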

**Request flow:**

```
GHE push event (main branch, account repo)
  → Lambda Function URL POST /
      → verify HMAC-SHA256 signature (secret in SM: ghe-runner/webhook-secret)
      → parse X-GitHub-Event: push
      → filter: ref == refs/heads/main
      → filter: repo name matches account repo pattern
      → fetch .sc-automation.yml via GHE API (no clone needed)
      → for each entry in apply_on_merge:
            start_codebuild_build(action="apply", account_repo=..., layer=..., ...)
            (fire-and-forget: do NOT block for CodeBuild completion)
      → return 200 OK immediately
```

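
The signature check and the two filters in the flow above can be sketched as follows. This assumes GHE sends the `X-Hub-Signature-256` header in the form `sha256=<hex digest>`, computed over the raw request body, and that the push payload carries `ref` and `repository.name`. The function names and the repo pattern are hypothetical.

```python
import hashlib
import hmac
import re
from typing import Any, Dict, Optional


def signature_is_valid(secret: bytes, body: bytes, signature_header: Optional[str]) -> bool:
    """Compare the X-Hub-Signature-256 header with an HMAC-SHA256 of the raw body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))


def should_apply(event_name: Optional[str], payload: Dict[str, Any], repo_pattern: "re.Pattern") -> bool:
    """Apply the flow's filters: push events, on main, in a watched account repo."""
    if event_name != "push":
        return False
    if payload.get("ref") != "refs/heads/main":
        return False
    repo_name = (payload.get("repository") or {}).get("name", "")
    return repo_pattern.fullmatch(repo_name) is not None
```

Verification runs before the payload is parsed, so an unsigned or tampered request is rejected without touching the GHE API or CodeBuild.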
**Key differences from the CFN handler:**
- **No polling.** The webhook handler starts builds and returns immediately.
Build results are visible in CodeBuild logs and CloudWatch. There is no CFN
stack to signal.
- **No CFN resource.** The executor product is still available for manual use,
but webhook-triggered runs bypass Service Catalog entirely.
- **Idempotent in effect, not deduplicated.** If GHE retries the webhook
(network blip), a duplicate build is started. This is acceptable because
`tf-run apply` on an already-applied state is a no-op.
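
The fire-and-forget behaviour described above could look like the sketch below. It assumes a boto3 CodeBuild client is passed in and uses its `start_build` call with `environmentVariablesOverride`. The environment variable names and the helper names are hypothetical stand-ins for whatever `start_codebuild_build` actually sets.

```python
from typing import Any, Dict, Iterable, List


def build_start_args(project_name: str, account_repo: str, entry: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one apply_on_merge entry into StartBuild keyword arguments."""
    env = {
        "ACTION": "apply",  # hypothetical variable names throughout
        "ACCOUNT_REPO": account_repo,
        "LAYER": entry["layer"],
        "REGION_DIR": entry["region_dir"],
        "DRY_RUN": "true" if entry.get("dry_run") else "false",
    }
    optional = (("target_account_id", "TARGET_ACCOUNT_ID"), ("tf_run_start_tag", "TF_RUN_START_TAG"))
    for key, variable in optional:
        if entry.get(key):
            env[variable] = str(entry[key])
    return {
        "projectName": project_name,
        "environmentVariablesOverride": [
            {"name": name, "value": value, "type": "PLAINTEXT"} for name, value in env.items()
        ],
    }


def start_builds(codebuild: Any, project_name: str, account_repo: str,
                 entries: Iterable[Dict[str, Any]]) -> List[str]:
    """Start one build per entry and return the build IDs without waiting on them."""
    build_ids = []
    for entry in entries:
        response = codebuild.start_build(**build_start_args(project_name, account_repo, entry))
        build_ids.append(response["build"]["id"])
    return build_ids
```

Logging the returned build IDs would give the traceability link from the push event to the CodeBuild run, since nothing polls for completion.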

### Infrastructure changes

| Resource | Change |
|---|---|
| Lambda Function URL | Add `aws_lambda_function_url` resource in `deploy/lambda.tf` |
| Lambda invoke permission | Add `aws_lambda_permission` allowing `lambda:InvokeFunctionUrl` from `*` (HMAC signature is the auth mechanism) |
| Secrets Manager | Add a `ghe-runner/webhook-secret` secret for HMAC verification |
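| Lambda IAM | No change for builds: the existing `codebuild:StartBuild` permission covers webhook-triggered builds. If the role cannot already read `ghe-runner/webhook-secret`, it also needs `secretsmanager:GetSecretValue` on that secret |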
| GHE Webhook | Manual one-time setup: org or per-repo webhook → Function URL, content-type `application/json`, events: `push` |

### `.sc-automation.yml` lifecycle

- **Proposer writes it** when it first creates the branch (if the file doesn't
exist yet). The Proposer knows `layer`, `region_dir`, and `target_account_id`
from its build environment variables. It commits `.sc-automation.yml` alongside
the rendered template files.
- **Platform engineers edit it** directly via PR if they need to add or remove
apply targets.
- **The Proposer writes the file at most once.** Subsequent Proposer runs,
which push with `--force-with-lease`, leave it untouched: the Proposer writes
the file only if it does not already exist, so manual edits are not clobbered.
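
The write-only-if-absent rule above can be sketched as follows, assuming the Proposer works in a local checkout of the account repo. The helper names are hypothetical, and the YAML is rendered by hand only to keep the sketch dependency-free.

```python
from pathlib import Path
from typing import Any, Dict, Iterable

FILE_NAME = ".sc-automation.yml"


def render(entries: Iterable[Dict[str, Any]]) -> str:
    """Render apply_on_merge entries using the layout shown in this ADR."""
    lines = ["apply_on_merge:"]
    for entry in entries:
        lines.append(f"  - layer: {entry['layer']}")
        lines.append(f"    region_dir: {entry['region_dir']}")
        account_id = entry.get("target_account_id")
        if account_id:
            # Quote the ID so YAML keeps it as a string rather than an integer.
            lines.append(f'    target_account_id: "{account_id}"')
    return "\n".join(lines) + "\n"


def write_if_absent(repo_root: Path, entries: Iterable[Dict[str, Any]]) -> bool:
    """Create the file only when it is missing; return True if it was written."""
    path = repo_root / FILE_NAME
    try:
        # Mode "x" raises FileExistsError instead of overwriting an existing file.
        with open(path, "x", encoding="utf-8") as handle:
            handle.write(render(entries))
    except FileExistsError:
        return False
    return True
```

Exclusive-create mode makes the existence check and the write a single step, so there is no window in which a manually edited file could be overwritten.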

---

## Consequences

### Benefits

- Eliminates the manual "provision executor product" step after PR merge
- Apply is fully traceable: GHE push event → CloudWatch Logs → CodeBuild build ID
- No new infrastructure services (no EventBridge, no SQS, no API Gateway)
- The executor SC product remains available for manual one-off runs and
day-2 operations (re-run from a specific tag, dry-run, etc.)

### Trade-offs

- Build results are no longer surfaced in a CloudFormation stack output — users
must check CodeBuild or CloudWatch Logs directly
- GHE webhook requires a one-time manual setup per org (or per repo for
fine-grained control)
- A merge to `main` that does not involve Terraform changes (e.g. README edit)
will still trigger executor builds. Mitigation: add a `paths` filter in
`.sc-automation.yml` (future enhancement) or rely on `tf-run apply` being a
safe no-op

### Out of scope for this ADR

- Result notification (Slack, email) after a webhook-triggered apply — tracked
separately
- Path filtering (only trigger on changes under `{layer}/{region_dir}/`) —
tracked separately

---

## Alternatives Considered

**CodeStar connection + CodePipeline watch:** Requires CodePipeline infrastructure
per repo, CodeStar connector host setup for GHE on-prem, and loses the per-run
environment variable flexibility that the Lambda `StartBuild` override model
provides. Rejected.

**EventBridge + S3 source:** Would require mirroring the GHE repo to CodeCommit
or S3 to get an EventBridge trigger. Adds a sync layer with no benefit. Rejected.

**Poll-based apply (Lambda on schedule):** Adds latency and unnecessary API calls.
Rejected.
