docs: generalized architecture, webhook auto-apply ADR, Vault ADR
- docs/README.md: high-level index with reading paths by use case
- docs/HOW-IT-WORKS.md: reframe from two-product to single Proposer +
  webhook auto-apply; remove executor SC product framing
- docs/decisions/001-webhook-auto-apply.md: status Proposed → Accepted;
  update context and consequences to reflect removal of executor SC product
- docs/decisions/002-vault-aws-secrets-engine.md: new ADR for Vault AWS
  Secrets Engine; dynamic cross-account credentials; per-product IAM scope
  via Proposer terraform apply; account baseline prerequisite pattern
- docs/generalized-terraform-product-architecture.md: new
- docs/template-management.md: Executor flow, .sc-automation.yml schema
- docs/repo-vars-and-secrets.md: CodeBuild environmentVariablesOverride pattern
- docs/workflow-flowcharts.md: Mermaid diagrams for propose/apply flows
- docs/fleet-governance-at-scale.md: new
- docs/service-catalog-census-integration.md: new
- docs/cross-account-visibility.md: new
Dave Arnold committed May 19, 2026
1 parent b67df30 commit 7a537eb
Showing 11 changed files with 2,560 additions and 74 deletions.
docs/HOW-IT-WORKS.md (136 changes: 78 additions, 58 deletions)
@@ -6,29 +6,34 @@ to a Terraform plan or apply running inside an AWS account repository.

 ---

-## Design Overview: Two-Product Model
+## Design Overview: Proposer Product + Webhook Auto-Apply

-The system is split into **two distinct Service Catalog products** with a human
-review gate between them:
+The system uses a **single user-facing Service Catalog product** with a human
+review gate before Terraform runs any infrastructure changes:

-| Product | CodeBuild Project | What It Does |
-|---------|------------------|--------------|
-| `tf-run-proposer` | `tf-run-proposer` | Clone repo → render templates → commit → open PR |
-| `tf-run-executor` | `tf-run-executor` | Clone `main` → assume role → run `tf-run apply` |
+| Component | CodeBuild Project | What It Does |
+|-----------|------------------|--------------|
+| SC Product: `tf-run-proposer` | `tf-run-proposer` | Clone repo → render templates → commit → open PR |
+| Webhook (automatic) | `tf-run-executor` | Clone `main` → assume role → run `tf-run apply` |

-**Why two products?**
+**Why not two SC products?**

-An earlier single-product design ran `tf-run apply` first and then opened a PR
-as a trailing artifact. This made the PR meaningless as a review gate — Terraform
-had already changed real infrastructure before anyone saw the diff.
+An earlier design exposed the executor as a second Service Catalog product,
+requiring a human to return to the SC console after merging the PR, re-enter the
+same parameters, and click Launch. This is pure operational overhead — the review
+already happened at PR merge time, and the parameters needed to run the apply are
+already recorded in `.sc-automation.yml` in the repo.

-The two-product model restores the PR as a genuine gate:
+The current design restores the PR as a genuine gate with no extra manual steps:

-1. A team provisions the **Proposer** → changes are committed to a branch and a PR
-   is opened. No infrastructure is touched. CFN stack completes quickly (< 60s).
+1. A team provisions the **Proposer** product → changes are committed to a branch
+   and a PR is opened. No infrastructure is touched.
 2. A human reviews the diff, approves, and merges the PR.
-3. The team provisions the **Executor** → CodeBuild checks out `main` (post-merge),
-   assumes the target account role, and runs `tf-run apply`.
+3. The GHE push-to-main webhook fires automatically → Lambda reads
+   `.sc-automation.yml` → starts `tf-run-executor` CodeBuild. No SC product,
+   no CFN stack, no user action required.

+See [ADR-001](decisions/001-webhook-auto-apply.md) for the full decision record.
+
 ---
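Step 3 of the new flow, in which the webhook Lambda turns `.sc-automation.yml` into a `tf-run-executor` build, could look roughly like the Python sketch below. It is not the handler in this repository: it assumes the file has already been fetched and parsed (for example with PyYAML's `yaml.safe_load`) into a dict keyed by `account_repo`, `layer`, `region_dir`, `target_account_id`, `dry_run` and `tf_run_start_tag`. Treating a missing `dry_run` as plan-only and passing `GITHUB_TOKEN` as a `SECRETS_MANAGER` reference to `ghe-runner/github-token` are assumptions of the sketch.

```python
from typing import Any

# Names taken from HOW-IT-WORKS; the .sc-automation.yml schema itself is assumed.
EXECUTOR_PROJECT = "tf-run-executor"
GITHUB_TOKEN_SECRET = "ghe-runner/github-token"
REQUIRED_KEYS = ("account_repo", "layer", "region_dir")


def as_flag(value: Any, default: bool = True) -> str:
    """Normalize a YAML bool/str/None to the 'true'/'false' strings the build checks."""
    if value is None:
        value = default  # assumption: a missing dry_run means plan-only
    if isinstance(value, str):
        value = value.strip().lower() in ("true", "yes", "1")
    return "true" if value else "false"


def plaintext(name: str, value: Any) -> dict[str, str]:
    """One CodeBuild environment variable override of type PLAINTEXT."""
    return {"name": name, "value": str(value), "type": "PLAINTEXT"}


def start_build_kwargs(config: dict[str, Any]) -> dict[str, Any]:
    """Map a parsed .sc-automation.yml dict to codebuild.start_build() keyword arguments."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f".sc-automation.yml is missing required keys: {missing}")

    overrides = [
        plaintext("ACCOUNT_REPO", config["account_repo"]),
        plaintext("LAYER", config["layer"]),
        plaintext("REGION_DIR", config["region_dir"]),
        plaintext("DRY_RUN", as_flag(config.get("dry_run"))),
    ]
    # Optional values are only sent when set; otherwise the project's own defaults (if any) apply.
    if config.get("target_account_id"):
        overrides.append(plaintext("TARGET_ACCOUNT_ID", config["target_account_id"]))
    if config.get("tf_run_start_tag"):
        overrides.append(plaintext("TF_RUN_START_TAG", config["tf_run_start_tag"]))
    # CodeBuild resolves this reference at build time, so the token is not in the request.
    overrides.append(
        {"name": "GITHUB_TOKEN", "value": GITHUB_TOKEN_SECRET, "type": "SECRETS_MANAGER"}
    )
    return {"projectName": EXECUTOR_PROJECT, "environmentVariablesOverride": overrides}
```

The returned dict is shaped for `boto3.client("codebuild").start_build(**kwargs)`. Passing the token as a Secrets Manager reference keeps the PAT out of the `StartBuild` request, but requires `secretsmanager:GetSecretValue` on the executor's service role. Quoting account IDs in the YAML keeps them strings, so leading zeros survive.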

@@ -58,22 +63,21 @@ The two-product model restores the PR as a genuine gate:
 ↕ Human reviews PR, approves, merges ↕
 ┌─────────────────────────────────────────────────────────────────────┐
-│ APPLY FLOW
+AUTO-APPLY (webhook — no user action required)
 │ │
-│ User fills SC form → CFN Custom Resource │
-│ └─> Lambda (tf-run-executor-trigger) │
-│ • Validates inputs (action=apply) │
-│ • Starts tf-run-executor CodeBuild build │
-│ • Polls CodeBuild until completion │
-│ • Returns apply status + repo URL to CFN │
+│ GHE push to main → Lambda Function URL (HMAC verified) │
+│ └─> Lambda (tf-run-webhook-handler) │
+│ • Reads .sc-automation.yml from default branch │
+│ • Starts tf-run-executor CodeBuild (fire-and-forget) │
+│ • Posts pending commit status to GHE │
 │ └─> CodeBuild: tf-run-executor │
 │ • Installs: Terraform binary (from S3), tf-run │
 │ toolchain, Census CA cert, gh CLI, Python deps │
 │ • Clones account repo at main (post-merge) │
 │ • Optionally assumes cross-account IAM role │
 │ • cd {LAYER}/{REGION_DIR} │
 │ • tf-run apply (respects TF_RUN_START_TAG) │
-│ • POST_BUILD emits BUILD_RESULT=
+│ • POST_BUILD writes commit status ✅/❌ to GHE
 └─────────────────────────────────────────────────────────────────────┘
 ```
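The "HMAC verified" step in the diagram could be implemented along the lines of this Python sketch. It assumes a Lambda Function URL event with `headers`, `body` and `isBase64Encoded` fields and a secret value already read from `ghe-runner/webhook-secret`. Because an org-level push webhook fires for pushes to every branch of every repo, the sketch also includes a hypothetical `is_push_to_default_branch` filter using the push payload's `ref` and `repository.default_branch` fields.

```python
import base64
import hashlib
import hmac
from typing import Any


def raw_body(event: dict[str, Any]) -> bytes:
    """Return the exact bytes GHE signed, undoing the Function URL's base64 wrapping."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)
    return body.encode("utf-8")


def signature_is_valid(event: dict[str, Any], secret: str) -> bool:
    """Compare X-Hub-Signature-256 with an HMAC-SHA256 of the raw request body."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    received = headers.get("x-hub-signature-256", "")
    if not received.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), raw_body(event), hashlib.sha256).hexdigest()
    # Constant-time comparison so timing does not reveal how much of the signature matched.
    return hmac.compare_digest(f"sha256={digest}".encode(), received.encode())


def is_push_to_default_branch(payload: dict[str, Any]) -> bool:
    """Hypothetical filter: only pushes to the repo's default branch trigger an apply."""
    default_branch = (payload.get("repository") or {}).get("default_branch", "main")
    return payload.get("ref") == f"refs/heads/{default_branch}"
```

A handler would reject the request (for example with HTTP 401) when `signature_is_valid` returns `False`, and only then parse `raw_body(event)` as JSON and apply the branch filter. The signature has to be computed over the raw bytes rather than re-serialized JSON, or it will not match.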

@@ -88,9 +92,9 @@ The two-product model restores the PR as a genuine gate:
 | CodeBuild (executor) | `tf-run-executor` | csvd-dev |
 | SC Portfolio | `{prefix}-tf-run` | csvd-dev |
 | SC Product (propose) | `{prefix}-tf-run-proposer` | csvd-dev |
-| SC Product (apply) | `{prefix}-tf-run-executor` | csvd-dev |
 | CFN Template (propose) | `service-catalog/proposer-template.yaml` | S3 artifacts bucket |
-| CFN Template (apply) | `service-catalog/executor-template.yaml` | S3 artifacts bucket |
+| Lambda Function URL | `tf-run-webhook-handler` HTTPS endpoint | csvd-dev |
+| GHE Webhook | Org-level push webhook → Lambda Function URL | GHE (manual one-time setup) |
 | Launch Role | `{prefix}-sc-launch-role` | csvd-dev |
 | GHE PAT | `ghe-runner/github-token` in Secrets Manager | csvd-dev |
 | Cross-account role | `sc-automation-codebuild-role` | **Target** account |
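The `Cross-account role` row is assumed by ARN during the executor's BUILD phase. Taken literally in a buildspec shell, `${AWS::Partition}` is CloudFormation pseudo-parameter syntax rather than a valid shell expansion, and in GovCloud (the manual-run example uses `us-gov-west-1`) the partition is `aws-us-gov`, not `aws`. One way to get it right, sketched here in Python with hypothetical helper names, is to take the partition from the build's own identity ARN and to validate `TARGET_ACCOUNT_ID` before building the role ARN.

```python
import re

ROLE_NAME = "sc-automation-codebuild-role"  # role name from the resource table


def partition_from_arn(arn: str) -> str:
    """Return the partition field of an ARN, e.g. 'aws-us-gov' for GovCloud."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or not parts[1]:
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[1]


def target_role_arn(target_account_id: str, caller_arn: str) -> str:
    """Build the cross-account role ARN in the same partition the build runs in."""
    account_id = str(target_account_id).strip()
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"TARGET_ACCOUNT_ID must be 12 digits, got {target_account_id!r}")
    return f"arn:{partition_from_arn(caller_arn)}:iam::{account_id}:role/{ROLE_NAME}"
```

In a Python build step, `caller_arn` could come from `boto3.client("sts").get_caller_identity()["Arn"]`, and the result passed as `RoleArn` (together with a `RoleSessionName`) to `sts.assume_role`.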
@@ -166,38 +170,37 @@ The CFN stack completes and the output panel shows the PR URL.

 ---

-## Step-by-Step: Apply Flow
+## Auto-Apply on Merge (Webhook)

 ### 1. Prerequisites

 - The Proposer has run and its PR has been **reviewed and merged** to `main`
+- `.sc-automation.yml` was committed by the Proposer alongside the rendered files
 - The target account has the `sc-automation-codebuild-role` IAM role with a trust
   policy allowing assume-role from the CodeBuild execution role in csvd-dev
+- The GHE org webhook is configured once: push events → Lambda Function URL

-### 2. User fills the SC form
-
-The user opens the **tf-run-executor** product and provides:
-
-- **AccountRepo** — same repo name as the Proposer
-- **Layer** and **RegionDir** — same as the Proposer
-- **TargetAccountId** _(optional)_ — if set, CodeBuild assumes the cross-account role
-- **TfRunStartTag** _(optional)_ — start tf-run from a specific `TAG` step
-- **DryRun** — `true` for plan-only, `false` to apply
-
-### 3. CloudFormation invokes the Lambda
+### 2. GHE fires the push webhook

-CFN creates a `Custom::TerraformApply` resource with `action: apply`.
+On merge to `main`, GHE sends a `push` event to the Lambda Function URL with
+an HMAC-SHA256 signature (`X-Hub-Signature-256` header). The Lambda verifies
+the signature against the `ghe-runner/webhook-secret` Secrets Manager secret.

-### 4. Lambda validates and starts CodeBuild
+### 3. Lambda reads `.sc-automation.yml` and starts CodeBuild

-Lambda starts `tf-run-executor` with:
+The Lambda (webhook handler mode):
+1. Fetches `.sc-automation.yml` from the default branch of the pushed repo
+2. Extracts `account_repo`, `layer`, `region_dir`, `target_account_id`,
+   `dry_run`, and optional `tf_run_start_tag`
+3. Calls `codebuild:StartBuild` on `tf-run-executor` with override env vars:
+   ```
+   ACCOUNT_REPO, LAYER, REGION_DIR,
+   TARGET_ACCOUNT_ID, TF_RUN_START_TAG, DRY_RUN, GITHUB_TOKEN
+   ```
+4. Posts a `pending` commit status to the merge commit on GHE
+5. Returns HTTP 200 immediately — the webhook call is fire-and-forget

-```
-ACCOUNT_REPO, LAYER, REGION_DIR,
-TARGET_ACCOUNT_ID, TF_RUN_START_TAG, DRY_RUN, GITHUB_TOKEN
-```
-
-### 5. CodeBuild - INSTALL phase
+### 4. CodeBuild - INSTALL phase

 - Clones `github.e.it.census.gov/terraform/support` for version governance
 - Downloads Terraform binary from S3 (version governed by `VERSION_TF`)
@@ -206,22 +209,39 @@ TARGET_ACCOUNT_ID, TF_RUN_START_TAG, DRY_RUN, GITHUB_TOKEN
 - Downloads and installs `gh` CLI
 - `pip3 install python-dateutil pyyaml`

-### 6. CodeBuild - BUILD phase
+### 5. CodeBuild - BUILD phase

 1. Rewrite git remotes; `git clone` account repo; `git checkout main`
 2. If `TARGET_ACCOUNT_ID` is set: `aws sts assume-role`
-   `arn:aws:iam::{TARGET_ACCOUNT_ID}:role/sc-automation-codebuild-role`
+   `arn:${AWS::Partition}:iam::{TARGET_ACCOUNT_ID}:role/sc-automation-codebuild-role`
    and export the temporary credentials
 3. `cd ${LAYER}/${REGION_DIR}`
-4. If `DRY_RUN=true`: `tf-run plan`; else: `tf-run apply` (with optional `--start-tag ${TF_RUN_START_TAG}`)
+4. If `DRY_RUN=true`: `tf-run plan`; else: `tf-run apply`
+   (with optional `--start-tag ${TF_RUN_START_TAG}`)

-### 7. Lambda polls and returns
+### 6. CodeBuild - POST_BUILD phase

-On `SUCCEEDED`:
-- Sends CFN `SUCCESS` with:
-  - `ApplyStatus: SUCCEEDED`
-  - `RepositoryUrl` / `repository_url`
-  - `CodeBuildBuildId`
+Writes a `success` or `failure` commit status to GHE on the merge commit,
+linking to the CodeBuild log. Platform engineers see ✅/❌ on the commit
+without checking CloudWatch directly.
+
+### Manual One-Off Runs
+
+For re-apply, dry-run, or partial runs (start from a TAG), trigger the executor
+build directly:
+
+```bash
+export AWS_DEFAULT_REGION=us-gov-west-1
+aws codebuild start-build \
+  --project-name tf-run-executor \
+  --environment-variables-override \
+    name=ACCOUNT_REPO,value=229685449397-csvd-dev-platform-dev-gov,type=PLAINTEXT \
+    name=LAYER,value=infrastructure,type=PLAINTEXT \
+    name=REGION_DIR,value=west,type=PLAINTEXT \
+    name=DRY_RUN,value=true,type=PLAINTEXT
+```
+
+No Service Catalog product is needed.

 ---
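The POST_BUILD commit status maps onto the GitHub REST endpoint for creating a commit status (`POST /repos/{owner}/{repo}/statuses/{sha}`). The Python sketch below builds and sends that request from CodeBuild's `CODEBUILD_BUILD_SUCCEEDING` and `CODEBUILD_BUILD_URL` variables. It assumes the GHE API is served under `https://github.e.it.census.gov/api/v3`, that the build knows the repo owner, repo name and merge-commit SHA, and it uses a hypothetical status `context` of `tf-run-executor`; it is not the project's actual POST_BUILD script.

```python
import json
import urllib.request
from typing import Any

# Assumptions: GHE host from these docs plus the standard /api/v3 prefix;
# the status context label is hypothetical.
GHE_API = "https://github.e.it.census.gov/api/v3"
STATUS_CONTEXT = "tf-run-executor"


def commit_status_request(
    owner: str, repo: str, sha: str, env: dict[str, str]
) -> tuple[str, dict[str, Any]]:
    """Build the URL and JSON body for a success/failure commit status."""
    succeeded = env.get("CODEBUILD_BUILD_SUCCEEDING") == "1"  # CodeBuild sets "1" or "0"
    action = "plan" if env.get("DRY_RUN") == "true" else "apply"
    payload: dict[str, Any] = {
        "state": "success" if succeeded else "failure",
        "description": f"tf-run {action} {'succeeded' if succeeded else 'failed'}",
        "context": STATUS_CONTEXT,
    }
    if env.get("CODEBUILD_BUILD_URL"):
        payload["target_url"] = env["CODEBUILD_BUILD_URL"]  # links the status to the build page
    return f"{GHE_API}/repos/{owner}/{repo}/statuses/{sha}", payload


def post_commit_status(url: str, payload: dict[str, Any], token: str) -> int:
    """POST the status with the GHE token; returns the HTTP status code (201 on success)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `post_commit_status(url, payload, os.environ["GITHUB_TOKEN"])` would then record the result on the merge commit. Reusing the same `context` for the `pending` status posted by the webhook Lambda lets GHE show the final result in place of the pending one rather than as a separate entry.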

@@ -310,6 +330,6 @@ mishandles acronyms (`AWSAccountId` → `a_w_s_account_id`).
 | `deploy/codebuild.tf` | Terraform: `aws_codebuild_project.tf_run_proposer` + `tf_run_executor` |
 | `deploy/lambda.tf` | Terraform: Lambda function with `PROPOSER_PROJECT_NAME` + `EXECUTOR_PROJECT_NAME` |
 | `deploy/iam.tf` | Terraform: IAM roles for Lambda, CodeBuild (with `sts:AssumeRole`), SC launch |
-| `deploy/service_catalog.tf` | Terraform: Portfolio, two products, two launch constraints |
+| `deploy/service_catalog.tf` | Terraform: Portfolio, single Proposer product, launch constraint |
+| `deploy/webhook.tf` | Terraform: Lambda Function URL, HMAC secret, GHE webhook IAM |
 | `service-catalog/proposer-template.yaml` | CFN template for the Propose product |
-| `service-catalog/executor-template.yaml` | CFN template for the Apply product |
docs/README.md (145 changes: 145 additions, 0 deletions)
@@ -0,0 +1,145 @@
# sc-lambda-ghactions Documentation

This directory contains the design, operating model, and rollout guidance for
`sc-lambda-ghactions` — the centralized Lambda + CodeBuild system that provisions
and manages Terraform-backed account repo changes through AWS Service Catalog.

## What This System Does

At a high level, the platform supports this workflow:

1. A user launches a Service Catalog product
2. CloudFormation invokes a centralized Lambda in `csvd-dev`
3. The Lambda validates inputs and starts a CodeBuild build
4. CodeBuild clones a template repo, renders Terraform/HCL/YAML content, and opens a PR
5. After merge, a GHE push webhook starts the executor CodeBuild build, which runs Terraform against the target workload
6. CSVD can also operate the full managed fleet centrally

## How to Read This Documentation

This doc set currently contains both:

- **Current or near-term implementation guidance** for the CodeBuild-based rollout, including webhook auto-apply on merge
- **Proposed design evolution** for generalized product types and fleet-scale operations

Because of that, the best entry point depends on what you need.

## Recommended Reading Paths

### 1. "I need the quickest overview"

Start with:

- [HOW-IT-WORKS.md](HOW-IT-WORKS.md) — end-to-end explanation of the proposer/executor model, the main infrastructure components, and the current CodeBuild execution flow
- [workflow-flowcharts.md](workflow-flowcharts.md) — visual walkthrough of provisioning, apply-on-merge, and fleet update flows

### 2. "I need to understand the target generalized architecture"

Start with:

- [generalized-terraform-product-architecture.md](generalized-terraform-product-architecture.md) — explains how the system expands from EKS-only into a reusable pattern for any Terraform workload
- [template-management.md](template-management.md) — explains how template repos, Jinja2 rendering, `.sc-automation.yml`, and repo injection work
- [repo-vars-and-secrets.md](repo-vars-and-secrets.md) — explains how SSM and Secrets Manager values are injected into CodeBuild builds

### 3. "I need to onboard a new Service Catalog product"

Read in this order:

- [generalized-terraform-product-architecture.md](generalized-terraform-product-architecture.md) — required moving parts for a new `product_type`
- [template-management.md](template-management.md) — template repo structure and rendering expectations
- [service-catalog-census-integration.md](service-catalog-census-integration.md) — how to register the product in `terraform-service-catalog-census`
- [repo-vars-and-secrets.md](repo-vars-and-secrets.md) — how product-scoped configuration and secrets reach the build

### 4. "I need to understand operations and governance at scale"

Start with:

- [fleet-governance-at-scale.md](fleet-governance-at-scale.md) — the `terraform-sc-fleet` operating model, workload inventory structure, maintenance windows, and governance controls
- [cross-account-visibility.md](cross-account-visibility.md) — hub-and-spoke IAM model and options for centralized visibility across accounts
- [workflow-flowcharts.md](workflow-flowcharts.md) — visual summary of fleet-wide operations

### 5. "I need to understand the webhook auto-apply decision"

Read:

- [decisions/001-webhook-auto-apply.md](decisions/001-webhook-auto-apply.md) — ADR for triggering executor builds automatically from GitHub Enterprise webhook events
- [workflow-flowcharts.md](workflow-flowcharts.md) — flow-level view of the apply-on-merge path
- [template-management.md](template-management.md)`.sc-automation.yml` schema and executor behavior

## Document Guide

### Core system overview

- [HOW-IT-WORKS.md](HOW-IT-WORKS.md)
  - Best for understanding the end-to-end proposer/executor model
  - Covers the centralized Lambda, CodeBuild projects, the Proposer SC product, the webhook auto-apply path, and step-by-step runtime behavior
  - Use this as the main operational baseline

- [workflow-flowcharts.md](workflow-flowcharts.md)
  - Best for stakeholder demos and quick architectural orientation
  - Includes flows for provisioning, apply-on-merge, and fleet-wide updates

### Generalization and product onboarding

- [generalized-terraform-product-architecture.md](generalized-terraform-product-architecture.md)
  - Explains how the platform generalizes to any Terraform workload
  - Defines the core onboarding units: template repo, Jinja2 templates, Pydantic model, CFN product template, census registration

- [template-management.md](template-management.md)
  - Canonical guide for template repo usage
  - Covers full-repo vs subdirectory templates, Jinja2 rendering, `.sc-automation.yml`, proposer behavior, and executor re-rendering into existing account repos

- [repo-vars-and-secrets.md](repo-vars-and-secrets.md)
  - Canonical guide for runtime config injection
  - Covers AWS Parameter Store layout, Secrets Manager layout, Lambda IAM, and CodeBuild `environmentVariablesOverride`

- [service-catalog-census-integration.md](service-catalog-census-integration.md)
  - Canonical guide for enterprise product registration
  - Covers central vs StackSet vs census-managed resources, launch roles, portfolio/product YAML, and rollout into `terraform-service-catalog-census`

### Operations, governance, and visibility

- [fleet-governance-at-scale.md](fleet-governance-at-scale.md)
  - Defines the `terraform-sc-fleet` model for operating many workloads across many repos
  - Covers workload entry files, account repo layout, update scripts, maintenance windows, CODEOWNERS, and branch protection

- [cross-account-visibility.md](cross-account-visibility.md)
  - Covers read-only access patterns for viewing managed resources across accounts
  - Describes the hub-and-spoke IAM role chain and Resource Explorer-first UI approach

### Architecture decisions

- [decisions/001-webhook-auto-apply.md](decisions/001-webhook-auto-apply.md)
  - ADR (Accepted) for the webhook-triggered executor path
  - Explains why the manual post-merge Service Catalog launch was removed and how `.sc-automation.yml` supplies the apply parameters

## Suggested Canonical Interpretation

Where multiple docs overlap, use this interpretation:

- [HOW-IT-WORKS.md](HOW-IT-WORKS.md) is the best **runtime/system overview**
- [template-management.md](template-management.md) is the best **template repo and account repo injection** reference
- [repo-vars-and-secrets.md](repo-vars-and-secrets.md) is the best **config/secrets injection** reference
- [service-catalog-census-integration.md](service-catalog-census-integration.md) is the best **enterprise rollout** reference
- [fleet-governance-at-scale.md](fleet-governance-at-scale.md) is the best **day-2 fleet operations** reference
- [decisions/001-webhook-auto-apply.md](decisions/001-webhook-auto-apply.md) is the best **design rationale** for auto-apply on merge

## Current Gaps and Notes

This doc set is now broad enough to explain:

- how template repos are leveraged
- how rendered content is injected into new and existing account repos
- how CodeBuild receives configuration and secrets
- how new products are registered in Census
- how CSVD governs and operates the resulting fleet

A few documents are still explicitly marked **Proposed** or **Draft**, so treat them as design intent unless and until the code and deployment match them.

## If You Only Read Three Docs

Read these first:

1. [HOW-IT-WORKS.md](HOW-IT-WORKS.md)
2. [template-management.md](template-management.md)
3. [service-catalog-census-integration.md](service-catalog-census-integration.md)