From de3d3a7c6749abb3d69520cd617e7cad8a4af268 Mon Sep 17 00:00:00 2001
From: Dave Arnold
Date: Wed, 6 May 2026 15:13:57 -0700
Subject: [PATCH] Update CHECKPOINT.md to reflect completion of implementation phases and detailed architecture

---
 design-docs/CHECKPOINT.md | 142 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 139 insertions(+), 3 deletions(-)

diff --git a/design-docs/CHECKPOINT.md b/design-docs/CHECKPOINT.md
index 3992da1..702551d 100644
--- a/design-docs/CHECKPOINT.md
+++ b/design-docs/CHECKPOINT.md
@@ -2,9 +2,145 @@
 
 ## 1. Last Updated
 
-**2026-04-28** — Architecture finalized (CodeBuild all-in-one runner, GHA deferred on OIDC blocker).
-Design doc written. Checkpoint system created. Skill file description syntax fixed.
-No implementation code written yet.
+**2026-05-06** — Implementation complete: Phases 1–3 fully built and committed.
+
+---
+
+## 2. Architecture (locked in)
+
+**Pipeline**: SC Console → CFN `Custom::TerraformRun` → Lambda → CodeBuild → Account Repo (`tf-run` + PR)
+
+- **Lambda** (`tf-run-executor-trigger`, csvd-dev `229685449397`, `us-gov-west-1`, 900s timeout):
+  validates inputs (Pydantic v2 `TfRunRequest`), fetches GHE PAT from Secrets Manager
+  (`ghe-runner/github-token`), starts CodeBuild with env-var overrides, polls every 20s,
+  signals CFN SUCCESS/FAILED with PR URL + repo URL + branch name.
+
+- **CodeBuild** (`tf-run-executor`, 60 min timeout, Amazon Linux 2):
+  installs Terraform from S3 + Census CA cert + tf-run toolchain from `scripts/` →
+  clones account repo over HTTPS → writes `EXTRA_FILES` → commits + pushes to `repo-init` →
+  `cd //` → `TFARGS=-auto-approve tf-run apply [tag:START_TAG]`
+  (or `tf-plan` if `DRY_RUN=true`) → opens PR via `gh` CLI →
+  emits `PR_URL=` in post_build for Lambda to parse.
+
+- **GHA**: deferred — blocked on OIDC. Buildspec designed to port directly to GHA workflow
+  with no Lambda changes.
+
+Full spec: `design-docs/README.md`
+
+---
+
+## 3. Implementation Status
+
+### Phase 1 — CodeBuild + buildspec ✅ DONE
+
+| File | Status |
+|------|--------|
+| `buildspec.yml` | ✅ complete |
+| `deploy/codebuild.tf` | ✅ complete |
+| `deploy/iam.tf` (CodeBuild role) | ✅ complete |
+| `deploy/variables.tf` | ✅ complete |
+| `deploy/provider.tf` | ✅ complete |
+
+### Phase 2 — Lambda ✅ DONE
+
+| File | Status |
+|------|--------|
+| `lambda/app.py` | ✅ complete |
+| `lambda/requirements.txt` | ✅ complete |
+| `lambda/Dockerfile` | ✅ complete |
+| `deploy/lambda.tf` | ✅ complete |
+| `deploy/iam.tf` (Lambda role added) | ✅ complete |
+
+### Phase 3 — Service Catalog ✅ DONE
+
+| File | Status |
+|------|--------|
+| `service-catalog/product-template.yaml` | ✅ complete |
+| `deploy/service_catalog.tf` | ✅ complete |
+
+### Phase 4 — Polish 🔲 NOT STARTED
+
+- CloudWatch dashboard
+- SNS alert on FAILED builds
+- GHA migration docs / path
+
+---
+
+## 4. Key Decisions Made
+
+| Decision | Rationale |
+|----------|-----------|
+| CodeBuild as runner (not GHA) | OIDC blocked at Census; CodeBuild buildable from existing patterns |
+| `tf-run` toolchain installed from `scripts/` in this repo | This repo is CodeBuild source; no separate S3 upload needed |
+| `GITHUB_TOKEN` passed as `PLAINTEXT` (not `SECRETS_MANAGER` type) | Lambda fetches it from SM and injects per-build; avoids IAM permission complexity on CodeBuild role |
+| `extra_files` `field_validator` in Pydantic model | CFN passes all parameters as strings; `"{}"` would fail dict validation without the parser |
+| Snake_case in CFN `Properties` block | PascalCase normalizer mishandles acronyms (`AWSAccountId` → `a_w_s_account_id`); snake_case is passed through unchanged |
+| `principal_org_id` on Lambda permission | Restricts cross-account CFN invocation to org members only |
+| `lifecycle { ignore_changes = [image_uri] }` on Lambda | Prevents Terraform from rolling back image on every `tf apply` after image update |
+| Physical resource ID = `{account_repo}-{layer}-{region_dir}` | Ensures idempotent CFN Updates don't re-run if nothing changed |
+
+---
+
+## 5. Required Terraform Variables (not defaulted)
+
+```hcl
+# deploy/terraform.tfvars
+source_repo_url       = "https://github.e.it.census.gov/SCT-Engineering/sc-lambda-ghactions"
+artifacts_bucket_name = "csvd-sc-product-templates"
+org_id                = "o-"
+```
+
+---
+
+## 6. Next Action
+
+None for implementation. Remaining work:
+- **Manual end-to-end test**: SC provision in csvd-dev → CodeBuild → tf-run → PR → CFN SUCCESS
+- **Lambda image build + push**: build `lambda/Dockerfile`, push to ECR `tf-run-executor/lambda:latest`
+- **Phase 4**: CloudWatch dashboard + SNS alerts (low priority)
+- **GHA migration**: once OIDC is unblocked, replace CodeBuild trigger with `repository_dispatch`
+
+---
+
+## 7. Key File Index
+
+### Implementation files
+
+| File | Purpose |
+|------|---------|
+| `buildspec.yml` | CodeBuild build definition |
+| `scripts/tf-run` | Bash tf-run orchestrator (v1.13.13) |
+| `scripts/tf-control.sh` | tf-{action} wrapper (v1.11.0) |
+| `scripts/tf-run.py` | Python port of tf-run (v2.0.0) |
+| `scripts/tf-directory-setup.py` | remote_state.yml → backend.tf generator |
+| `lambda/app.py` | Lambda handler |
+| `lambda/Dockerfile` | Lambda container image |
+| `service-catalog/product-template.yaml` | SC CFN product template |
+| `deploy/*.tf` | All Terraform infrastructure |
+
+### Prompts (`.github/prompts/`)
+
+| File | Purpose |
+|------|---------|
+| `checkpoint-save.prompt.md` | End-of-session: rewrite CHECKPOINT.md |
+| `checkpoint-load.prompt.md` | Start-of-session: restore context |
+| `review-sc-template.prompt.md` | Review SC CFN templates for Census conventions |
+| `new-account-repo-layer.prompt.md` | Scaffold a new layer in an account repo |
+| `analyze-pr-comments.prompt.md` | Analyze PR review comments |
+
+### Agents (`.github/agents/`)
+
+| File | Purpose |
+|------|---------|
+| `planner.agent.md` | Planning / design agent |
+| `implementation.agent.md` | Implementation agent |
+| `reviewer.agent.md` | Review agent |
+
+### Skills (`.github/skills/`)
+
+| File | Purpose |
+|------|---------|
+| `account-repo-analysis/SKILL.md` | Account repo structure, tf-run DSL, remote_state.yml |
 
 ---
 
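The Lambda bullet in the checkpoint says the handler starts CodeBuild "with env-var overrides". Section 4 records that `GITHUB_TOKEN` is passed as `PLAINTEXT`. Below is a minimal sketch of building that override list, assuming the validated request is a plain dict.

- `EXTRA_FILES`, `DRY_RUN`, `START_TAG` and `GITHUB_TOKEN` come from the checkpoint.
- `ACCOUNT_REPO`, `LAYER`, `REGION_DIR` and the request keys are hypothetical names.

```python
import json
from typing import Any


def build_env_overrides(request: dict[str, Any], github_token: str) -> list[dict[str, str]]:
    """Map a validated request to CodeBuild environmentVariablesOverride entries."""
    values = {
        # Hypothetical variable names for the target directory.
        "ACCOUNT_REPO": request["account_repo"],
        "LAYER": request["layer"],
        "REGION_DIR": request["region_dir"],
        # Names below appear in the checkpoint's buildspec description.
        "START_TAG": request.get("start_tag", ""),
        "DRY_RUN": "true" if request.get("dry_run") else "false",
        "EXTRA_FILES": json.dumps(request.get("extra_files", {})),
        # PLAINTEXT, per section 4: the Lambda already fetched the PAT itself.
        "GITHUB_TOKEN": github_token,
    }
    return [{"name": k, "value": str(v), "type": "PLAINTEXT"} for k, v in values.items()]


# Sketch of the surrounding calls (requires boto3; not executed here):
#   token = secretsmanager.get_secret_value(
#       SecretId="ghe-runner/github-token")["SecretString"]
#   codebuild.start_build(
#       projectName="tf-run-executor",
#       environmentVariablesOverride=build_env_overrides(req, token))
```

Weigh this against the IAM simplicity noted in section 4. AWS's CodeBuild documentation warns that plaintext environment variables can be displayed through the CodeBuild console and the AWS CLI. The PAT is therefore readable by anyone allowed to view build details.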
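The Lambda polls CodeBuild every 20s inside a 900s function timeout, while the CodeBuild project allows 60 minutes. Below is a minimal sketch of a polling loop that also watches the Lambda's remaining time, so a long build cannot outlive the function before CFN is signaled.

- **Injected callables:** the status and clock are passed in. In a real handler these would plausibly wrap `codebuild.batch_get_builds(ids=[build_id])["builds"][0]["buildStatus"]` and `context.get_remaining_time_in_millis`.
- **Assumptions:** the `LAMBDA_TIMEOUT` sentinel and the 30s safety margin are invented for this sketch. The checkpoint does not say how the handler treats this case.

```python
import time
from typing import Callable

# Terminal CodeBuild buildStatus values; anything else (IN_PROGRESS) means wait.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "FAULT", "TIMED_OUT", "STOPPED"}


def wait_for_build(
    get_status: Callable[[], str],
    remaining_ms: Callable[[], int],
    interval_s: float = 20.0,
    safety_margin_ms: int = 30_000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the build finishes or the Lambda is about to run out of time.

    Returns the terminal build status, or the hypothetical sentinel
    "LAMBDA_TIMEOUT" when another sleep would leave less than
    safety_margin_ms for signaling CloudFormation.
    """
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if remaining_ms() - interval_s * 1000 < safety_margin_ms:
            return "LAMBDA_TIMEOUT"
        sleep(interval_s)
```

Injecting `sleep` keeps the loop testable without waiting in real time.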
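The buildspec "emits `PR_URL=` in post_build for Lambda to parse". The checkpoint does not say whether the Lambda reads this from CloudWatch Logs or from somewhere else. Assuming it ends up with the build log as lines of text, the parser can be small. The function name is hypothetical.

```python
from typing import Iterable, Optional


def extract_pr_url(log_lines: Iterable[str], marker: str = "PR_URL=") -> Optional[str]:
    """Return the value from the last non-empty PR_URL=... line, or None."""
    url = None
    for line in log_lines:
        stripped = line.strip()
        if stripped.startswith(marker):
            value = stripped[len(marker):].strip()
            if value:
                url = value  # keep scanning so a later emission wins
    return url
```

CodeBuild's buildspec also supports `exported-variables` under `env`. Their values are returned with the build details, which may be worth comparing against log parsing.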
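The `extra_files` decision in section 4 rests on CFN delivering every property as a string, so an empty mapping arrives as `"{}"`. Below is a minimal stdlib sketch of that pre-validation step, with two assumptions:

- The value maps file paths to file contents.
- The real model wraps equivalent logic in a Pydantic v2 validator.

The helper name `coerce_extra_files` is hypothetical.

```python
import json
from typing import Any


def coerce_extra_files(value: Any) -> dict[str, str]:
    """Normalize the extra_files property into a path -> content dict."""
    if value is None or value == "":
        return {}
    if isinstance(value, str):
        # CFN sends strings, so decode JSON before any type checks.
        try:
            value = json.loads(value)
        except json.JSONDecodeError as exc:
            raise ValueError(f"extra_files is not valid JSON: {exc}") from exc
    if not isinstance(value, dict):
        raise ValueError("extra_files must be a JSON object")
    # Require string keys and values so the build can write them verbatim.
    for path, content in value.items():
        if not isinstance(path, str) or not isinstance(content, str):
            raise ValueError("extra_files keys and values must be strings")
    return value
```

In a Pydantic v2 model, the same body would sit in a classmethod decorated with `@field_validator("extra_files", mode="before")`. That mode runs before type validation, so the raw string never reaches the `dict` check.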
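Section 4 attributes the snake_case decision to a PascalCase normalizer that turns `AWSAccountId` into `a_w_s_account_id`. The checkpoint does not show that normalizer. The sketch below assumes it inserts an underscore before every uppercase letter, a common idiom that reproduces exactly the failure described.

```python
import re


def naive_pascal_to_snake(name: str) -> str:
    """Hypothetical normalizer: insert '_' before every uppercase letter."""
    # The zero-width lookahead matches before each capital (except at the
    # start), so an acronym like "AWS" is split letter by letter.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


for key in ("AccountRepo", "AWSAccountId", "aws_account_id"):
    print(key, "->", naive_pascal_to_snake(key))
# AccountRepo -> account_repo
# AWSAccountId -> a_w_s_account_id
# aws_account_id -> aws_account_id
```

Because `aws_account_id` contains no uppercase letters, it passes through unchanged. The section 4 decision relies on exactly that behavior.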
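The Lambda "signals CFN SUCCESS/FAILED with PR URL + repo URL + branch name", and section 4 fixes the physical resource ID as `{account_repo}-{layer}-{region_dir}`. Below is a minimal sketch of the custom resource response body.

- **Assumed:** the snake_case property names `account_repo`, `layer` and `region_dir`.
- **Hypothetical:** the `Data` keys.
- **Defined by CloudFormation:** the top-level fields, which follow its custom resource response format.

```python
import json
from typing import Any, Optional


def physical_resource_id(props: dict[str, Any]) -> str:
    """Deterministic ID following the {account_repo}-{layer}-{region_dir} pattern."""
    return f"{props['account_repo']}-{props['layer']}-{props['region_dir']}"


def build_cfn_response(
    event: dict[str, Any],
    status: str,
    data: Optional[dict[str, str]] = None,
    reason: str = "",
) -> str:
    """Serialize a custom resource response body to PUT to event['ResponseURL']."""
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError("status must be SUCCESS or FAILED")
    body = {
        "Status": status,
        # Reason is required when Status is FAILED; default to a generic pointer.
        "Reason": reason or f"See CloudWatch Logs for request {event['RequestId']}",
        "PhysicalResourceId": physical_resource_id(event["ResourceProperties"]),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        # Hypothetical keys, e.g. {"PrUrl": ..., "RepoUrl": ..., "BranchName": ...}.
        "Data": data or {},
    }
    return json.dumps(body)
```

The handler then sends this string with an HTTP PUT to the presigned `ResponseURL` in the event. Deriving `PhysicalResourceId` from the inputs keeps it stable across Updates. If an Update returned a different ID, CloudFormation would treat the resource as replaced and later send a Delete for the old one.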