Update CHECKPOINT.md to reflect completion of implementation phases and detailed architecture
Dave Arnold committed May 6, 2026
1 parent 3911886 commit de3d3a7
Showing 1 changed file with 139 additions and 3 deletions.
142 changes: 139 additions & 3 deletions design-docs/CHECKPOINT.md

## 1. Last Updated

**2026-04-28** — Architecture finalized (CodeBuild all-in-one runner, GHA deferred on OIDC blocker).
Design doc written. Checkpoint system created. Skill file description syntax fixed.
No implementation code written yet.
**2026-05-06** — Implementation complete: Phases 1–3 fully built and committed.

---

## 2. Architecture (locked in)

**Pipeline**: SC Console → CFN `Custom::TerraformRun` → Lambda → CodeBuild → Account Repo (`tf-run` + PR)

- **Lambda** (`tf-run-executor-trigger`, csvd-dev `229685449397`, `us-gov-west-1`, 900s timeout):
validates inputs (Pydantic v2 `TfRunRequest`), fetches GHE PAT from Secrets Manager
(`ghe-runner/github-token`), starts CodeBuild with env-var overrides, polls every 20s,
signals CFN SUCCESS/FAILED with PR URL + repo URL + branch name.

- **CodeBuild** (`tf-run-executor`, 60 min timeout, Amazon Linux 2):
installs Terraform from S3 + Census CA cert + tf-run toolchain from `scripts/` →
clones account repo over HTTPS → writes `EXTRA_FILES` → commits + pushes to `repo-init` →
`cd <LAYER>/<REGION_DIR>/` → `TFARGS=-auto-approve tf-run apply [tag:START_TAG]`
(or `tf-plan` if `DRY_RUN=true`) → opens PR via `gh` CLI →
emits `PR_URL=<url>` in post_build for Lambda to parse.

- **GHA**: deferred — blocked on OIDC. Buildspec designed to port directly to GHA workflow
with no Lambda changes.
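
The Lambda-side flow described above (poll every 20s, parse `PR_URL=<url>` from the build output, signal CFN) can be sketched roughly as follows. This is a minimal stdlib-only illustration, not the contents of `lambda/app.py`: the function names, the 840s polling budget (chosen to stay under the 900s Lambda timeout), and the `Data` key names are assumptions. The build-status strings and the response body fields follow the documented CodeBuild and CloudFormation custom resource formats.

```python
import json
import re
import time

# CodeBuild build statuses that end a build; IN_PROGRESS is the only other value.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "FAULT", "STOPPED", "TIMED_OUT"}

# Matches the marker line the buildspec emits in post_build.
PR_URL_PATTERN = re.compile(r"^PR_URL=(\S+)$")


def wait_for_build(get_status, interval_s=20.0, budget_s=840.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal status or the budget runs out.

    get_status is a hypothetical callable wrapping the CodeBuild lookup;
    sleep and clock are injectable so the loop can be tested without waiting.
    Returns the last status seen (still "IN_PROGRESS" on budget exhaustion).
    """
    deadline = clock() + budget_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() + interval_s > deadline:
            return status
        sleep(interval_s)


def parse_pr_url(log_lines):
    """Return the URL from the last PR_URL=<url> line, or None if absent."""
    found = None
    for line in log_lines:
        match = PR_URL_PATTERN.match(line.strip())
        if match:
            found = match.group(1)
    return found


def cfn_response_body(event, status, physical_id, data, reason=""):
    """Build the JSON body that is PUT to the event's ResponseURL."""
    return json.dumps({
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See CloudWatch Logs for details",
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,  # key names here are assumptions, e.g. PrUrl, RepoUrl, BranchName
    })
```

Injecting `get_status`, `sleep`, and `clock` keeps the polling logic independent of boto3, which is also what would let the same handler poll a GHA run later without structural changes.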

Full spec: `design-docs/README.md`

---

## 3. Implementation Status

### Phase 1 — CodeBuild + buildspec ✅ DONE

| File | Status |
|------|--------|
| `buildspec.yml` | ✅ complete |
| `deploy/codebuild.tf` | ✅ complete |
| `deploy/iam.tf` (CodeBuild role) | ✅ complete |
| `deploy/variables.tf` | ✅ complete |
| `deploy/provider.tf` | ✅ complete |

### Phase 2 — Lambda ✅ DONE

| File | Status |
|------|--------|
| `lambda/app.py` | ✅ complete |
| `lambda/requirements.txt` | ✅ complete |
| `lambda/Dockerfile` | ✅ complete |
| `deploy/lambda.tf` | ✅ complete |
| `deploy/iam.tf` (Lambda role added) | ✅ complete |

### Phase 3 — Service Catalog ✅ DONE

| File | Status |
|------|--------|
| `service-catalog/product-template.yaml` | ✅ complete |
| `deploy/service_catalog.tf` | ✅ complete |

### Phase 4 — Polish 🔲 NOT STARTED

- CloudWatch dashboard
- SNS alert on FAILED builds
- GHA migration docs / path

---

## 4. Key Decisions Made

| Decision | Rationale |
|----------|-----------|
| CodeBuild as runner (not GHA) | OIDC blocked at Census; CodeBuild can be built from existing patterns |
| `tf-run` toolchain installed from `scripts/` in this repo | This repo is CodeBuild source; no separate S3 upload needed |
| `GITHUB_TOKEN` passed as `PLAINTEXT` (not `SECRETS_MANAGER` type) | Lambda fetches it from SM and injects per-build; avoids IAM permission complexity on CodeBuild role |
| `extra_files` `field_validator` in Pydantic model | CFN passes all parameters as strings; `"{}"` would fail dict validation without the parser |
| Snake_case in CFN `Properties` block | PascalCase normalizer mishandles acronyms (`AWSAccountId` → `a_w_s_account_id`); snake_case is passed through unchanged |
| `principal_org_id` on Lambda permission | Restricts cross-account CFN invocation to org members only |
| `lifecycle { ignore_changes = [image_uri] }` on Lambda | Prevents Terraform from rolling back image on every `tf apply` after image update |
| Physical resource ID = `{account_repo}-{layer}-{region_dir}` | Ensures idempotent CFN Updates don't re-run if nothing changed |
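
Three of the decisions in the table above are easier to see in code. The sketch below is illustrative only and uses plain Python rather than Pydantic so it runs without dependencies. The function names are hypothetical, and `naive_pascal_to_snake` is an assumed stand-in for the normalizer, written to reproduce the `AWSAccountId` behavior described in the table. In the real model, the `extra_files` logic would presumably sit inside the `field_validator`.

```python
import json
import re


def parse_extra_files(value):
    """Coerce the CFN-supplied value into a dict.

    CFN passes parameters as strings, so "{}" arrives as text and has to
    be JSON-decoded before dict validation can succeed.
    """
    if value is None or value == "":
        return {}
    if isinstance(value, str):
        value = json.loads(value)
    if not isinstance(value, dict):
        raise ValueError("extra_files must be a JSON object")
    return value


def naive_pascal_to_snake(name):
    """Insert '_' before every uppercase letter except the first, then lowercase.

    This splits acronyms letter by letter, which is the failure mode the
    table describes. Names with no uppercase letters are returned unchanged.
    """
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def physical_resource_id(account_repo, layer, region_dir):
    """Build the deterministic ID {account_repo}-{layer}-{region_dir}."""
    return f"{account_repo}-{layer}-{region_dir}"


if __name__ == "__main__":
    print(parse_extra_files("{}"))                  # {}
    print(naive_pascal_to_snake("AWSAccountId"))    # a_w_s_account_id
    print(naive_pascal_to_snake("aws_account_id"))  # aws_account_id
```

Because the physical resource ID depends only on the three inputs, repeated requests for the same repo, layer, and region directory produce the same ID.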

---

## 5. Required Terraform Variables (not defaulted)

```hcl
# deploy/terraform.tfvars
source_repo_url = "https://github.e.it.census.gov/SCT-Engineering/sc-lambda-ghactions"
artifacts_bucket_name = "csvd-sc-product-templates"
org_id = "o-<your-org-id>"
```

---

## 6. Next Action

None for implementation. Remaining work:
- **Manual end-to-end test**: SC provision in csvd-dev → CodeBuild → tf-run → PR → CFN SUCCESS
- **Lambda image build + push**: build `lambda/Dockerfile`, push to ECR `tf-run-executor/lambda:latest`
- **Phase 4**: CloudWatch dashboard + SNS alerts (low priority)
- **GHA migration**: once OIDC is unblocked, replace CodeBuild trigger with `repository_dispatch`
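
For the GHA migration item above, a `repository_dispatch` trigger amounts to one authenticated POST to the GitHub REST endpoint `/repos/{owner}/{repo}/dispatches`. The sketch below only builds the request and does not send it. The function name, the `tf-run` event type, the payload keys, and the example host are hypothetical; the real values would depend on the workflow that is eventually written.

```python
import json
import urllib.request


def build_dispatch_request(api_base, owner, repo, token, event_type, client_payload):
    """Build (but do not send) a repository_dispatch request.

    api_base is the REST root, e.g. https://<ghe-host>/api/v3 on
    GitHub Enterprise Server.
    """
    url = f"{api_base.rstrip('/')}/repos/{owner}/{repo}/dispatches"
    body = json.dumps({
        "event_type": event_type,          # matched by the workflow's `types:` filter
        "client_payload": client_payload,  # free-form JSON passed to the workflow
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    req = build_dispatch_request(
        "https://ghe.example.com/api/v3", "some-org", "some-account-repo",
        "<token>", "tf-run", {"layer": "<LAYER>", "dry_run": True},
    )
    print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; the API responds with `204 No Content` on success.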

---

## 7. Key File Index

### Implementation files

| File | Purpose |
|------|---------|
| `buildspec.yml` | CodeBuild build definition |
| `scripts/tf-run` | Bash tf-run orchestrator (v1.13.13) |
| `scripts/tf-control.sh` | tf-{action} wrapper (v1.11.0) |
| `scripts/tf-run.py` | Python port of tf-run (v2.0.0) |
| `scripts/tf-directory-setup.py` | remote_state.yml → backend.tf generator |
| `lambda/app.py` | Lambda handler |
| `lambda/Dockerfile` | Lambda container image |
| `service-catalog/product-template.yaml` | SC CFN product template |
| `deploy/*.tf` | All Terraform infrastructure |

### Prompts (`.github/prompts/`)

| File | Purpose |
|------|---------|
| `checkpoint-save.prompt.md` | End-of-session: rewrite CHECKPOINT.md |
| `checkpoint-load.prompt.md` | Start-of-session: restore context |
| `review-sc-template.prompt.md` | Review SC CFN templates for Census conventions |
| `new-account-repo-layer.prompt.md` | Scaffold a new layer in an account repo |
| `analyze-pr-comments.prompt.md` | Analyze PR review comments |

### Agents (`.github/agents/`)

| File | Purpose |
|------|---------|
| `planner.agent.md` | Planning / design agent |
| `implementation.agent.md` | Implementation agent |
| `reviewer.agent.md` | Review agent |

### Skills (`.github/skills/`)

| File | Purpose |
|------|---------|
| `account-repo-analysis/SKILL.md` | Account repo structure, tf-run DSL, remote_state.yml |

---
