docs: generalized architecture, ADR-001 accepted, ADR-002 Vault AWS Secrets Engine #1

Open: wants to merge 27 commits into base: main

Commits (27)
0ca30d8  Add template repo rendering, Bedrock discussion notes, and HOW-IT-WOR…  (May 11, 2026)
3ab2abe  Incorporate morga471 review feedback and address Teams follow-up ques…  (May 11, 2026)
42bd370  docs: fix outdated sections in HOW-IT-WORKS.md  (May 11, 2026)
28aa948  review: address morga471 round-2 feedback  (May 11, 2026)
8cfa5f5  fix: source tf-run toolchain from terraform/support, not scripts/  (May 11, 2026)
87c0926  feat: Enhance Terraform execution with proposer and executor templates  (May 11, 2026)
97921ff  feat: Add initial buildspec.yml for tf-run-executor configuration  (May 11, 2026)
c6ac447  docs: ADR-001 webhook auto-apply on merge to main (proposed)  (May 11, 2026)
8ca7f30  docs: ADR-001 update — webhook payload details, commit status writeba…  (May 11, 2026)
b67df30  docs: Update ADR-001 to clarify webhook-triggered auto-apply process  (May 11, 2026)
7a537eb  docs: generalized architecture, webhook auto-apply ADR, Vault ADR  (May 19, 2026)
9cc46c6  docs: remove stale GHA executor references  (May 20, 2026)
dc71d57  docs: add CodeBuild Projects Reference section to HOW-IT-WORKS  (May 20, 2026)
77f9a49  feat: Proposer generates all workspace files (REMOTE-STATE + tf-direc…  (May 20, 2026)
da7cc4e  feat: executor commit-back; terraform_latest; plugin cache; lock file…  (May 20, 2026)
ca51931  docs: template repos are delta overlays, not full account repo scaffolds  (May 20, 2026)
a16101c  feat: flat template repos; Proposer injects into LAYER/REGION_DIR  (May 20, 2026)
6728094  docs: update generalized architecture to reflect flat delta-overlay t…  (May 20, 2026)
7f32318  docs: handler.py lives in template repo; Lambda fetches at runtime  (May 20, 2026)
86f549b  docs: SC product registration via Terraform for_each, not manual steps  (May 20, 2026)
4f67dd8  docs: deploy_products/ workspace replaces census pipeline dependency;…  (May 20, 2026)
8606feb  docs: version pinning via template_repo_ref; SemVer tagging for templ…  (May 20, 2026)
336248d  docs: fix stale GHA architecture references; align executor docs with…  (May 20, 2026)
b14a084  adding supporting documentation for AWS Account Bootstrapping and how…  (May 22, 2026)
66838cf  docs: update CHECKPOINT with Jira sub-tasks and add decision document…  (May 28, 2026)
3834d9e  feat: add CROSS_ACCOUNT_ROLE for Vault-based cross-account credential…  (Jun 2, 2026)
4b32072  docs: add Vault AWS Secrets Engine sales presentation  (Jun 2, 2026)
42 changes: 27 additions & 15 deletions .github/copilot-instructions.md
@@ -8,21 +8,34 @@ system at Census. The architecture is:
 ```
 SC Console (user fills product form)
 └─> CFN Stack (Custom::* resource)
-└─> Lambda (centralized in csvd-dev, 229685449397, us-gov-west-1)
+└─> Lambda tf-run-executor-trigger (csvd-dev, 229685449397, us-gov-west-1)
 ├─> Validates inputs (Pydantic v2 models)
 ├─> Fetches GHE token from Secrets Manager
-├─> POSTs repository_dispatch to target repo on GHE
-└─> Polls GHA workflow run → returns repo URL + PR URL to CFN
-GitHub Enterprise (github.e.it.census.gov)
-└─> GHA workflow (repository_dispatch event)
-├─> Clones the target account repo
-├─> Renders HCL/YAML files from templates
-└─> Commits + opens PR (repo-init → main)
+├─> Starts CodeBuild: tf-run-proposer
+└─> Polls CodeBuild → returns PR URL + repo URL to CFN
+CodeBuild: tf-run-proposer (csvd-dev)
+└─> Clones target account repo
+└─> Renders HCL/YAML files from template repo (Jinja2)
+└─> Commits rendered files → opens PR (propose/sc-automation → main)
+↕ Human reviews diff and merges PR ↕
+GHE push webhook → Lambda tf-run-webhook-handler
+└─> Reads .sc-automation.yml from default branch
+└─> Starts CodeBuild: tf-run-executor (fire-and-forget)
+CodeBuild: tf-run-executor (csvd-dev)
+└─> Clones account repo at main (post-merge state)
+└─> Optionally assumes cross-account IAM role (sc-automation-codebuild-role)
+└─> Runs tf-run apply in LAYER/REGION_DIR
+└─> Commits post-apply changes (lock file, remote_state symlinks) to main [skip ci]
+└─> Writes ✅/❌ commit status to GHE
 ```

-This replaces the current CodeBuild + terraform-eks-deployment path with a
-GHA-native approach that keeps workflow logic inside the target repos.
+This replaces the current CodeBuild + terraform-eks-deployment path.
+Workflow logic lives in `buildspec-proposer.yml` and `buildspec-executor.yml`
+in this repo; product-type-specific logic lives in `handler.py` in each template repo.
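
The post-merge leg of the flow above (push webhook, then executor) implies a small gating decision inside `tf-run-webhook-handler`. A minimal sketch of that decision, assuming the handler receives a standard GitHub push payload (`ref`, `deleted`, `repository.default_branch`, `head_commit.message`). The function name `should_start_executor` is hypothetical, and the real handler may apply further checks, such as the contents of `.sc-automation.yml`:

```python
from typing import Any, Mapping

# Marker the executor appends to its own post-apply commit-back.
SKIP_MARKER = "[skip ci]"


def should_start_executor(payload: Mapping[str, Any]) -> bool:
    """Return True only for pushes to the default branch that are not
    the executor's own [skip ci] commit-back.

    Hypothetical helper: field names follow the GitHub push webhook payload.
    """
    repo = payload.get("repository") or {}
    default_branch = repo.get("default_branch", "main")

    # Ignore pushes to work branches such as propose/sc-automation.
    if payload.get("ref") != f"refs/heads/{default_branch}":
        return False

    # Branch deletions carry no head commit to apply.
    if payload.get("deleted"):
        return False

    head = payload.get("head_commit") or {}
    message = head.get("message", "")

    # Skipping marked commits keeps the executor's push from
    # re-triggering another apply.
    return SKIP_MARKER not in message
```

Keeping the decision in a pure function makes it easy to unit-test without GHE or CodeBuild access.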

---

@@ -154,7 +167,7 @@ scripts in `/apps/terraform/bin/`. Key behavior:
 - `aws_account_id` and `aws_region` are auto-resolved via `!Sub` in CFN;
 do NOT add them as user-facing SC form parameters
 - Lambda ServiceToken: `arn:${AWS::Partition}:lambda:${AWS::Region}:${AWS::AccountId}:function:{name}`
-- Lambda timeout must be ≥ CodeBuild/GHA poll window (currently 900s)
+- Lambda timeout must be ≥ CodeBuild poll window (currently 900s)
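
The timeout rule above can be illustrated with a deadline-bounded polling loop. This is a sketch under stated assumptions: `poll_build` and its parameters are hypothetical names, the status strings are CodeBuild's documented `buildStatus` values, and the 840-second default window is an assumption chosen to leave headroom under a 900-second Lambda timeout:

```python
import time
from typing import Callable

# Terminal values of CodeBuild's buildStatus field.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "FAULT", "TIMED_OUT", "STOPPED"}


def poll_build(
    get_status: Callable[[], str],
    window_s: float = 840.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the build reaches a terminal status or the window closes.

    Returns the terminal status, or "POLL_TIMEOUT" if another sleep would
    run past the deadline (so the caller can still respond to CFN).
    """
    deadline = clock() + window_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() + interval_s > deadline:
            return "POLL_TIMEOUT"
        sleep(interval_s)
```

In a Lambda, `get_status` could wrap a boto3 `batch_get_builds` call and return the first build's `buildStatus`; injecting `sleep` and `clock` keeps the loop testable without waiting.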

---

@@ -165,8 +178,7 @@ scripts in `/apps/terraform/bin/`. Key behavior:
 - ❌ Do not write temp files to `/tmp` — use `~/tmp`
 - ❌ Do not use `terraform` directly — use `tf` alias (`tf plan`, `tf apply`)
 - ❌ Do not run AWS CLI/boto3 without `export AWS_DEFAULT_REGION=us-gov-west-1`
-- ❌ Do not add `vpc_id` — field is `vpc_name`
-- ❌ Do not use `HappyPathway/terraform-github-repo` public module
-- ✅ DO use `CSVD/terraform-github-repo` (https://github.e.it.census.gov/CSVD/terraform-github-repo)
 - ✅ DO use `gh` CLI for PR management
 - ✅ DO use `GH_HOST=github.e.it.census.gov` for all GHE commands
+- ✅ Cross-account role name is `sc-automation-codebuild-role` — must exist in every target
+account and trust the CodeBuild IAM role from csvd-dev before the first executor run
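
The trust requirement in the last bullet can be made concrete with a sketch of the trust policy a target-account role might carry. The principal role name `tf-run-executor-codebuild` is taken from the comment in `buildspec-executor.yml` in this PR, and the `aws-us-gov` partition follows from `us-gov-west-1`. The function name is hypothetical, and a real policy may add conditions:

```python
from typing import Any, Dict


def executor_trust_policy(
    partition: str, source_account_id: str, codebuild_role_name: str
) -> Dict[str, Any]:
    """Build an IAM trust policy that lets one role in the source account
    assume the role this policy is attached to (hypothetical helper)."""
    principal_arn = (
        f"arn:{partition}:iam::{source_account_id}:role/{codebuild_role_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Example values from this PR: csvd-dev account ID and the CodeBuild role name.
policy = executor_trust_policy(
    "aws-us-gov", "229685449397", "tf-run-executor-codebuild"
)
```

Serializing `policy` with `json.dumps` gives a document that could be supplied as the role's assume-role policy.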
12 changes: 12 additions & 0 deletions .gitignore
@@ -0,0 +1,12 @@
# Packer pipeline zip files
tf-run-executor-builder.zip

# Terraform state and local overrides
*.tfstate
*.tfstate.backup
*.tfvars
.terraform/
.terraform.lock.hcl
.terraform_commits
terraform_data_dirs/
varfiles/
185 changes: 185 additions & 0 deletions buildspec-executor.yml
@@ -0,0 +1,185 @@
version: 0.2

# ---------------------------------------------------------------------------
# tf-run-executor buildspec
#
# Purpose: clone account repo main branch, optionally assume a cross-account
# IAM role, and run tf-run apply in the target layer/region directory.
# This is triggered AFTER a proposer PR has been reviewed and merged.
# It does not render templates or open a PR.
#
# Required env-var overrides per build (supplied by Lambda):
# ACCOUNT_REPO - account repo name, e.g. 229685449397-csvd-dev-platform-dev-gov
# LAYER - terraform layer: common | infrastructure | vpc
# REGION_DIR - region directory: east | west | global
# GITHUB_TOKEN - GHE PAT (PLAINTEXT, value from Secrets Manager)
#
# Optional env-var overrides:
# TARGET_ACCOUNT_ID - AWS account ID to assume the cross-account role in
# (default: empty = run with CodeBuild role, csvd-dev only)
# CROSS_ACCOUNT_ROLE - IAM role name to assume in TARGET_ACCOUNT_ID
# (default: r-inf-terraform)
# TF_RUN_START_TAG - tf-run.data TAG label to start from (default: empty = from top)
# DRY_RUN - "true" = tf-run plan only, no apply (default: "false")
# ---------------------------------------------------------------------------

env:
variables:
GITHUB_ORG: "SCT-Engineering"
TF_BINARY_S3_PREFIX: "s3://csvd-packer-pipeline-assets/terraform"
GH_CLI_S3_PREFIX: "s3://csvd-packer-pipeline-assets/tools"
CENSUS_CA_S3: "s3://csvd-packer-pipeline-assets/certs/census-ca.pem"
TERRAFORM_SUPPORT_REPO: "terraform/support"
HTTPS_PROXY: "http://proxy.tco.census.gov:3128"
NO_PROXY: "github.e.it.census.gov,169.254.169.254,169.254.170.2"
# Per-build defaults (overridden via environmentVariablesOverride in Lambda)
TARGET_ACCOUNT_ID: ""
CROSS_ACCOUNT_ROLE: "r-inf-terraform"
TF_RUN_START_TAG: ""
DRY_RUN: "false"

phases:
install:
commands:
# --- Version governance: clone terraform/support to read org-canonical versions ---
- git clone --depth 1 "https://${GITHUB_TOKEN}@github.e.it.census.gov/${TERRAFORM_SUPPORT_REPO}.git" /tmp/tf-support
- export TF_VERSION=$(cat /tmp/tf-support/terraform/VERSION)
- export GH_VERSION=$(cat /tmp/tf-support/github-cli-releases/VERSION)
- echo "Using Terraform ${TF_VERSION}, gh CLI ${GH_VERSION}"

# --- Terraform binary (registry.terraform.io is blocked on Census network; use S3) ---
- aws s3 cp "${TF_BINARY_S3_PREFIX}/terraform_${TF_VERSION}_linux_amd64.zip" /tmp/terraform.zip
- unzip -o /tmp/terraform.zip -d /usr/local/bin/ && chmod +x /usr/local/bin/terraform
- ln -sf /usr/local/bin/terraform /usr/local/bin/tf

# --- Census CA certificate (required for TLS to github.e.it.census.gov) ---
- aws s3 cp "$CENSUS_CA_S3" /etc/pki/ca-trust/source/anchors/census-ca.pem
- update-ca-trust extract

# --- tf-run toolchain (sourced from terraform/support, already cloned above) ---
# Canonical versions live in terraform/support local-app/ — no copies kept in this repo.
- cp /tmp/tf-support/local-app/tf-run/tf-run.sh /usr/local/bin/tf-run
- cp /tmp/tf-support/local-app/tf-control/tf-control.sh /usr/local/bin/tf-control.sh
- cp /tmp/tf-support/local-app/tf-directory-setup/tf-directory-setup.py /usr/local/bin/tf-directory-setup.py
- chmod +x /usr/local/bin/tf-run /usr/local/bin/tf-control.sh /usr/local/bin/tf-directory-setup.py
# Create tf-{action} symlinks expected by tf-run and account repo steps
- >
for action in init plan apply destroy refresh output validate import state fmt taint console; do
ln -sf /usr/local/bin/tf-control.sh /usr/local/bin/tf-${action};
done
# Account repo .tf-control files set TFCOMMAND=terraform_latest (the Census workstation alias).
# In CodeBuild the binary is just 'terraform'; create the alias so tf-control.sh resolves it.
- ln -sf /usr/local/bin/terraform /usr/local/bin/terraform_latest

# --- Plugin cache directory (referenced by .tf-control.tfrc in every account repo) ---
# .tf-control.tfrc sets plugin_cache_dir = "/data/terraform/terraform.d/plugin-cache"
# and filesystem_mirror path = "/data/terraform/terraform.d/providers".
# Create both so Terraform does not error on init; the mirror is empty so Terraform
# falls through to the 'direct' block in the tfrc (via Census proxy to registry.terraform.io).
- mkdir -p /data/terraform/terraform.d/plugin-cache /data/terraform/terraform.d/providers

# --- Python deps for tf-directory-setup.py ---
- pip3 install --quiet python-dateutil pyyaml

# --- gh CLI (from S3, for any post-apply verification steps) ---
- aws s3 cp "${GH_CLI_S3_PREFIX}/gh_${GH_VERSION}_linux_amd64.tar.gz" /tmp/gh.tar.gz
- mkdir -p /tmp/gh-cli && tar -xzf /tmp/gh.tar.gz -C /tmp/gh-cli --strip-components=1
- cp /tmp/gh-cli/bin/gh /usr/local/bin/gh && chmod +x /usr/local/bin/gh

build:
commands:
# --- Configure git to rewrite SSH URLs to HTTPS ---
# Module sources in account repos use ssh://git@github.e.it.census.gov/... or git@...
# This rewrite transparently redirects them to HTTPS + PAT at the git layer.
- git config --global url."https://${GITHUB_TOKEN}@github.e.it.census.gov/".insteadOf "ssh://git@github.e.it.census.gov/"
- git config --global url."https://${GITHUB_TOKEN}@github.e.it.census.gov/".insteadOf "git@github.e.it.census.gov:"

# --- Clone account repo from main (the reviewed + merged state) ---
- git clone "https://${GITHUB_TOKEN}@github.e.it.census.gov/${GITHUB_ORG}/${ACCOUNT_REPO}.git" repo
- cd repo
# Ensure we are on main (not a work branch)
- git checkout main
- echo "Applying from $(git rev-parse --short HEAD) on main"

# --- Assume cross-account role (if TARGET_ACCOUNT_ID is set) ---
# The role (default: r-inf-terraform) must exist in the target account and
# trust arn:...:iam::229685449397:role/tf-run-executor-codebuild.
# Override CROSS_ACCOUNT_ROLE per-build to use a different role name.
- |
if [ -n "${TARGET_ACCOUNT_ID}" ]; then
PARTITION=$(aws sts get-caller-identity --query Arn --output text | cut -d: -f2)
ROLE_ARN="arn:${PARTITION}:iam::${TARGET_ACCOUNT_ID}:role/${CROSS_ACCOUNT_ROLE}"
echo "Assuming cross-account role: ${ROLE_ARN}"
CREDS=$(aws sts assume-role \
--role-arn "${ROLE_ARN}" \
--role-session-name "sc-automation-${ACCOUNT_REPO}" \
--query Credentials \
--output json)
export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | python3 -c "import json,sys; print(json.load(sys.stdin)['AccessKeyId'])")
export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | python3 -c "import json,sys; print(json.load(sys.stdin)['SecretAccessKey'])")
export AWS_SESSION_TOKEN=$(echo "$CREDS" | python3 -c "import json,sys; print(json.load(sys.stdin)['SessionToken'])")
echo "Assumed role in account ${TARGET_ACCOUNT_ID}"
else
echo "No TARGET_ACCOUNT_ID set — running with CodeBuild role (csvd-dev only)"
fi

# --- Run Terraform in target layer/region directory ---
# tf-run auto-proceeds on non-TTY stdin (read -t timeout defaults to "y")
#
# NOTE on file-generating tf-run.data directives:
# REMOTE-STATE — generates workspace remote_state.yml from parent
# COMMAND tf-directory-setup.py — generates remote_state.backend.tf + variant files
# The Proposer already ran both of these and committed the results in the PR.
# When tf-run hits these steps here they are idempotent: they overwrite files
# that already exist with identical content. No new files are created at apply time.
#
# NOTE on logs/: tf-control.sh writes every plan/apply to logs/{action}.{timestamp}.log.
# This directory is ephemeral (never committed). Ensure logs/ is in .gitignore.
- cd "${LAYER}/${REGION_DIR}"
- |
if [ "${DRY_RUN}" = "true" ]; then
echo "DRY_RUN=true — running tf-run plan only"
TFARGS="-no-color" tf-run plan
elif [ -n "${TF_RUN_START_TAG}" ]; then
TFARGS="-auto-approve" tf-run apply "tag:${TF_RUN_START_TAG}"
else
TFARGS="-auto-approve" tf-run apply
fi

# --- Commit post-apply file changes back to main ---
# After a successful apply tf-run.data typically runs:
# COMMAND tf-directory-setup.py --link s3
# which re-links remote_state.{dir}.tf from .tf.none → .tf.s3.
# terraform init also generates/updates .terraform.lock.hcl.
# Both of these changes must be committed back to main so:
# (a) the repo reflects actual state for future Proposer re-renders
# (b) subsequent tf-init on main does not re-download all providers
# [skip ci] prevents the push from re-triggering the webhook executor.
- cd "${CODEBUILD_SRC_DIR}/repo"
- |
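# Stage each pathspec separately: git add stages nothing if any one pathspec matches no files
git add -A -- "${LAYER}/${REGION_DIR}/remote_state."* 2>/dev/null || true
git add -A -- "${LAYER}/${REGION_DIR}/.terraform.lock.hcl" 2>/dev/null || true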
if ! git diff --cached --quiet; then
git -c user.email="sc-automation@census.gov" \
-c user.name="SC Automation" \
commit -m "chore: executor post-apply update ${LAYER}/${REGION_DIR} [skip ci]"
git push \
"https://${GITHUB_TOKEN}@github.e.it.census.gov/${GITHUB_ORG}/${ACCOUNT_REPO}.git" \
HEAD:main
echo "Committed and pushed post-apply changes to main"
else
echo "No post-apply file changes to commit"
fi

post_build:
commands:
- echo "BUILD_RESULT=${CODEBUILD_BUILD_SUCCEEDING}"
- echo "ACCOUNT_REPO=${ACCOUNT_REPO}"
- echo "LAYER=${LAYER} REGION_DIR=${REGION_DIR}"

cache:
paths:
# Cache the provider plugin cache across builds for faster tf-init.
# Providers downloaded via Census proxy are stored here; subsequent builds
# skip re-downloading providers that haven't changed.
- /data/terraform/terraform.d/plugin-cache/**/*
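
The buildspec header lists the per-build overrides that the Lambda supplies. A minimal sketch of how a caller might assemble them for CodeBuild's `environmentVariablesOverride` parameter follows. The function name `executor_env_overrides` and its validation rules are assumptions for illustration, not code from this PR; the allowed `LAYER` and `REGION_DIR` values come from the header comments:

```python
from typing import Dict, List

# Allowed values as listed in the buildspec header comments.
VALID_LAYERS = {"common", "infrastructure", "vpc"}
VALID_REGION_DIRS = {"east", "west", "global"}
REQUIRED = {"ACCOUNT_REPO", "LAYER", "REGION_DIR", "GITHUB_TOKEN", "DRY_RUN"}


def executor_env_overrides(
    account_repo: str,
    layer: str,
    region_dir: str,
    github_token: str,
    target_account_id: str = "",
    cross_account_role: str = "",
    tf_run_start_tag: str = "",
    dry_run: bool = False,
) -> List[Dict[str, str]]:
    """Build the per-build variable list (hypothetical helper)."""
    if layer not in VALID_LAYERS:
        raise ValueError(f"unknown LAYER: {layer!r}")
    if region_dir not in VALID_REGION_DIRS:
        raise ValueError(f"unknown REGION_DIR: {region_dir!r}")
    if target_account_id and not (
        target_account_id.isdigit() and len(target_account_id) == 12
    ):
        raise ValueError("TARGET_ACCOUNT_ID must be a 12-digit AWS account ID")

    values = {
        "ACCOUNT_REPO": account_repo,
        "LAYER": layer,
        "REGION_DIR": region_dir,
        "GITHUB_TOKEN": github_token,
        "DRY_RUN": "true" if dry_run else "false",
        "TARGET_ACCOUNT_ID": target_account_id,
        "CROSS_ACCOUNT_ROLE": cross_account_role,
        "TF_RUN_START_TAG": tf_run_start_tag,
    }
    # Omit unset optional variables so the buildspec defaults apply.
    return [
        {"name": name, "value": value, "type": "PLAINTEXT"}
        for name, value in values.items()
        if value or name in REQUIRED
    ]
```

The returned list could be passed as `environmentVariablesOverride` to boto3's `codebuild.start_build`, assuming the project is named `tf-run-executor` as in the architecture diagram.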