Commit

docs: rewrite copilot-instructions to reflect CodeBuild+Terraform architecture

- Replace 'Lambda, NOT CodeBuild' section with the actual working architecture
- Document Lambda as thin orchestrator triggering eks-terragrunt-repo-creator
- Add two-token split explanation (ghs_ App token vs ghp_ PAT for Terraform)
- Add TF_GITHUB_TOKEN_SECRET_NAME and CODEBUILD_PROJECT_NAME env vars
- Add correct rebuild/test commands
- Remove outdated CodeBuild-was-abandoned rationale
Your Name committed Apr 7, 2026
1 parent 26c6fe9 commit 12a742a
Showing 1 changed file with 99 additions and 62 deletions.
161 changes: 99 additions & 62 deletions .github/copilot-instructions.md
@@ -6,76 +6,67 @@ This repository contains the Lambda function that powers the EKS Cluster Automat
When a team provisions the "EKS Terragrunt Repo" product via AWS Service Catalog, this Lambda:

1. Receives a CloudFormation Custom Resource event
2. Creates a GitHub repository in the `SCT-Engineering` org on GitHub Enterprise
3. Clones `template-eks-cluster` as the starting structure
4. Renders 8 Terragrunt HCL files from Jinja2 templates (EKS-specific path)
5. Commits all files atomically via the Git tree API
6. Opens a pull request (`repo-init` → `main`)
7. Signals CloudFormation `SUCCESS`/`FAILED`
2. Fetches a GitHub PAT from Secrets Manager (`ghe-runner/github-token`)
3. Triggers the `eks-terragrunt-repo-creator` CodeBuild project with EKS parameters as env vars
4. Polls CodeBuild every 20 seconds until the build completes or the Lambda deadline approaches
5. Fetches the open PR URL from the GitHub API after a successful build
6. Signals CloudFormation `SUCCESS`/`FAILED`
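
The final signalling step in the list above can be illustrated with a minimal sketch of the response body that a CloudFormation Custom Resource expects. The field names (`Status`, `Reason`, `PhysicalResourceId`, `StackId`, `RequestId`, `LogicalResourceId`, `Data`) are the standard custom-resource response fields; the helper name `build_cfn_response`, the fallback for `PhysicalResourceId`, and the default reason text are assumptions for illustration, not the code in `template_automation/app.py`.

```python
import json
from typing import Optional


def build_cfn_response(event: dict, status: str,
                       data: Optional[dict] = None, reason: str = "") -> str:
    """Build the JSON body sent back to CloudFormation (hypothetical helper)."""
    return json.dumps({
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See CloudWatch logs for details",
        # Assumption: reuse the existing ID on update/delete, else the logical ID.
        "PhysicalResourceId": event.get("PhysicalResourceId") or event["LogicalResourceId"],
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    })
```

The resulting body would be sent with an HTTP `PUT` to the `ResponseURL` carried in the event; the `Data` dict is where values such as `repository_url` and `pull_request_url` would go.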

---

## Critical Architecture Decision: Lambda, NOT CodeBuild
All actual repo creation runs inside **CodeBuild** via the `terraform-eks-deployment` workspace:
- Clones `template-eks-cluster` via `CSVD/terraform-github-repo` Terraform module
- Writes 8 rendered Terragrunt HCL files via `managed_extra_files`
- Opens a pull request (`repo-init` → `main`)
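
Because the Lambda has to wait for this CodeBuild run, the polling step described earlier (every 20 seconds until the build completes or the Lambda deadline approaches) might look roughly like the sketch below. It is not the `poll_codebuild_build()` function from `app.py`: the name `poll_build`, the injected callables, the `"DEADLINE"` return value, and the 60-second safety margin are all assumptions chosen to keep the example self-contained.

```python
import time
from typing import Callable

# Terminal values of CodeBuild's buildStatus field.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "FAULT", "STOPPED", "TIMED_OUT"}


def poll_build(
    get_status: Callable[[], str],
    remaining_ms: Callable[[], int],
    interval_s: float = 20.0,
    safety_margin_ms: int = 60_000,  # assumption: time reserved for the CFN response
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the build is terminal or the Lambda deadline is near.

    Returns the terminal build status, or "DEADLINE" if time ran out first.
    """
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Stop before the next sleep would eat into the safety margin.
        if remaining_ms() - interval_s * 1000 < safety_margin_ms:
            return "DEADLINE"
        sleep(interval_s)
```

In a real handler, `get_status` could wrap boto3's `batch_get_builds(ids=[build_id])` and read `builds[0]["buildStatus"]`, and `remaining_ms` could be the Lambda context's `get_remaining_time_in_millis`.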

**Do not suggest CodeBuild as the mechanism for creating EKS cluster repositories.**

An earlier approach attempted to provision EKS repos by triggering a CodeBuild project that ran
`terraform apply` with a GitHub Terraform provider. That approach was **abandoned** due to:
---

- SSH host key failures downloading remote Terraform modules
- AWS credential proxy incompatibility inside CodeBuild build environments
- S3 backend region mismatches
- Irreconcilable Terraform provider version conflicts (the `HappyPathway/terraform-github-repo`
public module pins `github ~> 6.0` while our internal modules require `>= 6.6.0`)
## Architecture: Lambda as Thin Orchestrator over CodeBuild + Terraform

**The correct approach:** CloudFormation Custom Resource → Lambda invocation (direct Python GitHub API).
No Terraform. No CodeBuild buildspec. No SSH keys. No provider version pinning.
```
SC Console (user fills form)
→ CFN Stack creates Custom::GitHubRepository resource
→ CFN calls Lambda (eks-terragrunt-repo-gen-template-automation) via ServiceToken
→ Lambda fetches PAT from Secrets Manager (ghe-runner/github-token)
→ Lambda starts CodeBuild project (eks-terragrunt-repo-creator) with TF_VAR_* env overrides
→ CodeBuild clones terraform-eks-deployment repo from GHE
→ CodeBuild runs: terraform init + terraform apply -auto-approve
→ Terraform (CSVD/terraform-github-repo module) creates GHE repo + writes HCL files + opens PR
→ Lambda polls CodeBuild, then fetches PR URL from GitHub API
→ Lambda sends cfn-response SUCCESS with repository_url + pull_request_url
→ CFN stack transitions to CREATE_COMPLETE
→ SC provisioned product shows as AVAILABLE
```
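
As a rough illustration of the "`TF_VAR_*` env overrides" step in the flow above, the sketch below turns a dict of EKS parameters into the list shape that CodeBuild accepts for environment variable overrides. The helper name `build_env_overrides` and the assumption that every parameter maps one-to-one onto `TF_VAR_<name>` are hypothetical; only the `GITHUB_TOKEN` variable name and the `TF_VAR_` prefix come from this document.

```python
from typing import Dict, List


def build_env_overrides(params: Dict[str, str], github_token: str) -> List[dict]:
    """Build CodeBuild environment overrides from EKS parameters (hypothetical helper)."""
    overrides = [
        # Assumption: each parameter is exposed to Terraform as TF_VAR_<name>.
        {"name": f"TF_VAR_{key}", "value": str(value), "type": "PLAINTEXT"}
        for key, value in sorted(params.items())
    ]
    # The PAT is handed to the build as GITHUB_TOKEN for the Terraform GitHub provider.
    overrides.append({"name": "GITHUB_TOKEN", "value": github_token, "type": "PLAINTEXT"})
    return overrides
```

The returned list matches the `environmentVariablesOverride` argument of boto3's CodeBuild `start_build` call. Note that `PLAINTEXT` values are visible in the build details, so a `SECRETS_MANAGER`-typed override may be preferable for the token.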

### What CodeBuild IS still used for (valid)
### CodeBuild Projects

CodeBuild **is** still the correct tool for building the Lambda container image:
There are **two** CodeBuild projects — do not confuse them:

```
packer-pipeline CLI → CodeBuild project (eks-terragrunt-repo-generator-builder)
→ Packer + Docker build
→ Push to ECR (229685449397.dkr.ecr.us-gov-west-1.amazonaws.com/eks-terragrunt-repo-generator/lambda)
→ Lambda function updated via Terraform
```
| Project | Purpose |
|---------|--------|
| `eks-terragrunt-repo-generator-builder` | Builds the Lambda container image (packer + Docker → ECR) |
| `eks-terragrunt-repo-creator` | Creates EKS cluster repos (tf init + tf apply inside terraform-eks-deployment) |

This is the `packer-pipeline` CLI workflow. CodeBuild here is for **CI/CD of the Lambda image itself**,
not for creating customer repos.
The Lambda triggers **`eks-terragrunt-repo-creator`** at runtime. The **`eks-terragrunt-repo-generator-builder`** project is triggered manually via `packer-pipeline` when the Lambda code changes.

---

## Key Files

| File | Purpose |
|------|---------|
| `template_automation/app.py` | Lambda entry point; CFN Custom Resource handler |
| `template_automation/eks_config.py` | Pydantic models + Jinja2 renderer for EKS HCL |
| `template_automation/github_provider.py` | GitHub API client (Git tree API, PRs, permissions) |
| `template_automation/templates/eks/` | 8 Jinja2 templates (root.hcl, cluster.hcl, vpc.hcl, etc.) |
|------|--------|
| `template_automation/app.py` | Lambda entry point; CFN Custom Resource handler; `start_codebuild_build()` + `poll_codebuild_build()` |
| `template_automation/eks_config.py` | Pydantic models + `is_eks_deployment` check |
| `service-catalog/product-template.yaml` | CFN template for the SC product (canonical source) |
| `deploy/` | Terraform deploying the Lambda infrastructure |
| `design-docs/README.md` | Architecture overview and implementation status |
| `deploy/main.tf` | Terraform: Lambda, CodeBuild project, SC portfolio/product, IAM |
| `deploy/variables.tf` | Input variables including `codebuild_project_name`, `codebuild_role_arn` |
| `csvd_config_packer.hcl` | packer-pipeline config for building the Lambda container image |

The HCL rendering, repo creation, and PR opening logic lives in **`terraform-eks-deployment`**, not here.

---

## Service Catalog Integration

The Service Catalog product is defined by a CloudFormation template
(`service-catalog/product-template.yaml`). When a user submits the form:

```
SC Console (user fills form)
→ CFN Stack creates Custom::GitHubRepository resource
→ CFN calls Lambda via ServiceToken
→ Lambda processes CloudFormationResourceInput (Pydantic model)
→ Lambda creates repo, renders HCL, opens PR
→ Lambda calls cfn-response SUCCESS
→ CFN stack transitions to CREATE_COMPLETE
→ SC provisioned product shows as AVAILABLE
```
The Service Catalog product is defined by `service-catalog/product-template.yaml`.

## SC Product Deployment Methods

@@ -90,7 +81,7 @@ tf init
tf apply
```

This deploys the Lambda + SC portfolio + SC product + constraints directly.
Deploys the Lambda + CodeBuild project + SC portfolio/product + constraints directly.
Use this as the **reference deployment** when debugging issues with the census pipeline.
IDs after last apply: portfolio `port-h5qd63hw5yagq`, product `prod-lmua4oknugafg`.

@@ -101,22 +92,40 @@ cd terraform-service-catalog-census/non-prod/csvd-dev/west/service-catalog
tf apply # (via terragrunt)
```

This is the census-managed production deployment path. The live CFN template lives at:
Census-managed production deployment path. The live CFN template lives at:
`terraform-service-catalog-census/templates/products/eks-terragrunt-repo/2-0-0.yaml`

Both `service-catalog/product-template.yaml` here and `2-0-0.yaml` in census must stay in sync
(same parameters, same Lambda property names).

---

## Lambda Invocation Details
## Lambda Runtime Details

- **Function name**: `eks-terragrunt-repo-gen-template-automation`
- **Account**: `229685449397` (csvd-dev-gov, `us-gov-west-1`)
- **Timeout**: 900s (15 min) — must exceed CodeBuild poll window
- **ServiceToken**: `arn:aws-us-gov:lambda:${AWS::Region}:${AWS::AccountId}:function:eks-terragrunt-repo-gen-template-automation`
- **Runtime env var**: `VERIFY_SSL=false` (Census CA cert is not in the container's `certifi` bundle)
- **GitHub Enterprise**: `https://github.e.it.census.gov`, org `SCT-Engineering`

### Key environment variables

| Variable | Value | Purpose |
|----------|-------|---------|
| `VERIFY_SSL` | `false` | Census CA cert not in the container's `certifi` bundle |
| `GITHUB_TOKEN_SECRET_NAME` | `/eks-cluster-deployment/github_token` | App installation token (`ghs_`) — used by Lambda for Python GitHub API calls |
| `TF_GITHUB_TOKEN_SECRET_NAME` | `ghe-runner/github-token` | PAT (`ghp_`) — passed to CodeBuild as `GITHUB_TOKEN` for the Terraform GitHub provider |
| `CODEBUILD_PROJECT_NAME` | `eks-terragrunt-repo-creator` | CodeBuild project to trigger |
| `GITHUB_API` | `https://github.e.it.census.gov` | GHE API base URL |
| `GITHUB_ORG_NAME` | `SCT-Engineering` | Target GitHub org |

### Why two GitHub tokens?

- `GITHUB_TOKEN_SECRET_NAME` holds a **GitHub App installation token** (`ghs_` prefix). It can perform
org-level API calls but **cannot** access `/api/v3/user`, which the CSVD Terraform module requires.
- `TF_GITHUB_TOKEN_SECRET_NAME` holds a **personal access token** (`ghp_` prefix, user `arnol377`).
This is passed to CodeBuild and used by the Terraform GitHub provider.
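
Since the two tokens differ by prefix, a small guard could catch a misconfigured secret before it reaches CodeBuild. This is only a sketch: the function name `token_kind` and its return labels are hypothetical, and it relies solely on the `ghs_` and `ghp_` prefixes described above.

```python
def token_kind(token: str) -> str:
    """Classify a GitHub token by prefix (hypothetical helper)."""
    if token.startswith("ghs_"):
        return "app-installation"  # cannot access /api/v3/user
    if token.startswith("ghp_"):
        return "personal-access"   # what the Terraform GitHub provider needs here
    return "unknown"
```

A caller might refuse to start the build unless `token_kind(value) == "personal-access"` for the secret named by `TF_GITHUB_TOKEN_SECRET_NAME`.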

### EKS mode is triggered when all these fields are present in the event:
- `cluster_name`
- `account_name`
@@ -146,27 +155,55 @@ Properties:
---
## Rebuilding the Lambda Image
When `template_automation/app.py` or other Lambda source files change:

```bash
# 1. Zip source and upload to S3
cd lambda-template-repo-generator
zip -r ~/tmp/lambda-source.zip . -x "*.git*" -x "design-docs/*" -x "__pycache__/*" -x "*.pyc" -x "deploy/.terraform/*" -x "deploy/terraform.tfstate*"
UUID=$(python3 -c "import uuid; print(uuid.uuid4())")
source ~/aws-creds
aws s3 cp ~/tmp/lambda-source.zip \
"s3://csvd-packer-pipeline-builds/packer-builds/eks-terragrunt-repo-generator/source/${UUID}/repo.zip" \
--region us-gov-west-1
# 2. Start the packer CodeBuild build
aws codebuild start-build \
--project-name eks-terragrunt-repo-generator-builder \
--region us-gov-west-1 \
--source-type-override S3 \
--source-location-override "csvd-packer-pipeline-builds/packer-builds/eks-terragrunt-repo-generator/source/${UUID}/repo.zip"
# 3. After build SUCCEEDED, force Lambda to pull the new image
aws lambda update-function-code \
--function-name eks-terragrunt-repo-gen-template-automation \
--image-uri "229685449397.dkr.ecr.us-gov-west-1.amazonaws.com/eks-terragrunt-repo-generator/lambda:latest" \
--region us-gov-west-1
```

## Testing

```bash
# End-to-end EKS mode test (dry-run)
python scripts/test_workflow.py --eks --dry-run
# End-to-end Service Catalog test (provisions + verifies + terminates)
source ~/aws-creds
cd lambda-template-repo-generator
python scripts/test_service_catalog.py sc-e2e-test-$(date +%Y%m%d-%H%M)
# Clean up test repos
# Clean up leftover test repos
python scripts/cleanup_test_repos.py

# Validate GitHub PAT permissions
python scripts/check_github_permissions.py
```

---

## What NOT to Do

- ❌ Do not create a `buildspec.yml` for repo creation using the **old** CodeBuild+Terraform approach
- ❌ Do not rewrite repo creation logic in Lambda Python — all repo creation runs in CodeBuild via `terraform-eks-deployment`
- ❌ Do not use `HappyPathway/terraform-github-repo` **public** module — it pins `github ~> 6.0` (conflicts with internal `>= 6.6.0`)
- ✅ DO use `CSVD/terraform-github-repo` (https://github.e.it.census.gov/CSVD/terraform-github-repo) — internal module, uses `github 6.6.0`, supports `template_repo` + `managed_extra_files`
- ✅ DO use `CSVD/terraform-github-repo` (https://github.e.it.census.gov/CSVD/terraform-github-repo) — internal module, supports `template_repo` + `managed_extra_files`
- ❌ Do not pass `vpc_id` to the Lambda — use `vpc_name`
- ❌ Do not re-add `LambdaFunctionArn` as a CFN parameter — use `!Sub "arn:..."` directly
- ❌ Do not use SSH-based module sources (`git::ssh://`) — Census proxy blocks SSH host key exchange; use HTTPS
- ❌ Do not write temp files or command output to `/tmp` — use `~/tmp` (i.e. `/home/a/arnol377/tmp`) instead
- ❌ Do not use the `terraform` command directly — always use the `tf` alias (e.g. `tf plan`, `tf apply`, `tf init`)
