fix: EKS-only Lambda cleanup + SC template AwsRegion/AWSAccountId removal #1
Merged
+3,016
−5,348
20 commits
803168a
fix: use Lambda-only approach for EKS repo creation; add Copilot inst…
0a74dd7
fix: public visibility by default; add collaborator support for repo …
528f4b3
fix: VERIFY_SSL=false; public repo visibility; add ec2:DescribeVpcs t…
a79cee4
feat: path_mapper for dynamic EKS repo structure (safe revert baseline)
ec54b54
feat: Lambda delegates EKS repos to CodeBuild + terraform-eks-deployment
52ebef0
chore: tf apply — add eks-terragrunt-repo-creator CodeBuild project +…
aee6987
fix: add CodeBuild VPC endpoint + IAM policy for Lambda→CodeBuild con…
8310ee1
fix: increase Lambda timeout to 900s to cover CodeBuild poll window
eb18463
fix: remove spurious '- ' prefix from additional_post_build_commands
5d3ff19
fix: use PAT (ghe-runner/github-token) for Terraform GitHub provider …
26c6fe9
fix: add pull_request_url and branch_name to CodeBuild success response
12a742a
docs: rewrite copilot-instructions to reflect CodeBuild+Terraform arc…
065d2f2
chore: update deploy Terraform state after tf apply
560a5ec
fix: address PR1 review comments — EKS-only Lambda + Terraform cleanup
dff9bfa
docs: clarify cross-account architecture + fix stale refs
e6547ed
docs: add ECA demo script with talking points and Q&A prep
ff2a6b5
fix(lambda): make EKS fields required; remove is_eks_deployment dead …
f37b6c6
fix(sc-template): remove AwsRegion/AWSAccountId as user-facing parame…
237ab9b
fix(deploy): add eks-repo-creator buildspec; fix partition refs in IA…
8b268ff
chore: update docs, scripts, and state to reflect current architecture
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,209 @@ | ||
| # GitHub Copilot Instructions — lambda-template-repo-generator | ||
|
|
||
| ## Project Purpose | ||
|
|
||
| This repository contains the Lambda function that powers the EKS Cluster Automation (ECA) system. | ||
| When a team provisions the "EKS Terragrunt Repo" product via AWS Service Catalog, this Lambda: | ||
|
|
||
| 1. Receives a CloudFormation Custom Resource event | ||
| 2. Fetches a GitHub PAT from Secrets Manager (`ghe-runner/github-token`) | ||
| 3. Triggers the `eks-terragrunt-repo-creator` CodeBuild project with EKS parameters as env vars | ||
| 4. Polls CodeBuild every 20 seconds until the build completes or the Lambda deadline approaches | ||
| 5. Fetches the open PR URL from the GitHub API after a successful build | ||
| 6. Signals CloudFormation `SUCCESS`/`FAILED` | ||
|
|
||
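The polling step (step 4 above) can be sketched as follows. This is a minimal illustration, not the code in `template_automation/app.py`: the function name `poll_build`, the 30-second safety margin, and the `"DEADLINE"` sentinel are assumptions. In the Lambda, `get_status` would presumably wrap boto3's `batch_get_builds` and `remaining_ms` would be `context.get_remaining_time_in_millis`.

```python
import time
from typing import Callable

# Build statuses that CodeBuild reports once a build has finished.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "FAULT", "STOPPED", "TIMED_OUT"}


def poll_build(
    get_status: Callable[[], str],
    remaining_ms: Callable[[], int],
    interval_s: float = 20.0,
    safety_margin_ms: int = 30_000,  # assumed margin, not taken from the source
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return the terminal build status, or "DEADLINE" if time is running out."""
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Give up if one more sleep would cut into the safety margin,
        # so the caller can still signal CloudFormation before the timeout.
        if remaining_ms() - interval_s * 1000 < safety_margin_ms:
            return "DEADLINE"
        sleep(interval_s)
```

Injecting `get_status`, `remaining_ms`, and `sleep` keeps the loop testable without AWS access; a caller would treat `"DEADLINE"` like a failure and send `FAILED` to CloudFormation.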
| All actual repo creation runs inside **CodeBuild** via the `terraform-eks-deployment` workspace: | ||
| - Clones `template-eks-cluster` via `CSVD/terraform-github-repo` Terraform module | ||
| - Writes 8 rendered Terragrunt HCL files via `managed_extra_files` | ||
| - Opens a pull request (`repo-init` → `main`) | ||
|
|
||
| --- | ||
|
|
||
| ## Architecture: Lambda as Thin Orchestrator over CodeBuild + Terraform | ||
|
|
||
| ``` | ||
| SC Console (user fills form) | ||
| → CFN Stack creates Custom::GitHubRepository resource | ||
| → CFN calls Lambda (eks-terragrunt-repo-gen-template-automation) via ServiceToken | ||
| → Lambda fetches PAT from Secrets Manager (ghe-runner/github-token) | ||
| → Lambda starts CodeBuild project (eks-terragrunt-repo-creator) with TF_VAR_* env overrides | ||
| → CodeBuild clones terraform-eks-deployment repo from GHE | ||
| → CodeBuild runs: terraform init + terraform apply -auto-approve | ||
| → Terraform (CSVD/terraform-github-repo module) creates GHE repo + writes HCL files + opens PR | ||
| → Lambda polls CodeBuild, then fetches PR URL from GitHub API | ||
| → Lambda sends cfn-response SUCCESS with repository_url + pull_request_url | ||
| → CFN stack transitions to CREATE_COMPLETE | ||
| → SC provisioned product shows as AVAILABLE | ||
| ``` | ||
|
|
||
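The final signalling step in the flow above amounts to sending a JSON body to the pre-signed `ResponseURL` in the Custom Resource event. The sketch below only builds that body; the helper name `build_cfn_response` and the default reason text are illustrative assumptions, and the real handler may use a `cfn-response` helper instead.

```python
import json
from typing import Any, Dict, Optional


def build_cfn_response(
    event: Dict[str, Any],
    status: str,
    physical_id: str,
    data: Optional[Dict[str, Any]] = None,
    reason: str = "",
) -> bytes:
    """Build the JSON body that is PUT to event["ResponseURL"]."""
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError(f"invalid status: {status!r}")
    body = {
        "Status": status,
        "Reason": reason or "See CloudWatch logs for details",
        "PhysicalResourceId": physical_id,
        # These three values must be echoed back unchanged from the request.
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        # Values here become available to the template via !GetAtt.
        "Data": data or {},
    }
    return json.dumps(body).encode("utf-8")
```

On success, `data` would carry `repository_url` and `pull_request_url`, matching the outputs named in the flow above.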
| ### CodeBuild Projects | ||
|
|
||
| There are **two** CodeBuild projects — do not confuse them: | ||
|
|
||
| | Project | Purpose | | ||
| |---------|--------| | ||
| | `eks-terragrunt-repo-generator-builder` | Builds the Lambda container image (packer + Docker → ECR) | | ||
| | `eks-terragrunt-repo-creator` | Creates EKS cluster repos (tf init + tf apply inside terraform-eks-deployment) | | ||
|
|
||
| The Lambda triggers **`eks-terragrunt-repo-creator`** at runtime. The **`eks-terragrunt-repo-generator-builder`** is triggered manually via `packer-pipeline` when the Lambda code changes. | ||
|
|
||
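As a rough sketch of how the Lambda might turn EKS parameters into `TF_VAR_*` overrides for `eks-terragrunt-repo-creator`: the helper `build_env_overrides` is hypothetical, and it assumes the Terraform variable names in `terraform-eks-deployment` match the Lambda field names one-to-one, which the source does not confirm. The list shape follows the `environmentVariablesOverride` argument of boto3's CodeBuild `start_build`.

```python
from typing import Any, Dict, List

# EKS fields documented for the Lambda event.
EKS_FIELDS = (
    "cluster_name",
    "account_name",
    "aws_account_id",
    "vpc_name",
    "vpc_domain_name",
)


def build_env_overrides(props: Dict[str, Any]) -> List[Dict[str, str]]:
    """Map each EKS field to a TF_VAR_<field> CodeBuild environment override."""
    return [
        {"name": f"TF_VAR_{field}", "value": str(props[field]), "type": "PLAINTEXT"}
        for field in EKS_FIELDS
    ]
```

The result would be passed as `environmentVariablesOverride=build_env_overrides(props)` alongside `projectName` when starting the build.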
| --- | ||
|
|
||
| ## Key Files | ||
|
|
||
| | File | Purpose | | ||
| |------|--------| | ||
| | `template_automation/app.py` | Lambda entry point; CFN Custom Resource handler; `start_codebuild_build()` + `poll_codebuild_build()` | | ||
| | `template_automation/eks_config.py` | Pydantic models + `is_eks_deployment` check | | ||
| | `service-catalog/product-template.yaml` | CFN template for the SC product (canonical source) | | ||
| | `deploy/main.tf` | Terraform: Lambda, CodeBuild project, SC portfolio/product, IAM | | ||
| | `deploy/variables.tf` | Input variables including `codebuild_project_name`, `codebuild_role_arn` | | ||
| | `csvd_config_packer.hcl` | packer-pipeline config for building the Lambda container image | | ||
|
|
||
| The HCL rendering, repo creation, and PR opening logic lives in **`terraform-eks-deployment`**, not here. | ||
|
|
||
| --- | ||
|
|
||
| ## Service Catalog Integration | ||
|
|
||
| The Service Catalog product is defined by `service-catalog/product-template.yaml`. | ||
|
|
||
| ## SC Product Deployment Methods | ||
|
|
||
| There are **two ways** to deploy the Service Catalog product. Both use the same | ||
| `service-catalog/product-template.yaml` CFN template — they must stay in sync. | ||
|
|
||
| ### Method 1: Direct Terraform via `deploy/` (canonical, use for testing/debugging) | ||
|
|
||
| ```bash | ||
| cd lambda-template-repo-generator/deploy | ||
| tf init | ||
| tf apply | ||
| ``` | ||
|
|
||
| Deploys the Lambda + CodeBuild project + SC portfolio/product + constraints directly. | ||
| Use this as the **reference deployment** when debugging issues with the census pipeline. | ||
| IDs after last apply: portfolio `port-h5qd63hw5yagq`, product `prod-lmua4oknugafg`. | ||
|
|
||
| ### Method 2: `terraform-service-catalog-census` Terragrunt (production path) | ||
|
|
||
| ```bash | ||
| cd terraform-service-catalog-census/non-prod/csvd-dev/west/service-catalog | ||
| tf apply # (via terragrunt) | ||
| ``` | ||
|
|
||
| Census-managed production deployment path. The live CFN template lives at: | ||
| `terraform-service-catalog-census/templates/products/eks-terragrunt-repo/2-0-0.yaml` | ||
|
|
||
| Both `service-catalog/product-template.yaml` here and `2-0-0.yaml` in census must stay in sync | ||
| (same parameters, same Lambda property names). | ||
|
|
||
| --- | ||
|
|
||
| ## Lambda Runtime Details | ||
|
|
||
| - **Function name**: `eks-terragrunt-repo-gen-template-automation` | ||
| - **Account**: `229685449397` (csvd-dev-gov, `us-gov-west-1`) | ||
| - **Timeout**: 900s (15 min) — must exceed CodeBuild poll window | ||
| - **ServiceToken**: `arn:aws-us-gov:lambda:${AWS::Region}:${AWS::AccountId}:function:eks-terragrunt-repo-gen-template-automation` | ||
| - **GitHub Enterprise**: `https://github.e.it.census.gov`, org `SCT-Engineering` | ||
|
|
||
| ### Key environment variables | ||
|
|
||
| | Variable | Value | Purpose | | ||
| |----------|-------|---------| | ||
| | `VERIFY_SSL` | `false` | Census CA cert not in the container's `certifi` bundle | | ||
| | `GITHUB_TOKEN_SECRET_NAME` | `/eks-cluster-deployment/github_token` | App installation token (`ghs_`) — used by Lambda for Python GitHub API calls | | ||
| | `TF_GITHUB_TOKEN_SECRET_NAME` | `ghe-runner/github-token` | PAT (`ghp_`) — passed to CodeBuild as `GITHUB_TOKEN` for the Terraform GitHub provider | | ||
| | `CODEBUILD_PROJECT_NAME` | `eks-terragrunt-repo-creator` | CodeBuild project to trigger | | ||
| | `GITHUB_API` | `https://github.e.it.census.gov` | GHE API base URL | | ||
| | `GITHUB_ORG_NAME` | `SCT-Engineering` | Target GitHub org | | ||
|
|
||
| ### Why two GitHub tokens? | ||
|
|
||
| - `GITHUB_TOKEN_SECRET_NAME` holds a **GitHub App installation token** (`ghs_` prefix). It can perform | ||
| org-level API calls but **cannot** access `/api/v3/user`, which the CSVD Terraform module requires. | ||
| - `TF_GITHUB_TOKEN_SECRET_NAME` holds a **personal access token** (`ghp_` prefix, user `arnol377`). | ||
| This is passed to CodeBuild and used by the Terraform GitHub provider. | ||
|
|
||
| ### EKS mode is triggered when all these fields are present in the event: | ||
| - `cluster_name` | ||
| - `account_name` | ||
| - `aws_account_id` | ||
| - `vpc_name` | ||
| - `vpc_domain_name` | ||
|
|
||
| If any of these are missing, the Lambda falls back to **generic mode** (writes only `config.json`). | ||
| **Do not pass `vpc_id`** — the Lambda model field is `vpc_name` (a string). | ||
|
|
||
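The field check described above could look roughly like this. It is a standard-library sketch only: the real models in `template_automation/eks_config.py` use Pydantic, and the function names here are assumptions.

```python
from typing import Any, Dict, List

REQUIRED_EKS_FIELDS = (
    "cluster_name",
    "account_name",
    "aws_account_id",
    "vpc_name",
    "vpc_domain_name",
)


def missing_eks_fields(props: Dict[str, Any]) -> List[str]:
    """Return the required EKS fields that are absent or blank in the event."""
    missing = []
    for field in REQUIRED_EKS_FIELDS:
        value = props.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def uses_vpc_id(props: Dict[str, Any]) -> bool:
    """Flag the common mistake of passing vpc_id instead of vpc_name."""
    return "vpc_id" in props and "vpc_name" not in props
```

Reporting the missing field names, rather than a bare boolean, makes the resulting CloudFormation failure reason easier to act on.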
| --- | ||
|
|
||
| ## Parameter Naming Convention | ||
|
|
||
| The CFN product template passes parameters in `snake_case` directly to the Lambda. | ||
| The Lambda has a PascalCase→snake_case normalizer but it mishandles acronyms | ||
| (`AWSAccountId` → `a_w_s_account_id` instead of `aws_account_id`). Always pass | ||
| snake_case directly in the CFN `Properties` block: | ||
|
|
||
| ```yaml | ||
| Properties: | ||
| ServiceToken: !Sub "arn:aws-us-gov:lambda:..." | ||
| project_name: !Ref ProjectName # ← snake_case, not ProjectName | ||
| aws_account_id: !Ref AWSAccountId # ← snake_case, not AWSAccountId | ||
| vpc_name: !Ref VpcName # ← vpc_name, NOT vpc_id | ||
| ``` | ||
|
|
||
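To illustrate the acronym problem, the sketch below contrasts a naive PascalCase-to-snake_case conversion with an acronym-aware one. Both functions are illustrative assumptions; the Lambda's actual normalizer may be implemented differently, and passing snake_case directly remains the recommended approach.

```python
import re


def naive_snake(name: str) -> str:
    """Insert an underscore before every capital letter except the first."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def acronym_aware_snake(name: str) -> str:
    """Treat a run of capitals as one word (AWSAccountId -> aws_account_id)."""
    # Split an acronym from the capitalized word that follows it.
    step1 = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lowercase letter or digit from the capital that follows it.
    step2 = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step1)
    return step2.lower()


print(naive_snake("AWSAccountId"))          # a_w_s_account_id
print(acronym_aware_snake("AWSAccountId"))  # aws_account_id
print(acronym_aware_snake("ProjectName"))   # project_name
```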
| --- | ||
|
|
||
| ## Rebuilding the Lambda Image | ||
|
|
||
| When `template_automation/app.py` or other Lambda source files change: | ||
|
|
||
| ```bash | ||
| # 1. Zip source and upload to S3 | ||
| cd lambda-template-repo-generator | ||
| zip -r ~/tmp/lambda-source.zip . -x "*.git*" -x "design-docs/*" -x "__pycache__/*" -x "*.pyc" -x "deploy/.terraform/*" -x "deploy/terraform.tfstate*" | ||
| UUID=$(python3 -c "import uuid; print(uuid.uuid4())") | ||
| source ~/aws-creds | ||
| aws s3 cp ~/tmp/lambda-source.zip \ | ||
| "s3://csvd-packer-pipeline-builds/packer-builds/eks-terragrunt-repo-generator/source/${UUID}/repo.zip" \ | ||
| --region us-gov-west-1 | ||
|
|
||
| # 2. Start the packer CodeBuild build | ||
| aws codebuild start-build \ | ||
| --project-name eks-terragrunt-repo-generator-builder \ | ||
| --region us-gov-west-1 \ | ||
| --source-type-override S3 \ | ||
| --source-location-override "csvd-packer-pipeline-builds/packer-builds/eks-terragrunt-repo-generator/source/${UUID}/repo.zip" | ||
|
|
||
| # 3. After build SUCCEEDED, force Lambda to pull the new image | ||
| aws lambda update-function-code \ | ||
| --function-name eks-terragrunt-repo-gen-template-automation \ | ||
| --image-uri "229685449397.dkr.ecr.us-gov-west-1.amazonaws.com/eks-terragrunt-repo-generator/lambda:latest" \ | ||
| --region us-gov-west-1 | ||
| ``` | ||
|
|
||
| ## Testing | ||
|
|
||
| ```bash | ||
| # End-to-end Service Catalog test (provisions + verifies + terminates) | ||
| source ~/aws-creds | ||
| cd lambda-template-repo-generator | ||
| python scripts/test_service_catalog.py sc-e2e-test-$(date +%Y%m%d-%H%M) | ||
|
|
||
| # Clean up leftover test repos | ||
| python scripts/cleanup_test_repos.py | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## What NOT to Do | ||
|
|
||
| - ❌ Do not rewrite repo creation logic in Lambda Python — all repo creation runs in CodeBuild via `terraform-eks-deployment` | ||
| - ❌ Do not use `HappyPathway/terraform-github-repo` **public** module — it pins `github ~> 6.0` (conflicts with internal `>= 6.6.0`) | ||
| - ✅ DO use `CSVD/terraform-github-repo` (https://github.e.it.census.gov/CSVD/terraform-github-repo) — internal module, supports `template_repo` + `managed_extra_files` | ||
| - ❌ Do not pass `vpc_id` to the Lambda — use `vpc_name` | ||
| - ❌ Do not re-add `LambdaFunctionArn` as a CFN parameter — use `!Sub "arn:..."` directly | ||
| - ❌ Do not use SSH-based module sources (`git::ssh://`) — Census proxy blocks SSH host key exchange; use HTTPS | ||
| - ❌ Do not write temp files or command output to `/tmp` — use `~/tmp` (i.e. `/home/a/arnol377/tmp`) instead | ||
| - ❌ Do not use the `terraform` command directly — always use the `tf` alias (e.g. `tf plan`, `tf apply`, `tf init`) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,8 +17,8 @@ packer_pipeline { | |
| tools = [ | ||
| { | ||
| name = "packer" | ||
| - version = "1.13.0" | ||
| - zip_path = "packer_1.13.0_linux_amd64.zip" | ||
| + version = "1.10.3" | ||
| + zip_path = "packer_1.10.3_linux_amd64.zip" | ||
| binary_name = "packer" | ||
| install_path = "/usr/local/bin" | ||
| } | ||
|
|
@@ -29,7 +29,8 @@ packer_pipeline { | |
| partition = "aws-us-gov" // AWS partition (aws or aws-us-gov) | ||
|
|
||
| // Role management | ||
| - create_role = true // Enable automatic role creation | ||
| + create_role = false // Role already exists; provide ARN directly | ||
| + codebuild_role_arn = "arn:aws-us-gov:iam::229685449397:role/CodeBuildPackerRole-eks-terragrunt-repo-generator-builder" | ||
|
Review comment: This should be looked up by name, partition, and account ID.
||
|
|
||
| // Region and partition configuration | ||
| aws_region = "us-gov-west-1" // AWS region | ||
|
|
||
Wouldn't we always want to create the role?