From 803168ae43fe9f210a727687c096b4bbedce0b5d Mon Sep 17 00:00:00 2001
From: Your Name
Date: Thu, 2 Apr 2026 15:24:43 -0400
Subject: [PATCH] fix: use Lambda-only approach for EKS repo creation; add Copilot instructions

- service-catalog/product-template.yaml: drop LambdaFunctionArn parameter,
- .github/copilot-instructions.md: document Lambda-first approach,
  explicitly record why CodeBuild+Terraform was abandoned (SSH host keys,
  proxy, provider version conflict), clarify CodeBuild is still used for
  container image CI/CD
---
 .github/copilot-instructions.md       | 147 ++++++++++++++++++++++++++
 service-catalog/product-template.yaml |  10 +-
 2 files changed, 148 insertions(+), 9 deletions(-)
 create mode 100644 .github/copilot-instructions.md

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 0000000..363eec9
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,147 @@
+# GitHub Copilot Instructions — lambda-template-repo-generator
+
+## Project Purpose
+
+This repository contains the Lambda function that powers the EKS Cluster Automation (ECA) system.
+When a team provisions the "EKS Terragrunt Repo" product via AWS Service Catalog, this Lambda:
+
+1. Receives a CloudFormation Custom Resource event
+2. Creates a GitHub repository in the `SCT-Engineering` org on GitHub Enterprise
+3. Clones `template-eks-cluster` as the starting structure
+4. Renders 8 Terragrunt HCL files from Jinja2 templates (EKS-specific path)
+5. Commits all files atomically via the Git tree API
+6. Opens a pull request (`repo-init` → `main`)
+7. Signals CloudFormation `SUCCESS`/`FAILED`
+
+---
+
+## Critical Architecture Decision: Lambda, NOT CodeBuild
+
+**Do not suggest CodeBuild as the mechanism for creating EKS cluster repositories.**
+
+An earlier approach attempted to provision EKS repos by triggering a CodeBuild project that ran
+`terraform apply` with a GitHub Terraform provider. That approach was **abandoned** due to:
+
+- SSH host key failures downloading remote Terraform modules
+- AWS credential proxy incompatibility inside CodeBuild build environments
+- S3 backend region mismatches
+- Irreconcilable Terraform provider version conflicts (the `HappyPathway/terraform-github-repo`
+  public module pins `github ~> 6.0` while our internal modules require `>= 6.6.0`)
+
+**The correct approach:** CloudFormation Custom Resource → Lambda invocation (direct Python GitHub API).
+No Terraform. No CodeBuild buildspec. No SSH keys. No provider version pinning.
+
+### What CodeBuild IS still used for (valid)
+
+CodeBuild **is** still the correct tool for building the Lambda container image:
+
+```
+packer-pipeline CLI → CodeBuild project (eks-terragrunt-repo-generator-builder)
+  → Packer + Docker build
+  → Push to ECR (229685449397.dkr.ecr.us-gov-west-1.amazonaws.com/eks-terragrunt-repo-generator/lambda)
+  → Lambda function updated via Terraform
+```
+
+This is the `packer-pipeline` CLI workflow. CodeBuild here is for **CI/CD of the Lambda image itself**,
+not for creating customer repos.
+
+---
+
+## Key Files
+
+| File | Purpose |
+|------|---------|
+| `template_automation/app.py` | Lambda entry point; CFN Custom Resource handler |
+| `template_automation/eks_config.py` | Pydantic models + Jinja2 renderer for EKS HCL |
+| `template_automation/github_provider.py` | GitHub API client (Git tree API, PRs, permissions) |
+| `template_automation/templates/eks/` | 8 Jinja2 templates (root.hcl, cluster.hcl, vpc.hcl, etc.) |
+| `service-catalog/product-template.yaml` | CFN template for the SC product (canonical source) |
+| `deploy/` | Terraform deploying the Lambda infrastructure |
+| `design-docs/README.md` | Architecture overview and implementation status |
+
+---
+
+## Service Catalog Integration
+
+The Service Catalog product is defined by a CloudFormation template
+(`service-catalog/product-template.yaml`). When a user submits the form:
+
+```
+SC Console (user fills form)
+  → CFN Stack creates Custom::GitHubRepository resource
+  → CFN calls Lambda via ServiceToken
+  → Lambda processes CloudFormationResourceInput (Pydantic model)
+  → Lambda creates repo, renders HCL, opens PR
+  → Lambda calls cfn-response SUCCESS
+  → CFN stack transitions to CREATE_COMPLETE
+  → SC provisioned product shows as AVAILABLE
+```
+
+The SC product is **managed by `terraform-service-catalog-census`** (not deployed from this repo).
+The live CFN template lives at:
+`terraform-service-catalog-census/templates/products/eks-terragrunt-repo/2-0-0.yaml`
+
+Both `service-catalog/product-template.yaml` here and `2-0-0.yaml` in census must stay in sync
+(same parameters, same Lambda property names).
+
+---
+
+## Lambda Invocation Details
+
+- **Function name**: `eks-terragrunt-repo-gen-template-automation`
+- **Account**: `229685449397` (csvd-dev-gov, `us-gov-west-1`)
+- **ServiceToken**: `arn:aws-us-gov:lambda:${AWS::Region}:${AWS::AccountId}:function:eks-terragrunt-repo-gen-template-automation`
+- **Runtime env var**: `VERIFY_SSL=false` (Census CA cert is not in the container's `certifi` bundle)
+- **GitHub Enterprise**: `https://github.e.it.census.gov`, org `SCT-Engineering`
+
+### EKS mode is triggered when all these fields are present in the event:
+- `cluster_name`
+- `account_name`
+- `aws_account_id`
+- `vpc_name`
+- `vpc_domain_name`
+
+If any of these are missing, the Lambda falls back to **generic mode** (writes only `config.json`).
+**Do not pass `vpc_id`** — the Lambda model field is `vpc_name` (a string).
+
+---
+
+## Parameter Naming Convention
+
+The CFN product template passes parameters in `snake_case` directly to the Lambda.
+The Lambda has a PascalCase→snake_case normalizer but it mishandles acronyms
+(`AWSAccountId` → `a_w_s_account_id` instead of `aws_account_id`). Always pass
+snake_case directly in the CFN `Properties` block:
+
+```yaml
+Properties:
+  ServiceToken: !Sub "arn:aws-us-gov:lambda:..."
+  project_name: !Ref ProjectName     # ← snake_case, not ProjectName
+  aws_account_id: !Ref AWSAccountId  # ← snake_case, not AWSAccountId
+  vpc_name: !Ref VpcName             # ← vpc_name, NOT vpc_id
+```
+
+---
+
+## Testing
+
+```bash
+# End-to-end EKS mode test (dry-run)
+python scripts/test_workflow.py --eks --dry-run
+
+# Clean up test repos
+python scripts/cleanup_test_repos.py
+
+# Validate GitHub PAT permissions
+python scripts/check_github_permissions.py
+```
+
+---
+
+## What NOT to Do
+
+- ❌ Do not create a `buildspec.yml` for repo creation — there is no CodeBuild approach here
+- ❌ Do not use the `hashicorp/github` Terraform provider or the `HappyPathway/terraform-github-repo` module for SC products
+- ❌ Do not pass `vpc_id` to the Lambda — use `vpc_name`
+- ❌ Do not deploy the SC portfolio/product from this repo — that's `terraform-service-catalog-census`'s job
+- ❌ Do not re-add `LambdaFunctionArn` as a CFN parameter — use `!Sub "arn:..."` directly
diff --git a/service-catalog/product-template.yaml b/service-catalog/product-template.yaml
index 6b0ca50..a61c311 100644
--- a/service-catalog/product-template.yaml
+++ b/service-catalog/product-template.yaml
@@ -110,8 +110,6 @@ Parameters:
     AllowedValues:
       - us-gov-west-1
       - us-gov-east-1
-      - us-east-1
-      - us-west-2
 
   AccountName:
     Type: String
@@ -168,12 +166,6 @@ Parameters:
     Description: 'Additional tags as JSON object (e.g., {"key1":"value1"})'
     Default: "{}"
 
-  # Hidden parameter - the Lambda ARN is passed in from the Service Catalog product definition
-  LambdaFunctionArn:
-    Type: String
-    Description: ARN of the Lambda function that creates EKS cluster repositories
-    Default: "arn:aws-us-gov:lambda:us-gov-west-1:229685449397:function:eks-terragrunt-repo-gen-template-automation"
-
 Conditions:
   ClusterNameProvided: !Not
     - !Equals
@@ -189,7 +181,7 @@ Resources:
   RepositoryCreator:
     Type: Custom::GitHubRepository
     Properties:
-      ServiceToken: !Ref LambdaFunctionArn
+      ServiceToken: !Sub "arn:aws-us-gov:lambda:${AWS::Region}:${AWS::AccountId}:function:eks-terragrunt-repo-gen-template-automation"
       # Core repo parameters
       project_name: !Ref ProjectName
       owning_team: !Ref OwningTeam
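Step 5 of the Copilot instructions says the Lambda "commits all files atomically via the Git tree API." The following sketch shows what that sequence usually looks like. It is based on the public GitHub REST endpoints for Git trees, commits, and references, and it assumes the GitHub Enterprise API root is `https://github.e.it.census.gov/api/v3`. The helper names (`create_tree_request`, `create_commit_request`, `update_ref_request`) are hypothetical and are not taken from `github_provider.py`. The helpers only build request tuples and never send them.

```python
from typing import Any, Mapping

# Each helper returns (HTTP method, API path, JSON body). Paths are relative to the
# REST API root (assumed: https://github.e.it.census.gov/api/v3 on GitHub Enterprise).
Request = tuple[str, str, dict[str, Any]]


def create_tree_request(owner: str, repo: str, base_tree_sha: str,
                        files: Mapping[str, str]) -> Request:
    """Build one tree that layers every rendered file on top of the base tree."""
    entries = [
        # Inline "content" lets GitHub create each blob as part of the tree call.
        {"path": path, "mode": "100644", "type": "blob", "content": content}
        for path, content in sorted(files.items())
    ]
    return ("POST", f"/repos/{owner}/{repo}/git/trees",
            {"base_tree": base_tree_sha, "tree": entries})


def create_commit_request(owner: str, repo: str, message: str,
                          tree_sha: str, parent_sha: str) -> Request:
    """Build a single commit that points at the new tree."""
    return ("POST", f"/repos/{owner}/{repo}/git/commits",
            {"message": message, "tree": tree_sha, "parents": [parent_sha]})


def update_ref_request(owner: str, repo: str, branch: str, commit_sha: str) -> Request:
    """Move the branch to the new commit; the branch changes only when this call succeeds."""
    return ("PATCH", f"/repos/{owner}/{repo}/git/refs/heads/{branch}",
            {"sha": commit_sha})
```

The atomicity comes from the last step. All eight HCL files land in one tree and one commit, and the branch advances in a single ref update. As a result, a failure partway through leaves the branch untouched. If the `repo-init` branch does not exist yet, it is created with a `POST` to `/repos/{owner}/{repo}/git/refs` carrying `{"ref": "refs/heads/repo-init", "sha": ...}` instead of the `PATCH`.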
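The "Lambda calls cfn-response SUCCESS" step in the Service Catalog flow ends with a CloudFormation custom resource response. The provider sends a JSON document to the pre-signed `ResponseURL` contained in the request event, using an HTTP `PUT` with an empty `Content-Type`. The stdlib-only sketch below shows that response shape. It does not reproduce the Lambda's own code. The function names are hypothetical, and so is the choice of physical resource ID (for example, the repo's full name).

```python
import json
import urllib.request
from typing import Any, Mapping, Optional


def build_response_body(event: Mapping[str, Any], status: str, reason: str,
                        physical_resource_id: str,
                        data: Optional[Mapping[str, Any]] = None) -> dict[str, Any]:
    """Assemble the JSON document CloudFormation expects from a custom resource provider."""
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError(f"status must be SUCCESS or FAILED, got {status!r}")
    return {
        "Status": status,
        "Reason": reason,
        "PhysicalResourceId": physical_resource_id,
        # These identifiers are echoed back unchanged from the request event.
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": dict(data or {}),
    }


def send_response(event: Mapping[str, Any], body: Mapping[str, Any]) -> int:
    """PUT the response to the pre-signed URL CloudFormation supplied; returns the HTTP status."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=payload,
        method="PUT",
        # An empty Content-Type matches what the pre-signed URL expects.
        headers={"Content-Type": "", "Content-Length": str(len(payload))},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A handler that skips this call on an exception leaves the stack waiting until CloudFormation times out. For that reason, the `FAILED` path belongs in a `finally` or top-level `except` in the entry point.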
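The Copilot instructions say EKS mode is selected only when all five fields are present, and that the Lambda otherwise falls back to generic mode. The sketch below expresses that rule. It assumes the fields arrive as keys of the CloudFormation event's `ResourceProperties` and treats a field as present when the key exists. `detect_mode` and `EKS_REQUIRED_FIELDS` are hypothetical names, not the Lambda's actual Pydantic logic.

```python
from typing import Any, Mapping

# The five fields the Copilot instructions list as the EKS-mode trigger.
EKS_REQUIRED_FIELDS = (
    "cluster_name",
    "account_name",
    "aws_account_id",
    "vpc_name",
    "vpc_domain_name",
)


def detect_mode(event: Mapping[str, Any]) -> str:
    """Return "eks" when all five fields are present, otherwise "generic"."""
    properties = event.get("ResourceProperties", {})
    missing = [field for field in EKS_REQUIRED_FIELDS if field not in properties]
    return "generic" if missing else "eks"
```

A `vpc_id` key does not satisfy the `vpc_name` requirement. An event that sends `vpc_id` therefore drops silently into generic mode, which is why the instructions insist on `vpc_name`.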
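The "Parameter Naming Convention" section says the Lambda's PascalCase→snake_case normalizer turns `AWSAccountId` into `a_w_s_account_id`. That result is what you get from splitting before every capital letter. The sketch below reproduces that behavior and adds an acronym-aware variant that keeps runs of capitals together. Both functions are hypothetical illustrations, since the Lambda's real normalizer is not shown in this patch.

```python
import re


def naive_snake(name: str) -> str:
    """Insert an underscore before every capital (except the first), splitting acronyms apart."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def acronym_aware_snake(name: str) -> str:
    """Keep runs of capitals (acronyms) together before lowercasing."""
    # Split an acronym from a following capitalized word: "AWSAccount" -> "AWS_Account".
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lowercase letter or digit from a following capital: "AccountId" -> "Account_Id".
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()
```

Input that is already snake_case passes through both regexes unchanged. Even so, sending snake_case keys from the CFN `Properties` block, as the instructions recommend, avoids depending on the normalizer at all.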