fix: use Lambda-only approach for EKS repo creation; add Copilot instructions

- service-catalog/product-template.yaml: drop LambdaFunctionArn parameter;
  ServiceToken is now built inline with !Sub
- .github/copilot-instructions.md: document Lambda-first approach, explicitly
  record why CodeBuild+Terraform was abandoned (SSH host keys, proxy, provider
  version conflict), clarify CodeBuild is still used for container image CI/CD
Your Name committed Apr 2, 2026
1 parent 3a11631 commit 803168a
Showing 2 changed files with 148 additions and 9 deletions.
147 changes: 147 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,147 @@
# GitHub Copilot Instructions — lambda-template-repo-generator

## Project Purpose

This repository contains the Lambda function that powers the EKS Cluster Automation (ECA) system.
When a team provisions the "EKS Terragrunt Repo" product via AWS Service Catalog, this Lambda:

1. Receives a CloudFormation Custom Resource event
2. Creates a GitHub repository in the `SCT-Engineering` org on GitHub Enterprise
3. Clones `template-eks-cluster` as the starting structure
4. Renders 8 Terragrunt HCL files from Jinja2 templates (EKS-specific path)
5. Commits all files atomically via the Git tree API
6. Opens a pull request (`repo-init` → `main`)
7. Signals CloudFormation `SUCCESS`/`FAILED`
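
Step 5 above can be sketched as the request bodies for GitHub's Git Data API. This is a minimal illustration, not the code in `github_provider.py`: the helper names `build_tree_payload` and `build_commit_payload`, the placeholder SHAs, the file contents, and the commit message are all assumptions. It only builds the JSON bodies for `POST /repos/{owner}/{repo}/git/trees` and `POST /repos/{owner}/{repo}/git/commits`; sending them and moving the branch ref are omitted.

```python
from typing import Dict, List, Optional


def build_tree_payload(files: Dict[str, str], base_tree_sha: Optional[str] = None) -> dict:
    """Build the JSON body for POST /repos/{owner}/{repo}/git/trees.

    `files` maps repository paths to text content (hypothetical input shape).
    """
    entries: List[dict] = [
        # Mode 100644 marks a regular, non-executable file blob.
        {"path": path, "mode": "100644", "type": "blob", "content": content}
        for path, content in sorted(files.items())
    ]
    payload: dict = {"tree": entries}
    if base_tree_sha:
        # Layer the new files on top of an existing tree instead of replacing it.
        payload["base_tree"] = base_tree_sha
    return payload


def build_commit_payload(message: str, tree_sha: str, parent_sha: str) -> dict:
    """Build the JSON body for POST /repos/{owner}/{repo}/git/commits."""
    return {"message": message, "tree": tree_sha, "parents": [parent_sha]}


# Placeholder content and SHAs, for illustration only.
rendered = {"root.hcl": "# rendered root\n", "cluster.hcl": "# rendered cluster\n"}
tree = build_tree_payload(rendered, base_tree_sha="base-tree-sha")
commit = build_commit_payload("Initialize EKS config", "new-tree-sha", "parent-commit-sha")
```

Because every rendered file lands in a single tree and a single commit, the branch never points at a partially written set of files.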

---

## Critical Architecture Decision: Lambda, NOT CodeBuild

**Do not suggest CodeBuild as the mechanism for creating EKS cluster repositories.**

An earlier approach attempted to provision EKS repos by triggering a CodeBuild project that ran
`terraform apply` with a GitHub Terraform provider. That approach was **abandoned** due to:

- SSH host key failures downloading remote Terraform modules
- AWS credential proxy incompatibility inside CodeBuild build environments
- S3 backend region mismatches
- Irreconcilable Terraform provider version conflicts (the `HappyPathway/terraform-github-repo`
  public module pins `github ~> 6.0` while our internal modules require `>= 6.6.0`)

**The correct approach:** CloudFormation Custom Resource → Lambda invocation (direct Python GitHub API).
No Terraform. No CodeBuild buildspec. No SSH keys. No provider version pinning.

### What CodeBuild IS still used for (valid)

CodeBuild **is** still the correct tool for building the Lambda container image:

```
packer-pipeline CLI → CodeBuild project (eks-terragrunt-repo-generator-builder)
  → Packer + Docker build
  → Push to ECR (229685449397.dkr.ecr.us-gov-west-1.amazonaws.com/eks-terragrunt-repo-generator/lambda)
  → Lambda function updated via Terraform
```

This is the `packer-pipeline` CLI workflow. CodeBuild here is for **CI/CD of the Lambda image itself**,
not for creating customer repos.

---

## Key Files

| File | Purpose |
|------|---------|
| `template_automation/app.py` | Lambda entry point; CFN Custom Resource handler |
| `template_automation/eks_config.py` | Pydantic models + Jinja2 renderer for EKS HCL |
| `template_automation/github_provider.py` | GitHub API client (Git tree API, PRs, permissions) |
| `template_automation/templates/eks/` | 8 Jinja2 templates (root.hcl, cluster.hcl, vpc.hcl, etc.) |
| `service-catalog/product-template.yaml` | CFN template for the SC product (canonical source) |
| `deploy/` | Terraform deploying the Lambda infrastructure |
| `design-docs/README.md` | Architecture overview and implementation status |

---

## Service Catalog Integration

The Service Catalog product is defined by a CloudFormation template
(`service-catalog/product-template.yaml`). When a user submits the form:

```
SC Console (user fills form)
  → CFN Stack creates Custom::GitHubRepository resource
  → CFN calls Lambda via ServiceToken
  → Lambda processes CloudFormationResourceInput (Pydantic model)
  → Lambda creates repo, renders HCL, opens PR
  → Lambda calls cfn-response SUCCESS
  → CFN stack transitions to CREATE_COMPLETE
  → SC provisioned product shows as AVAILABLE
```
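
The "Lambda calls cfn-response" step in the flow above amounts to a JSON document sent with HTTP `PUT` to the pre-signed `ResponseURL` carried in the event. The sketch below shows that response shape using only the standard library. The function names, the example event values, and the choice of physical resource ID are assumptions for illustration; the real handler in `template_automation/app.py` may be structured differently.

```python
import json
import urllib.request
from typing import Optional


def build_cfn_response(event: dict, status: str, physical_id: str,
                       reason: str = "", data: Optional[dict] = None) -> dict:
    """Assemble the response body CloudFormation expects from a custom resource."""
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError(f"unexpected status: {status}")
    return {
        "Status": status,
        "Reason": reason or "See CloudWatch Logs for details",
        "PhysicalResourceId": physical_id,
        # These three values must be echoed back unchanged from the request.
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_cfn_response(event: dict, body: dict) -> int:
    """PUT the body to the pre-signed URL (not called in this sketch)."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"], data=payload, method="PUT",
        headers={"Content-Type": ""},  # empty, as the cfn-response helper does
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Hypothetical event values, for illustration only.
example_event = {
    "ResponseURL": "https://example.invalid/presigned",
    "StackId": "example-stack-id",
    "RequestId": "example-request-id",
    "LogicalResourceId": "RepositoryCreator",
}
body = build_cfn_response(example_event, "SUCCESS",
                          physical_id="SCT-Engineering/example-repo",
                          data={"repo_name": "example-repo"})
```

If the handler raises before a response is sent, the stack waits until the custom resource times out, so a `FAILED` response in an exception handler is usually worth having.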

The SC product is **managed by `terraform-service-catalog-census`** (not deployed from this repo).
The live CFN template lives at:
`terraform-service-catalog-census/templates/products/eks-terragrunt-repo/2-0-0.yaml`

Both `service-catalog/product-template.yaml` here and `2-0-0.yaml` in census must stay in sync
(same parameters, same Lambda property names).
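
One way to catch drift between the two templates is to compare their parameter names and the property names passed to the custom resource. The sketch below assumes both templates have already been parsed into plain dictionaries (CloudFormation short-form tags such as `!Ref` need a CFN-aware YAML loader, which is not shown). The function name `template_drift` and the sample dictionaries are hypothetical.

```python
from typing import Dict, List


def template_drift(local: dict, census: dict,
                   resource_type: str = "Custom::GitHubRepository") -> Dict[str, List[str]]:
    """Return names that appear in only one of the two parsed templates."""

    def parameter_names(template: dict) -> set:
        return set(template.get("Parameters", {}))

    def property_names(template: dict) -> set:
        names: set = set()
        for resource in template.get("Resources", {}).values():
            if resource.get("Type") == resource_type:
                names |= set(resource.get("Properties", {}))
        return names

    return {
        # Symmetric difference: present on one side but not the other.
        "parameters": sorted(parameter_names(local) ^ parameter_names(census)),
        "properties": sorted(property_names(local) ^ property_names(census)),
    }


# Tiny hand-written stand-ins for the two parsed templates.
local_template = {
    "Parameters": {"ProjectName": {}, "VpcName": {}},
    "Resources": {"RepositoryCreator": {
        "Type": "Custom::GitHubRepository",
        "Properties": {"ServiceToken": "...", "project_name": "...", "vpc_name": "..."},
    }},
}
census_template = {
    "Parameters": {"ProjectName": {}, "VpcName": {}, "LambdaFunctionArn": {}},
    "Resources": {"RepositoryCreator": {
        "Type": "Custom::GitHubRepository",
        "Properties": {"ServiceToken": "...", "project_name": "...", "vpc_id": "..."},
    }},
}
drift = template_drift(local_template, census_template)
```

An empty list under both keys would indicate that the two templates agree on names; it says nothing about defaults or allowed values.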

---

## Lambda Invocation Details

- **Function name**: `eks-terragrunt-repo-gen-template-automation`
- **Account**: `229685449397` (csvd-dev-gov, `us-gov-west-1`)
- **ServiceToken**: `arn:aws-us-gov:lambda:${AWS::Region}:${AWS::AccountId}:function:eks-terragrunt-repo-gen-template-automation`
- **Runtime env var**: `VERIFY_SSL=false` (Census CA cert is not in the container's `certifi` bundle)
- **GitHub Enterprise**: `https://github.e.it.census.gov`, org `SCT-Engineering`
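
How the Lambda interprets `VERIFY_SSL` is not shown here, so the following is only a sketch of one plausible parsing rule. It assumes verification stays on unless the variable is set to an explicit "off" value. The function name `ssl_verify_enabled` and the accepted spellings are assumptions.

```python
import os
from typing import Mapping, Optional


def ssl_verify_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False only when VERIFY_SSL is set to an explicit 'off' value."""
    env = os.environ if environ is None else environ
    raw = env.get("VERIFY_SSL", "true")
    return raw.strip().lower() not in {"false", "0", "no"}


# The result could be handed to an HTTP client's certificate-verification setting.
verify = ssl_verify_enabled({"VERIFY_SSL": "false"})
```

Defaulting to verification means a missing or misspelled variable fails closed rather than silently disabling TLS checks.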

### EKS mode is triggered when all these fields are present in the event:
- `cluster_name`
- `account_name`
- `aws_account_id`
- `vpc_name`
- `vpc_domain_name`

If any of these fields is missing, the Lambda falls back to **generic mode** (writes only `config.json`).
**Do not pass `vpc_id`** — the Lambda model field is `vpc_name` (a string).
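
The mode selection described above can be sketched as a simple presence check. This is an illustration, not the Lambda's actual code: the names `EKS_REQUIRED_FIELDS` and `select_mode` are hypothetical, and treating an empty string as "missing" is an assumption.

```python
EKS_REQUIRED_FIELDS = (
    "cluster_name",
    "account_name",
    "aws_account_id",
    "vpc_name",
    "vpc_domain_name",
)


def select_mode(properties: dict) -> str:
    """Return "eks" when every required field is present, else "generic"."""
    # Assumption: a key with an empty or None value counts as missing.
    missing = [field for field in EKS_REQUIRED_FIELDS if not properties.get(field)]
    return "generic" if missing else "eks"


# Example values are made up for illustration.
full_event = {
    "cluster_name": "demo",
    "account_name": "example-account",
    "aws_account_id": "000000000000",
    "vpc_name": "example-vpc",
    "vpc_domain_name": "example.internal",
}
# Passing vpc_id instead of vpc_name leaves a required field missing.
wrong_event = {k: v for k, v in full_event.items() if k != "vpc_name"}
wrong_event["vpc_id"] = "vpc-0123456789abcdef0"

mode_ok = select_mode(full_event)
mode_fallback = select_mode(wrong_event)
```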

---

## Parameter Naming Convention

The CFN product template passes parameters in `snake_case` directly to the Lambda.
The Lambda has a PascalCase→snake_case normalizer, but it mishandles acronyms
(`AWSAccountId` → `a_w_s_account_id` instead of `aws_account_id`). Always pass
snake_case directly in the CFN `Properties` block:

```yaml
Properties:
  ServiceToken: !Sub "arn:aws-us-gov:lambda:..."
  project_name: !Ref ProjectName       # ← snake_case, not ProjectName
  aws_account_id: !Ref AWSAccountId    # ← snake_case, not AWSAccountId
  vpc_name: !Ref VpcName               # ← vpc_name, NOT vpc_id
```
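
The acronym problem is easy to reproduce. The sketch below is not the Lambda's normalizer (its implementation is not shown here). It contrasts a naive "underscore before every capital" rule, which yields the `a_w_s_account_id` result described above, with an acronym-aware variant. Both function names are hypothetical.

```python
import re


def naive_snake_case(name: str) -> str:
    """Insert an underscore before every capital letter except the first."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def acronym_aware_snake_case(name: str) -> str:
    """Split only at lower-to-upper boundaries and at the end of an acronym run."""
    pattern = r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"
    return re.sub(pattern, "_", name).lower()


naive = naive_snake_case("AWSAccountId")          # splits every letter of "AWS"
aware = acronym_aware_snake_case("AWSAccountId")  # keeps "AWS" together
```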

---

## Testing

```bash
# End-to-end EKS mode test (dry-run)
python scripts/test_workflow.py --eks --dry-run

# Clean up test repos
python scripts/cleanup_test_repos.py

# Validate GitHub PAT permissions
python scripts/check_github_permissions.py
```

---

## What NOT to Do

- ❌ Do not create a `buildspec.yml` for repo creation — there is no CodeBuild approach here
- ❌ Do not use the `hashicorp/github` Terraform provider or the `HappyPathway/terraform-github-repo` module for SC products
- ❌ Do not pass `vpc_id` to the Lambda — use `vpc_name`
- ❌ Do not deploy the SC portfolio/product from this repo — that's `terraform-service-catalog-census`'s job
- ❌ Do not re-add `LambdaFunctionArn` as a CFN parameter — use `!Sub "arn:..."` directly
10 changes: 1 addition & 9 deletions service-catalog/product-template.yaml
@@ -110,8 +110,6 @@ Parameters:
     AllowedValues:
       - us-gov-west-1
       - us-gov-east-1
-      - us-east-1
-      - us-west-2

   AccountName:
     Type: String
@@ -168,12 +166,6 @@ Parameters:
     Description: 'Additional tags as JSON object (e.g., {"key1":"value1"})'
     Default: "{}"

-  # Hidden parameter - the Lambda ARN is passed in from the Service Catalog product definition
-  LambdaFunctionArn:
-    Type: String
-    Description: ARN of the Lambda function that creates EKS cluster repositories
-    Default: "arn:aws-us-gov:lambda:us-gov-west-1:229685449397:function:eks-terragrunt-repo-gen-template-automation"
-
 Conditions:
   ClusterNameProvided: !Not
     - !Equals
@@ -189,7 +181,7 @@ Resources:
   RepositoryCreator:
     Type: Custom::GitHubRepository
     Properties:
-      ServiceToken: !Ref LambdaFunctionArn
+      ServiceToken: !Sub "arn:aws-us-gov:lambda:${AWS::Region}:${AWS::AccountId}:function:eks-terragrunt-repo-gen-template-automation"
       # Core repo parameters
       project_name: !Ref ProjectName
       owning_team: !Ref OwningTeam
