-
Notifications
You must be signed in to change notification settings - Fork 0
fix: EKS-only Lambda cleanup + SC template AwsRegion/AWSAccountId removal #1
Conversation
…ructions - service-catalog/product-template.yaml: drop LambdaFunctionArn parameter, - .github/copilot-instructions.md: document Lambda-first approach, explicitly record why CodeBuild+Terraform was abandoned (SSH host keys, proxy, provider version conflict), clarify CodeBuild is still used for container image CI/CD
…creator
- repository_provider.py: change REPO_VISIBILITY default from 'internal' to
'public' so created repos are visible to all org members and external
visitors on GHE
- github_provider.py: change fallback visibility from 'internal' to 'public';
add add_collaborator() method (PUT /repos/{org}/{repo}/collaborators/{user})
- app.py: add optional creator_username field to CloudFormationResourceInput;
after team permission grant, also add individual creator as admin collaborator
when creator_username is provided (non-fatal on failure)
- service-catalog/product-template.yaml: add optional CreatorUsername parameter
(defaults to empty string, backward-compatible); wire to creator_username
Lambda property
…o SC launch role - VERIFY_SSL was incorrectly set to 'true' (Census CA cert not in certifi) - repo_visibility changed from 'internal' to 'public' per ECA requirements - Added EC2DescribeVpcs permission to SC launch role IAM policy
- build_eks_path_mapper remaps environment/region/vpc/cluster/ placeholders to concrete values from CFN params - github_provider: clone_repository_contents accepts optional path_mapper - csvd_config_packer.hcl: create_role=false, packer 1.10.3, role ARN wired - deploy/service_catalog.tf: SC product and portfolio synced - Updated copilot instructions with Lambda-only architecture decision
- app.py: add start_codebuild_build() and poll_codebuild_build() helpers - app.py: EKS deployment path (is_eks_deployment=True) now starts CodeBuild project 'eks-terragrunt-repo-creator', polls until SUCCEEDED/FAILED, and sends cfn-response accordingly; non-EKS path unchanged - deploy/main.tf: add aws_codebuild_project.eks_repo_creator resource (NO_SOURCE, uses buildspec.yml from terraform-eks-deployment) CODEBUILD_PROJECT_NAME injected into Lambda environment - deploy/variables.tf: codebuild_project_name, codebuild_role_arn, codebuild_vpc_id - deploy/terraform.tfvars: set CodeBuild project name, role ARN, VPC ID
… Lambda CODEBUILD_PROJECT_NAME env var
…nectivity Root cause of 'Connect timeout on codebuild.us-gov-west-1.amazonaws.com': - Lambda is in a VPC with no NAT gateway path to CodeBuild's public endpoint - Lambda role was missing codebuild:StartBuild / codebuild:BatchGetBuilds perms Fix: - aws_vpc_endpoint.codebuild[0]: interface endpoint for CodeBuild in the Lambda VPC with private DNS enabled (Lambda API calls resolve to private IPs) - aws_iam_role_policy.codebuild_access: StartBuild + BatchGetBuilds on the eks-terragrunt-repo-creator project ARN only
Lambda was set to 300s but poll_codebuild_build loops for up to 12 min (720s). Lambda would be killed by AWS before it could report back to CloudFormation. 900s gives a ~180s buffer beyond the poll window.
The packer-pipeline internal buildspec template already wraps the value
in '- {{ additional_post_build_commands }}', so prefixing the value with
'- ' caused YAML_FILE_ERROR (nested list) in CodeBuild build #8.
…in CodeBuild The standard github_token (/eks-cluster-deployment/github_token) is a GitHub App installation token (ghs_ prefix) which cannot access /api/v3/user. This endpoint is always called by the CSVD terraform-github-repo module's data.github_user.current resource. Changes: - app.py: check TF_GITHUB_TOKEN_SECRET_NAME env var first for CodeBuild token; falls back to GITHUB_TOKEN_SECRET_NAME if not set - deploy/main.tf: add TF_GITHUB_TOKEN_SECRET_NAME=ghe-runner/github-token env var - deploy/main.tf: add IAM policy granting Lambda access to ghe-runner/github-token
returning repository_url/repository_name, causing CFN to fail with: 'Vendor response doesn't contain pull_request_url attribute' After CodeBuild SUCCEEDED, query GitHub API /pulls?state=open on the created repo to get the real PR URL and branch name.
…hitecture - Replace 'Lambda, NOT CodeBuild' section with the actual working architecture - Document Lambda as thin orchestrator triggering eks-terragrunt-repo-creator - Add two-token split explanation (ghs_ App token vs ghp_ PAT for Terraform) - Add TF_GITHUB_TOKEN_SECRET_NAME and CODEBUILD_PROJECT_NAME env vars - Add correct rebuild/test commands - Remove outdated CodeBuild-was-abandoned rationale
| @@ -29,7 +29,8 @@ packer_pipeline { | |||
| partition = "aws-us-gov" // AWS partition (aws or aws-us-gov) | |||
|
|
|||
| // Role management | |||
| create_role = true // Enable automatic role creation | |||
| create_role = false // Role already exists; provide ARN directly | |||
| codebuild_role_arn = "arn:aws-us-gov:iam::229685449397:role/CodeBuildPackerRole-eks-terragrunt-repo-generator-builder" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be looked up by Name, partition, account id
csvd_config_packer.hcl
Outdated
| @@ -29,7 +29,8 @@ packer_pipeline { | |||
| partition = "aws-us-gov" // AWS partition (aws or aws-us-gov) | |||
|
|
|||
| // Role management | |||
| create_role = true // Enable automatic role creation | |||
| create_role = false // Role already exists; provide ARN directly | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we always want to create the role?
deploy/main.tf
Outdated
| # PAT used by CodeBuild/Terraform for the GitHub provider (must be a ghp_ PAT — | ||
| # the standard App installation token ghs_ cannot access /api/v3/user which is | ||
| # required by the CSVD terraform-github-repo module). | ||
| TF_GITHUB_TOKEN_SECRET_NAME = "ghe-runner/github-token" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass in token secret name
deploy/main.tf
Outdated
| Sid = "ReadTFGitHubToken" | ||
| Effect = "Allow" | ||
| Action = ["secretsmanager:GetSecretValue"] | ||
| Resource = "arn:aws-us-gov:secretsmanager:${var.aws_region}:${data.aws_caller_identity.current.account_id}:secret:ghe-runner/github-token-*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look up the partition value with data.aws_caller_identity.current.partition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also pass in the secret name
deploy/main.tf
Outdated
| # ── VPC endpoint: CodeBuild (interface) ────────────────────────────────────── | ||
| # The Lambda runs inside a VPC; without this endpoint the CodeBuild API call | ||
| # times out because there is no NAT/internet path for codebuild.amazonaws.com. | ||
| resource "aws_vpc_endpoint" "codebuild" { | ||
| count = var.enable_vpc ? 1 : 0 | ||
|
|
||
| vpc_id = var.codebuild_vpc_id | ||
| service_name = "com.amazonaws.${var.aws_region}.codebuild" | ||
| vpc_endpoint_type = "Interface" | ||
| subnet_ids = var.subnet_ids | ||
| security_group_ids = var.security_group_ids | ||
| private_dns_enabled = true | ||
|
|
||
| tags = merge(var.tags, { Name = "eks-terragrunt-codebuild-endpoint" }) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't create VPC endpoints, they should already be in the account we use.
deploy/terraform.tfvars
Outdated
| subnet_ids = ["subnet-0b1992a84536c581b"] | ||
| security_group_ids = ["sg-0641c697588b9aa6b"] | ||
| codebuild_vpc_id = "vpc-00576a396ec570b94" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be looked up or created
deploy/terraform.tfvars
Outdated
| # CodeBuild project triggered by the Lambda to run terraform-eks-deployment. | ||
| # Reuses the existing CodeBuild packer role (has S3 + VPC + CloudWatch perms). | ||
| codebuild_project_name = "eks-terragrunt-repo-creator" | ||
| codebuild_role_arn = "arn:aws-us-gov:iam::229685449397:role/CodeBuildPackerRole-eks-terragrunt-repo-generator-builder" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look up by name or create?
deploy/variables.tf
Outdated
| variable "codebuild_role_arn" { | ||
| description = "IAM role ARN for the CodeBuild repo-creator project" | ||
| type = string | ||
| default = "arn:aws-us-gov:iam::229685449397:role/CodeBuildPackerRole-eks-terragrunt-repo-generator-builder" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be looked up so it can work across accounts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments
| @@ -189,7 +191,7 @@ Resources: | |||
| RepositoryCreator: | |||
| Type: Custom::GitHubRepository | |||
| Properties: | |||
| ServiceToken: !Ref LambdaFunctionArn | |||
| ServiceToken: !Sub "arn:aws-us-gov:lambda:${AWS::Region}:${AWS::AccountId}:function:eks-terragrunt-repo-gen-template-automation" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look up partition
template_automation/app.py
Outdated
| @@ -43,6 +43,7 @@ class CloudFormationResourceInput(BaseModel): | |||
| """Input validation model for CloudFormation Custom Resource parameters.""" | |||
| project_name: str = Field(..., description="Name for the new repository") | |||
| owning_team: Optional[str] = Field(default="tf-module-admins", description="Team that should own the repository") | |||
| creator_username: Optional[str] = Field(default=None, description="GitHub username of the person provisioning this repo; will be granted admin access") | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had abandoned the path for creating the git repo with python? Do we still need this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct — the Python repo-creation path is gone. app.py has been rewritten to ~380 lines: it only parses the CFN event, validates the required EKS fields, starts the eks-terragrunt-repo-creator CodeBuild project, polls until completion, then fetches the open PR URL from the GitHub API and returns it in the CFN response. All actual repo creation runs in CodeBuild via terraform-eks-deployment.
| @@ -10,14 +10,15 @@ | |||
|
|
|||
|
|
|||
| def _default_visibility() -> str: | |||
| """Read REPO_VISIBILITY env var, defaulting to 'internal'. | |||
| """Read REPO_VISIBILITY env var, defaulting to 'public'. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still used since we aren't using the python path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call — removing the entire generic code path (GitHub/GitLab providers, template manager, models, config.json rendering). This file and github_provider.py, gitlab_provider.py, github_client.py, gitlab_client.py, template_manager.py, and models.py have all been deleted. app.py is now ~380 lines, EKS + CodeBuild only.
- Rewrite app.py to ~380 lines (EKS + CodeBuild only); remove all generic
GitHub/GitLab/template-manager code paths
- Delete 7 dead modules: repository_provider, github_provider, gitlab_provider,
github_client, gitlab_client, template_manager, models
- Delete eks_config.py (Jinja2 rendering now handled entirely by terraform-eks-
deployment in CodeBuild)
- Rewrite tests/test_app.py for EKS-only handler; drop stale test files
- Simplify requirements.txt (remove requests/requests-mock; now using urllib)
deploy/ Terraform:
- Add conditional CodeBuild IAM role (create_codebuild_role var); default=true
creates a minimal role with scoped logs/S3/VPC policies; false looks up a
pre-existing role by name
- Add data sources for subnet + security group lookup by name (no hardcoded IDs)
- Replace hardcoded 'ghe-runner/github-token' with var.tf_github_token_secret_name
- Fix IAM policy partition: arn:aws-us-gov: -> arn:${partition}:
- Remove aws_vpc_endpoint.codebuild (VPC endpoints already exist in the account)
- Remove codebuild_role_arn / codebuild_vpc_id / subnet_ids / security_group_ids
variables; add codebuild_role_name, subnet_name, security_group_name,
tf_github_token_secret_name, codebuild_assets_bucket
service-catalog/product-template.yaml:
- Fix ServiceToken partition: arn:aws-us-gov:lambda: -> arn:${AWS::Partition}:lambda:
csvd_config_packer.hcl:
- Annotate account-specific fields with (*) comments explaining they cannot be
dynamically resolved by packer-pipeline HCL
- Rewrite architecture section to explicitly document cross-account flow: the SC product is shared to multiple accounts but all compute (Lambda, CodeBuild, Secrets Manager) runs centrally in csvd-dev (229685449397) - Add cross-account ASCII diagram showing provisioner account vs csvd-dev - Remove stale reference to eks_config.py (deleted in prior commit) - Remove stale generic-mode fallback description (Lambda is EKS-only) - Add EKS_SC_RESOURCE_INVENTORY.md: full catalog of resources by owner (census Terragrunt, direct Terraform, StackSet, pre-existing, ephemeral) with cross-account architecture section and StackSet note on launch role
…branch The Lambda is EKS-only by design — the is_eks_deployment false-branch only ever raised a ValueError with no alternative path. Remove it in favour of Pydantic required fields, which surface missing fields with precise error messages at validation time. Required fields (was Optional): cluster_name, environment, aws_region, account_name, aws_account_id, environment_abbr, vpc_name, vpc_domain_name. Also simplifies start_codebuild_build() — 'or ""' fallbacks on required fields are no longer needed.
…ters Both values are now resolved automatically by CloudFormation: This ensures the provisioner's own account ID and region are used, not hardcoded values or user-supplied inputs that could mismatch. Removes two parameters from the form users see when provisioning.
…M policies Adds deploy/buildspec-eks-repo-creator.yml — the CodeBuild buildspec for the eks-terragrunt-repo-creator project, versioned alongside the Terraform that manages it rather than referencing terraform-eks-deployment. Currently points at REPO_BRANCH=test_cluster (PR #16 under review) — must be updated to 'main' once PR #16 merges. Also fixes all IAM policy ARN constructions to use data.aws_partition.current instead of data.aws_caller_identity.current (caller_identity has no .partition attribute in this provider version).
- copilot-instructions: simplify rebuild steps to use packer-pipeline; add Python/CLI standards section; add AWS_DEFAULT_REGION reminder; add 'do not re-add AwsRegion/AWSAccountId' to What NOT to Do - DEPLOYMENT.md, CLOUDFORMATION_CUSTOM_RESOURCE_MIGRATION.md: fix packer-pipeline invocation to use csvd_config_packer.hcl - csvd_config_packer.hcl: remove duplicate stale comment block at EOF - DEMO_SCRIPT.md: update param table and step 3 walkthrough to match the corrected SC form (no AwsRegion/AWSAccountId, correct field names) - test_service_catalog.py: bump artifact to v2.1; remove AwsRegion and - docs/SC-TEMPLATE-FIX-PLAN.md: add completed fix plan for reference - deploy state: updated after today's tf apply
Summary
Cleans up the Lambda function and Service Catalog product template following the
EKS-only architecture decision, verified with a passing E2E test today.
Changes
template_automation/app.py— EKS fields now requiredis_eks_deploymentproperty and its guard inlambda_handler— the Lambda is EKS-only; the false-branch only raised aValueErrorwith no alternative pathcluster_name,environment,aws_region,account_name,aws_account_id,environment_abbr,vpc_name,vpc_domain_name) are nowstr = Field(...)— Pydantic surfaces missing fields with precise per-field errors at invocation timeor ""fallbacks instart_codebuild_build()— not needed for required fieldsservice-catalog/product-template.yaml— remove user-facing AwsRegion/AWSAccountIdAwsRegionandAWSAccountIddropped as Parametersaws_region: !Sub "${AWS::Region}"andaws_account_id: !Sub "${AWS::AccountId}"deploy/buildspec-eks-repo-creator.yml— new fileeks-terragrunt-repo-creator, versioned here alongside the Terraform that manages itREPO_BRANCH: test_clusterwhile PR #16 is in review — must be updated tomainafter PR #16 mergesdeploy/main.tf— IAM partition fix + buildspec path updatedata.aws_partition.current.partition(caller_identityhas no.partitionattribute)file()reference to point to the local buildspecDocs / scripts / config
copilot-instructions.md: simplified rebuild steps; added Python/CLI standards;AWS_DEFAULT_REGIONreminder; new What NOT to Do entriesDEPLOYMENT.md,CLOUDFORMATION_CUSTOM_RESOURCE_MIGRATION.md: corrected packer-pipeline invocation tocsvd_config_packer.hclcsvd_config_packer.hcl: removed duplicate stale comment block at EOFdocs/DEMO_SCRIPT.md: updated param table and step 3 walkthrough to match corrected SC formscripts/test_service_catalog.py: bump artifact to v2.1; remove AwsRegion/AWSAccountId from DEFAULT_PARAMSdocs/SC-TEMPLATE-FIX-PLAN.md: completed fix plan added for referenceTesting
Full end-to-end Service Catalog test passed:
sc-e2e-test-20260421-1604→ repo created, 1 open PR, correct structureFollow-up required (after PR #16 merges in terraform-eks-deployment)
deploy/buildspec-eks-repo-creator.ymlREPO_BRANCHfromtest_cluster→maintf applyindeploy/to push the buildspec change to CodeBuildterraform-service-catalog-census/templates/products/eks-terragrunt-repo/2-0-0.yamlwith the SC template changes here