diff --git a/design-docs/SERVICE_CATALOG_CENSUS_INTEGRATION.md b/design-docs/SERVICE_CATALOG_CENSUS_INTEGRATION.md
new file mode 100644
index 0000000..5f38cc6
--- /dev/null
+++ b/design-docs/SERVICE_CATALOG_CENSUS_INTEGRATION.md
@@ -0,0 +1,373 @@
+# Integration Plan: EKS Repo Generator → terraform-service-catalog-census
+
+**Date:** February 20, 2026
+**Author:** David Arnold
+**Status:** DRAFT
+
+---
+
+## Executive Summary
+
+This document outlines the plan to integrate the EKS Terragrunt Repository Generator product into the centralized `terraform-service-catalog-census` repo, which manages all enterprise Service Catalog portfolios and products via Terragrunt. Currently, our Service Catalog product (portfolio, product, launch constraints) is managed independently in `lambda-template-repo-generator/deploy/`. Integrating into the census repo aligns us with the enterprise pattern and enables org-wide sharing of the product.
+
+---
+
+## 1. Current Architecture (As-Is)
+
+### Our System (4 repos)
+
+```
+lambda-template-repo-generator/            ← Lambda code + SC product (THIS REPO)
+├── template_automation/                   ← Lambda handler, EKS renderer, GitHub API
+├── service-catalog/product-template.yaml  ← CFN product template
+├── deploy/                                ← Terraform: Lambda + IAM + SC portfolio/product
+│   ├── main.tf                            ← Lambda, ECR, API GW, SSM
+│   ├── service_catalog.tf                 ← Portfolio, product, launch constraint
+│   └── terraform.tfvars                   ← Account-specific config
+└── scripts/                               ← Test tools
+
+terraform-aws-template-automation/         ← Reusable Terraform module
+packer-pipeline/                           ← Container build CLI
+template-eks-cluster/                      ← Template repo (cloned into new repos)
+```
+
+**Key resources we currently deploy in `deploy/service_catalog.tf`:**
+- `aws_servicecatalog_portfolio`
+- `aws_servicecatalog_product` + `aws_servicecatalog_provisioning_artifact`
+- `aws_servicecatalog_product_portfolio_association`
+- `aws_servicecatalog_principal_portfolio_association`
+- `aws_servicecatalog_constraint` (LAUNCH + TEMPLATE)
+- `aws_s3_object` (product template upload)
+
+**Deployed to:** `csvd-dev-gov` (229685449397) / `us-gov-west-1`
+
+### Census System (terraform-service-catalog-census)
+
+```
+terraform-service-catalog-census/
+├── root.hcl                                ← Terragrunt root config
+├── _envcommon/service-catalog.hcl          ← Common component config
+├── main-module/service-catalog/            ← Main Terraform module
+│   ├── main.tf                             ← Portfolio + product orchestration
+│   └── variables.tf                        ← Inputs (configurations_dir, product_dir, etc.)
+├── modules/
+│   ├── sc-portfolio/                       ← Portfolio creation + principal association
+│   ├── sc-product/                         ← Product creation + S3 upload + versioning
+│   └── cfn-roles-actions/                  ← Launch roles via CFN StackSets
+├── templates/
+│   ├── products/                           ← CFN product templates (versioned YAMLs)
+│   │   ├── ec2-instance-linux/             ← 1-0-0.yaml, 1-0-1.yaml, 1-1-0.yaml, ...
+│   │   ├── ec2-instance-win/
+│   │   ├── ec2-storage/
+│   │   ├── image-pipeline/
+│   │   ├── rds-postgres/
+│   │   └── s3-bucket/
+│   └── role-templates/                     ← IAM launch role CFN snippets
+├── non-prod/
+│   ├── csvd-dev/west/
+│   │   ├── configurations/
+│   │   │   ├── portfolios/*.yaml.tftpl      ← Portfolio definitions (YAML)
+│   │   │   └── products/**/*.yaml.tftpl     ← Product definitions (YAML)
+│   │   └── service-catalog/
+│   │       ├── terragrunt.hcl              ← Module source + inputs
+│   │       └── terraform.tfvars            ← Account config
+│   ├── lab-dev/
+│   └── lab-operations/
+└── prod/operations-gov/                    ← Prod (shares to org)
+```
+
+**How it works:**
+1. **Portfolios** are defined in YAML files under `//configurations/portfolios/`
+2. **Products** are defined in YAML files under `//configurations/products//`
+3. **Product templates** (CFN YAMLs) live in `templates/products//.yaml`
+4. **Launch roles** are deployed via CFN StackSets (shared across the org)
+5. The `sc-product` module creates an S3 bucket, uploads templates, creates products, and creates provisioning artifacts
+6. The `main-module/service-catalog/main.tf` wires portfolios → products → constraints
+
+**Key difference from our approach:**
+- Census uses **YAML-driven configuration** (portfolios + products defined in YAML, not HCL)
+- Products are **versioned** (multiple YAML files per product: `1-0-0.yaml`, `1-1-0.yaml`, etc.)
+- Launch roles are created via **CFN StackSets** (shared across the org), not per-account Terraform
+- Product templates are **static files** loaded from disk (not rendered with `templatefile()`)
+- The `sc-product` module creates its **own S3 bucket** for artifacts (not a shared one)
+
+---
+
+## 2. Integration Approach: Product-Only (Recommended)
+
+### Strategy
+
+Move only the **Service Catalog product definition** (portfolio, product, constraints) into `terraform-service-catalog-census`. Keep the **Lambda infrastructure** (Lambda function, IAM execution role, ECR, VPC config, SSM parameters) in `lambda-template-repo-generator/deploy/`.
+
+### Why Product-Only?
+
+| Concern | Product-Only ✅ | Full Migration ❌ |
+|---------|----------------|-------------------|
+| Separation of concerns | SC catalog management separate from Lambda code | Everything in one monolith |
+| Deploy independence | Lambda deploys independently of SC product catalog | Coupled deploy cycles |
+| Census repo pattern | Matches existing products (ec2, rds, s3, etc.) | Would be an outlier requiring Lambda + ECR modules |
+| Launch role | Already exists in Lambda's `deploy/`; referenced by name | Would need CFN StackSet role template |
+| Org-wide sharing | Prod operations-gov deployment shares to org | Already shared per-account |
+| Risk | Low: just moving YAML + config | High: must migrate Terraform state for Lambda, IAM, etc. |
+
+### What Moves vs. What Stays
+
+| Component | Current Location | After Integration |
+|-----------|-----------------|-------------------|
+| **Product template YAML** | `service-catalog/product-template.yaml` | `terraform-service-catalog-census/templates/products/eks-terragrunt-repo/2-0-0.yaml` |
+| **Portfolio definition** | `deploy/service_catalog.tf` (HCL) | `/configurations/portfolios/eks-terragrunt.yaml.tftpl` (YAML) |
+| **Product definition** | `deploy/service_catalog.tf` (HCL) | `/configurations/products/eks-terragrunt-repo/EKS_REPO.yaml.tftpl` (YAML) |
+| **Launch constraint** | `deploy/service_catalog.tf` (HCL) | Product YAML `launch_role` field |
+| **Template constraint** | `deploy/service_catalog.tf` (HCL) | Product YAML `template_constraints` field |
+| Lambda function | `deploy/main.tf` | **STAYS** in `lambda-template-repo-generator/deploy/` |
+| IAM execution role | `deploy/main.tf` | **STAYS** |
+| ECR repo + image | `deploy/main.tf` + packer-pipeline | **STAYS** |
+| SSM parameters | `deploy/main.tf` | **STAYS** |
+| VPC configuration | `deploy/main.tf` | **STAYS** |
+| SC launch role | `deploy/service_catalog.tf` | **STAYS** (referenced by name in product YAML) |
+
+---
+
+## 3. Integration Steps
+
+### Phase 1: Prepare Product Template for Census Format
+
+The census repo expects product templates as **static YAML files** in `templates/products//.yaml`. Our current template declares `LambdaFunctionArn` as a parameter whose default is injected by Terraform. Census templates are uploaded as-is, without `templatefile()` rendering, so that default can no longer be injected. Instead, `LambdaFunctionArn` must be locked with a **template constraint**.
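+
+The following is a minimal sketch of the Phase 1 change. It assumes the parameter is a plain `String`; the `Description` text and the rule name `LockLambdaFunctionArn` are illustrative.
+
+- The first part is the product template parameter with its `Default` removed.
+- The second part is the native Service Catalog TEMPLATE-constraint rule format, pinning the parameter to one allowed value (the ARN rendered for `us-gov-west-1`).
+
+This plan does not cover how the census module translates `template_constraints.Parameters` into a constraint. Treat the rule as an assumption about the end result, not as the module's actual output.
+
+```yaml
+# templates/products/eks-terragrunt-repo/2-0-0.yaml (excerpt)
+Parameters:
+  LambdaFunctionArn:
+    Type: String  # assumed type
+    Description: ARN of the repo generator Lambda, pinned by the template constraint
+    # No Default: the value previously injected by Terraform is removed
+
+# Equivalent Service Catalog TEMPLATE constraint rule (illustrative name),
+# shown in YAML; Service Catalog stores constraint parameters as JSON
+Rules:
+  LockLambdaFunctionArn:
+    Assertions:
+      - Assert:
+          Fn::Contains:
+            - - arn:aws-us-gov:lambda:us-gov-west-1:229685449397:function:eks-terragrunt-repo-gen-template-automation
+            - Ref: LambdaFunctionArn
+        AssertDescription: LambdaFunctionArn must be the approved repo generator function
+```
+
+With a single allowed value, any other ARN fails the assertion at launch, which is what "locked" means here.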
+
+**Action:** Create `templates/products/eks-terragrunt-repo/2-0-0.yaml`
+- Copy from `service-catalog/product-template.yaml`
+- Remove the `Default` value from the `LambdaFunctionArn` parameter (it will be locked by the template constraint)
+- Add `Metadata.ServiceCatalog.ProductVersion.Description` for census versioning support
+
+### Phase 2: Create Portfolio YAML (per account/region)
+
+Create a portfolio YAML file for each account/region where the product should be available.
+
+**File:** `non-prod/csvd-dev/west/configurations/portfolios/eks-terragrunt.yaml.tftpl`
+
+```yaml
+eks-terragrunt-portfolio:
+  name: EKS Terragrunt Repository Creator Portfolio
+  description: Self-service EKS cluster repository creation with Terragrunt configuration
+  provider_name: Platform Engineering - CSVD
+  products:
+    - eks_terragrunt_repo
+  user_roles:
+    - /aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_inf-admin-t2_*
+    - /aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_AdministratorAccess_*
+    - /aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_AWSAdministratorAccess_*
+  tags: {}
+  associated_tag_options: {}
+  share_ous: []
+```
+
+### Phase 3: Create Product YAML (per account/region)
+
+**File:** `non-prod/csvd-dev/west/configurations/products/eks-terragrunt-repo/EKS_REPO.yaml.tftpl`
+
+```yaml
+eks_terragrunt_repo:
+  name: eks-terragrunt-eks-repo-creator
+  description: >-
+    Creates a GitHub repository from the template-eks-cluster template with
+    fully-rendered Terragrunt/HCL configuration for EKS cluster deployments.
+  type: CLOUD_FORMATION_TEMPLATE
+  launch_role: eks-terragrunt-sc-launch-role
+  distributor: CSVD - Platform Engineering
+  template_constraints:
+    Parameters:
+      LambdaFunctionArn: >-
+        arn:aws-us-gov:lambda:${aws_region}:229685449397:function:eks-terragrunt-repo-gen-template-automation
+  versions:
+    - name: 2.0.0
+      file_path: /eks-terragrunt-repo/2-0-0.yaml
+```
+
+### Phase 4: Handle the Launch Role
+
+The census system creates launch roles via **CFN StackSets** (see `templates/role-templates/`). Our launch role (`eks-terragrunt-sc-launch-role`) is currently created by our own Terraform in `deploy/service_catalog.tf`.
+
+**Two options:**
+
+#### Option A: Reference Existing Role (Simpler; Recommended for the Initial Rollout)
+Keep the launch role in our Terraform deployment. The product YAML's `launch_role` field references the role **by name** (not ARN), so it will work as long as the role exists in the target account.
+
+**Prerequisite:** Deploy `lambda-template-repo-generator/deploy/` first to create the role in each account.
+
+#### Option B: Create Census Role Template (Long-term)
+Create a new role template at `templates/role-templates/eks-terragrunt-launch-role.yaml` and add it to the roles configuration. This would let the census StackSet create the role across all accounts automatically.
+
+**Complexity:** The launch role needs permissions for Lambda invoke, CloudFormation responses, S3 read (with a tag condition), and CloudWatch Logs, all of which reference specific ARNs. This is better suited to a later rollout stage, when scaling to more accounts.
+
+### Phase 5: Remove SC Resources from Lambda Deploy
+
+After the census integration is deployed and verified:
+
+1. Run `terraform state rm` for each resource below **before** removing its code, so Terraform does not plan a destroy (this only removes them from state; they remain in AWS until deleted):
+   - `aws_servicecatalog_portfolio`
+   - `aws_servicecatalog_product`
+   - `aws_servicecatalog_provisioning_artifact`
+   - `aws_servicecatalog_product_portfolio_association`
+   - `aws_servicecatalog_principal_portfolio_association`
+   - `aws_servicecatalog_constraint` (both LAUNCH and TEMPLATE)
+   - `aws_s3_object.product_template`
+2. Remove those resources from `deploy/service_catalog.tf`
+3. Remove from `deploy/variables.tf`:
+   - `create_service_catalog` variable
+   - `service_catalog_config` variable
+4. Remove from `deploy/terraform.tfvars`:
+   - `create_service_catalog` and `service_catalog_config` blocks
+5. Keep the **launch role** resource (Option A) or remove it too (Option B)
+
+### Phase 6: Multi-Account Expansion
+
+Once validated in `csvd-dev/west`, add the product to other environments:
+
+```
+non-prod/csvd-dev/west/configurations/      ← 1st: current account
+prod/operations-gov/west/configurations/    ← 2nd: org-wide via sharing
+non-prod/lab-dev/east/configurations/       ← 3rd: lab environments
+```
+
+The `prod/operations-gov` deployment uses `to_share_portfolios = true` with OU sharing, which would make the product available to the workload accounts in the shared OUs.
+
+---
+
+## 4. Prerequisites & Dependencies
+
+### Before integration, the following must exist in each target account:
+
+| Prerequisite | How Created | Current State |
+|-------------|-------------|---------------|
+| Lambda function | `lambda-template-repo-generator/deploy/` | ✅ Deployed in csvd-dev-gov |
+| Lambda execution role | `lambda-template-repo-generator/deploy/` | ✅ Deployed |
+| ECR image | `packer-pipeline` → CodeBuild | ✅ Built and pushed |
+| SC launch role | `lambda-template-repo-generator/deploy/` | ✅ `eks-terragrunt-sc-launch-role` exists |
+| GitHub token in Secrets Manager | Manual / separate deploy | ✅ Exists at `/eks-cluster-deployment/github_token` |
+| SSM parameters | `terraform-aws-template-automation` | ✅ Deployed |
+| VPC/subnet access to GHE | Network team | ✅ Configured |
+
+### Dependency order for new account deployment:
+```
+1. terraform-aws-template-automation   (SSM params)
+2. packer-pipeline build               (Container image)
+3. lambda-template-repo-generator      (Lambda + IAM + ECR)      ← keeps launch role
+4. terraform-service-catalog-census    (SC portfolio + product)  ← NEW
+```
+
+---
+
+## 5. Risks & Mitigations
+
+| Risk | Impact | Mitigation |
+|------|--------|------------|
+| Template constraint with hardcoded Lambda ARN | Breaks if Lambda ARN changes | ARN is deterministic (function name is fixed in Terraform) |
+| Launch role doesn't exist in shared accounts | Product launch fails | The initial rollout uses Option A (role stays in Lambda deploy); Option B later adds a CFN StackSet role |
+| S3 tag-based SCP blocks template access | Template 403 during launch | Census `sc-product` module creates its own S3 bucket with proper tags |
+| Duplicate portfolio/product during migration | Users see two products | After the census product is verified, `terraform state rm` the old resources (Phase 5), then delete the old portfolio and product (or at least their principal associations) in AWS; `state rm` alone leaves them in place |
+| Census repo uses different S3 bucket | Old provisioned products reference old S3 URL | Existing provisioned products are unaffected; new launches use new bucket |
+
+---
+
+## 6. Files to Create in terraform-service-catalog-census
+
+```
+terraform-service-catalog-census/
+├── templates/products/eks-terragrunt-repo/
+│   └── 2-0-0.yaml                             ← Product CFN template
+├── non-prod/csvd-dev/west/configurations/
+│   ├── portfolios/eks-terragrunt.yaml.tftpl   ← Portfolio definition
+│   └── products/eks-terragrunt-repo/
+│       └── EKS_REPO.yaml.tftpl                ← Product definition
+└── (future) templates/role-templates/
+    └── eks-terragrunt-launch-role.yaml        ← Launch role (Option B)
+```
+
+**Total: 3 new files** for the initial rollout, plus 1 if Option B is adopted
+
+---
+
+## 7. Files to Modify in lambda-template-repo-generator
+
+After census integration is live and validated:
+
+| File | Change |
+|------|--------|
+| `deploy/service_catalog.tf` | Remove all SC resources (portfolio, product, constraints, S3 object) |
+| `deploy/variables.tf` | Remove `create_service_catalog` and `service_catalog_config` variables |
+| `deploy/terraform.tfvars` | Remove `create_service_catalog` and `service_catalog_config` blocks |
+| `deploy/main.tf` | Remove SC-related outputs |
+| Documentation (.md files) | Update deployment instructions to reference census repo |
+
+**Keep:** Lambda, IAM execution role, SC launch role (until Option B is adopted), ECR, VPC config, SSM params
+
+---
+
+## 8. Validation Checklist
+
+- [ ] Product template uploaded to census-managed S3 bucket
+- [ ] Portfolio visible in Service Catalog console
+- [ ] Product associated with portfolio
+- [ ] Launch constraint attached (references `eks-terragrunt-sc-launch-role`)
+- [ ] Template constraint locks `LambdaFunctionArn` parameter
+- [ ] `scripts/test_service_catalog.py` passes against census-deployed product
+- [ ] Old SC resources removed from `lambda-template-repo-generator` Terraform state
+- [ ] No duplicate portfolios/products in console
+
+---
+
+## 9. Timeline Estimate
+
+| Stage | Work | Duration |
+|-------|------|----------|
+| Stage 1 | Phases 1-4 (Option A): create 3 files in census repo, test in csvd-dev | 1 day |
+| Stage 2 | Phase 5: remove old SC resources from Lambda deploy | 0.5 day |
+| Stage 3 | Phase 6: add to prod/operations-gov for org sharing | 0.5 day |
+| Stage 4 (optional) | Option B: create CFN StackSet launch role template | 1 day |
+| **Total** | | **2-3 days** |
+
+---
+
+## Appendix A: Census Config Format Reference
+
+### Portfolio YAML Schema
+```yaml
+:
+  name: string                   # Display name
+  description: string            # Description
+  provider_name: string          # Provider name shown in console
+  products:                      # List of product keys to associate
+    -
+  user_roles:                    # IAM role path patterns for principal access
+    - /path/pattern/*
+  tags: {}
+  associated_tag_options: {}
+  share_ous: []                  # OU names for cross-account sharing
+```
+
+### Product YAML Schema
+```yaml
+:
+  name: string                   # Product name in SC console
+  description: string            # Product description
+  type: CLOUD_FORMATION_TEMPLATE # or EXTERNAL
+  launch_role: string            # IAM role NAME (not ARN) for launch constraint
+  distributor: string            # Shown in console
+  template_constraints:          # Parameter constraints
+    Parameters:
+      ParamName: locked-value
+  rules:                         # CFN rules (validation)
+    RuleName:
+      Assertions: [...]
+  versions:                      # Product versions
+    - name: "2.0.0"
+      file_path: /product-dir/version.yaml
+  actions: []                    # Service actions (optional)
+```
+
+### Product Template Location
+Templates are static CFN YAML files at:
+```
+templates/products//.yaml
+```
+They are uploaded to S3 by the `sc-product` module, which creates its own S3 bucket using the prefix specified in `terraform.tfvars`.
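+
+As an illustration of the versioning convention (dotted version `name`, dashed template file name), a later release might be added to the product YAML as shown below. Version `2.1.0` and its file are hypothetical. The fragment also assumes `file_path` is resolved relative to `templates/products/`. Whether older versions stay active after a new one is added depends on the `sc-product` module.
+
+```yaml
+eks_terragrunt_repo:
+  # ...other product fields unchanged...
+  versions:
+    - name: "2.0.0"
+      file_path: /eks-terragrunt-repo/2-0-0.yaml
+    # Hypothetical next release; assuming file_path is relative to
+    # templates/products/, the file would be
+    # templates/products/eks-terragrunt-repo/2-1-0.yaml
+    - name: "2.1.0"
+      file_path: /eks-terragrunt-repo/2-1-0.yaml
+```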