Add integration plan: EKS repo generator → terraform-service-catalog-census

Documents the approach to migrate the Service Catalog product definition
(portfolio, product, constraints) into the centralized census repo while
keeping the Lambda infrastructure in this repo. Includes:
- As-is vs to-be architecture comparison
- Product-only integration strategy (recommended)
- Step-by-step migration phases
- Census YAML format reference
- Prerequisites, risks, and validation checklist
Your Name committed Feb 20, 2026
1 parent f49782f commit 50d963b
Showing 1 changed file with 373 additions and 0 deletions.
373 changes: 373 additions & 0 deletions design-docs/SERVICE_CATALOG_CENSUS_INTEGRATION.md
@@ -0,0 +1,373 @@
# Integration Plan: EKS Repo Generator → terraform-service-catalog-census

**Date:** February 20, 2026
**Author:** David Arnold
**Status:** DRAFT

---

## Executive Summary

This document outlines the plan to integrate the EKS Terragrunt Repository Generator product into the centralized `terraform-service-catalog-census` repo, which manages all enterprise Service Catalog portfolios and products via Terragrunt. Currently, our Service Catalog product (portfolio, product, launch constraints) is managed independently in `lambda-template-repo-generator/deploy/`. Integrating into the census repo aligns us with the enterprise pattern and enables org-wide sharing of the product.

---

## 1. Current Architecture (As-Is)

### Our System (4 repos)

```
lambda-template-repo-generator/ ← Lambda code + SC product (THIS REPO)
├── template_automation/ ← Lambda handler, EKS renderer, GitHub API
├── service-catalog/product-template.yaml ← CFN product template
├── deploy/ ← Terraform: Lambda + IAM + SC portfolio/product
│   ├── main.tf ← Lambda, ECR, API GW, SSM
│   ├── service_catalog.tf ← Portfolio, product, launch constraint
│   └── terraform.tfvars ← Account-specific config
└── scripts/ ← Test tools
terraform-aws-template-automation/ ← Reusable Terraform module
packer-pipeline/ ← Container build CLI
template-eks-cluster/ ← Template repo (cloned into new repos)
```

**Key resources we currently deploy in `deploy/service_catalog.tf`:**
- `aws_servicecatalog_portfolio`
- `aws_servicecatalog_product` + `aws_servicecatalog_provisioning_artifact`
- `aws_servicecatalog_product_portfolio_association`
- `aws_servicecatalog_principal_portfolio_association`
- `aws_servicecatalog_constraint` (LAUNCH + TEMPLATE)
- `aws_s3_object` (product template upload)

**Deployed to:** `csvd-dev-gov` (229685449397) / `us-gov-west-1`

### Census System (terraform-service-catalog-census)

```
terraform-service-catalog-census/
├── root.hcl ← Terragrunt root config
├── _envcommon/service-catalog.hcl ← Common component config
├── main-module/service-catalog/ ← Main Terraform module
│   ├── main.tf ← Portfolio + product orchestration
│   └── variables.tf ← Inputs (configurations_dir, product_dir, etc.)
├── modules/
│   ├── sc-portfolio/ ← Portfolio creation + principal association
│   ├── sc-product/ ← Product creation + S3 upload + versioning
│   └── cfn-roles-actions/ ← Launch roles via CFN StackSets
├── templates/
│   ├── products/ ← CFN product templates (versioned YAMLs)
│   │   ├── ec2-instance-linux/ ← 1-0-0.yaml, 1-0-1.yaml, 1-1-0.yaml, ...
│   │   ├── ec2-instance-win/
│   │   ├── ec2-storage/
│   │   ├── image-pipeline/
│   │   ├── rds-postgres/
│   │   └── s3-bucket/
│   └── role-templates/ ← IAM launch role CFN snippets
├── non-prod/
│   ├── csvd-dev/west/
│   │   ├── configurations/
│   │   │   ├── portfolios/*.yaml.tftpl ← Portfolio definitions (YAML)
│   │   │   └── products/**/*.yaml.tftpl ← Product definitions (YAML)
│   │   └── service-catalog/
│   │       ├── terragrunt.hcl ← Module source + inputs
│   │       └── terraform.tfvars ← Account config
│   ├── lab-dev/
│   └── lab-operations/
└── prod/operations-gov/ ← Prod (shares to org)
```

**How it works:**
1. **Portfolios** are defined in YAML files under `<account>/<region>/configurations/portfolios/`
2. **Products** are defined in YAML files under `<account>/<region>/configurations/products/<name>/`
3. **Product templates** (CFN YAMLs) live in `templates/products/<product-name>/<version>.yaml`
4. **Launch roles** are deployed via CFN StackSets (shared across the org)
5. The `sc-product` module creates an S3 bucket, uploads templates, creates products, and creates provisioning artifacts
6. The `main-module/service-catalog/main.tf` wires portfolios → products → constraints

**Key difference from our approach:**
- Census uses **YAML-driven configuration** (portfolios + products defined in YAML, not HCL)
- Products are **versioned** (multiple YAML files per product: `1-0-0.yaml`, `1-1-0.yaml`, etc.)
- Launch roles are created via **CFN StackSets** (shared across the org), not per-account Terraform
- Product templates are **static files** loaded from disk (not `templatefile()` rendered)
- The `sc-product` module creates its **own S3 bucket** for artifacts (not a shared one)
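
The versioned file names above (`1-0-0.yaml`, `1-1-0.yaml`) suggest a simple mapping from a semantic version to a template path. The following is a minimal sketch of that mapping, assuming the census repo always uses `MAJOR-MINOR-PATCH.yaml` under `templates/products/<product-name>/`; the helper name `template_path` is hypothetical and not part of the census repo.

```python
import re
from pathlib import PurePosixPath

# Assumption: census versions are strictly MAJOR.MINOR.PATCH.
_VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")


def template_path(product_dir: str, version: str) -> str:
    """Map a product version such as '2.0.0' to its template file path."""
    if not _VERSION_RE.match(version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    filename = version.replace(".", "-") + ".yaml"
    return str(PurePosixPath("templates/products") / product_dir / filename)
```

For this product, `template_path("eks-terragrunt-repo", "2.0.0")` yields the path used later in this plan.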

---

## 2. Integration Approach: Product-Only (Recommended)

### Strategy

Move only the **Service Catalog product definition** (portfolio, product, constraints) into `terraform-service-catalog-census`. Keep the **Lambda infrastructure** (Lambda function, IAM execution role, ECR, VPC config, SSM parameters) in `lambda-template-repo-generator/deploy/`.

### Why Product-Only?

| Concern | Product-Only ✅ | Full Migration ❌ |
|---------|----------------|-------------------|
| Separation of concerns | SC catalog management separate from Lambda code | Everything in one monolith |
| Deploy independence | Lambda deploys independently of SC product catalog | Coupled deploy cycles |
| Census repo pattern | Matches existing products (ec2, rds, s3, etc.) | Would be an outlier requiring Lambda + ECR modules |
| Launch role | Already exists in Lambda's deploy/; referenced by name | Would need CFN StackSet role template |
| Org-wide sharing | Prod operations-gov deployment shares to org | Already shared per-account |
| Risk | Low — just moving YAML + config | High — must migrate Terraform state for Lambda, IAM, etc. |

### What Moves vs. What Stays

| Component | Current Location | After Integration |
|-----------|-----------------|-------------------|
| **Product template YAML** | `service-catalog/product-template.yaml` | `terraform-service-catalog-census/templates/products/eks-terragrunt-repo/2-0-0.yaml` |
| **Portfolio definition** | `deploy/service_catalog.tf` (HCL) | `<account>/configurations/portfolios/eks-terragrunt.yaml.tftpl` (YAML) |
| **Product definition** | `deploy/service_catalog.tf` (HCL) | `<account>/configurations/products/eks-terragrunt-repo/EKS_REPO.yaml.tftpl` (YAML) |
| **Launch constraint** | `deploy/service_catalog.tf` (HCL) | Product YAML `launch_role` field |
| **Template constraint** | `deploy/service_catalog.tf` (HCL) | Product YAML `template_constraints` field |
| Lambda function | `deploy/main.tf` | **STAYS** in `lambda-template-repo-generator/deploy/` |
| IAM execution role | `deploy/main.tf` | **STAYS** |
| ECR repo + image | `deploy/main.tf` + packer-pipeline | **STAYS** |
| SSM parameters | `deploy/main.tf` | **STAYS** |
| VPC configuration | `deploy/main.tf` | **STAYS** |
| SC launch role | `deploy/service_catalog.tf` | **STAYS** (referenced by name in product YAML) |

---

## 3. Integration Steps

### Phase 1: Prepare Product Template for Census Format

The census repo expects product templates as **static YAML files** in `templates/products/<name>/<version>.yaml`. Our current template uses `LambdaFunctionArn` as a parameter with a Terraform-injected default. For census integration, we need the `LambdaFunctionArn` locked via a **template constraint** instead.

**Action:** Create `templates/products/eks-terragrunt-repo/2-0-0.yaml`
- Copy from `service-catalog/product-template.yaml`
- Remove the `Default` value from `LambdaFunctionArn` parameter (will be locked by template constraint)
- Add `Metadata.ServiceCatalog.ProductVersion.Description` for census versioning support

### Phase 2: Create Portfolio YAML (per account/region)

Create portfolio YAML files for each account/region where the product should be available.

**File:** `non-prod/csvd-dev/west/configurations/portfolios/eks-terragrunt.yaml.tftpl`

```yaml
eks-terragrunt-portfolio:
  name: EKS Terragrunt Repository Creator Portfolio
  description: Self-service EKS cluster repository creation with Terragrunt configuration
  provider_name: Platform Engineering - CSVD
  products:
    - eks_terragrunt_repo
  user_roles:
    - /aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_inf-admin-t2_*
    - /aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_AdministratorAccess_*
    - /aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_AWSAdministratorAccess_*
  tags:
  associated_tag_options:
  share_ous:
```

### Phase 3: Create Product YAML (per account/region)

**File:** `non-prod/csvd-dev/west/configurations/products/eks-terragrunt-repo/EKS_REPO.yaml.tftpl`

```yaml
eks_terragrunt_repo:
  name: eks-terragrunt-eks-repo-creator
  description: >-
    Creates a GitHub repository from the template-eks-cluster template with
    fully-rendered Terragrunt/HCL configuration for EKS cluster deployments.
  type: CLOUD_FORMATION_TEMPLATE
  launch_role: eks-terragrunt-sc-launch-role
  distributor: CSVD - Platform Engineering
  template_constraints:
    Parameters:
      LambdaFunctionArn: >-
        arn:aws-us-gov:lambda:${aws_region}:229685449397:function:eks-terragrunt-repo-gen-template-automation
  versions:
    - name: 2.0.0
      file_path: /eks-terragrunt-repo/2-0-0.yaml
```
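
For context on what a locked parameter means at the AWS level: a Service Catalog template constraint is a JSON document of `Rules`, each with `Assertions`. The sketch below builds such a document from a `Parameters` mapping like the one above. How the census `sc-product` module actually renders `template_constraints` is not shown in this plan, so treat this as an assumption about the end result; the rule naming (`Lock<ParamName>`) and the function name are hypothetical.

```python
import json


def lock_parameters(locked: dict[str, str]) -> str:
    """Build template-constraint JSON that pins each parameter to one value."""
    rules = {}
    for name, value in sorted(locked.items()):
        rules[f"Lock{name}"] = {  # hypothetical rule name
            "Assertions": [
                {
                    # The assertion fails unless the launch parameter equals the value.
                    "Assert": {"Fn::Equals": [{"Ref": name}, value]},
                    "AssertDescription": f"{name} must be {value}",
                }
            ]
        }
    return json.dumps({"Rules": rules}, indent=2)
```

Calling `lock_parameters({"LambdaFunctionArn": "<arn>"})` with the rendered ARN gives a single rule that rejects any launch supplying a different function ARN.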

### Phase 4: Handle the Launch Role

The census system creates launch roles via **CFN StackSets** (see `templates/role-templates/`). Our launch role (`eks-terragrunt-sc-launch-role`) is currently created by our own Terraform in `deploy/service_catalog.tf`.

**Two options:**

#### Option A: Reference Existing Role (Simpler — Recommended for Phase 1)
Keep the launch role in our Terraform deployment. The product YAML's `launch_role` field references the role **by name** (not ARN), so it will work as long as the role exists in the target account.

**Prerequisite:** Deploy `lambda-template-repo-generator/deploy/` first to create the role in each account.

#### Option B: Create Census Role Template (Long-term)
Create a new role template at `templates/role-templates/eks-terragrunt-launch-role.yaml` and add it to the roles configuration. This would let the census StackSet create the role across all accounts automatically.

**Complexity:** The launch role needs permissions for Lambda invoke, CloudFormation responses, S3 read (with tag condition), and CloudWatch Logs — all of which reference specific ARNs. This is better suited for Phase 2 when scaling to more accounts.

### Phase 5: Remove SC Resources from Lambda Deploy

After the census integration is deployed and verified:

1. Remove from `deploy/service_catalog.tf`:
   - `aws_servicecatalog_portfolio`
   - `aws_servicecatalog_product`
   - `aws_servicecatalog_provisioning_artifact`
   - `aws_servicecatalog_product_portfolio_association`
   - `aws_servicecatalog_principal_portfolio_association`
   - `aws_servicecatalog_constraint` (both LAUNCH and TEMPLATE)
   - `aws_s3_object.product_template`
2. Remove from `deploy/variables.tf`:
   - `create_service_catalog` variable
   - `service_catalog_config` variable
3. Remove from `deploy/terraform.tfvars`:
   - `create_service_catalog` and `service_catalog_config` blocks
4. Before applying the code removal, run `terraform state rm` for each resource that must not be destroyed. Note that `terraform state rm` only removes the resource from Terraform state: the old portfolio and product remain in AWS and must be deleted separately (or imported into the census state) to satisfy the no-duplicates check in Section 8.
5. Keep the **launch role** resource (Option A) or remove it too (Option B)
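
Before running any state command, it helps to review exactly which addresses Phase 5 touches. `terraform state list` prints one resource address per line; the sketch below filters that output down to the Service Catalog resources and the template upload, leaving the launch role and Lambda resources alone. The resource names in any sample input are hypothetical, since the actual addresses in `deploy/` are not listed in this plan. Keep in mind that `terraform state rm` detaches a resource from state without deleting it from AWS.

```python
import re
import shlex

# Strip leading "module.<name>." segments (optionally indexed) from an address.
_MODULE_PREFIX = re.compile(r"^(?:module\.[^.\[]+(?:\[[^\]]*\])?\.)+")


def sc_addresses(state_list_output: str) -> list[str]:
    """Select Service Catalog addresses from `terraform state list` output."""
    selected = []
    for line in state_list_output.splitlines():
        address = line.strip()
        if not address:
            continue
        resource = _MODULE_PREFIX.sub("", address)
        if resource.startswith("aws_servicecatalog_") or resource == "aws_s3_object.product_template":
            selected.append(address)
    return selected


def state_rm_command(address: str) -> str:
    """Render the shell command for one address, quoted where needed."""
    return shlex.join(["terraform", "state", "rm", address])
```

Printing `state_rm_command(a)` for each entry of `sc_addresses(...)` gives a reviewable list rather than running anything directly.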

### Phase 6: Multi-Account Expansion

Once validated in `csvd-dev/west`, add the product to other environments:

```
non-prod/csvd-dev/west/configurations/ ← Phase 1 (current account)
prod/operations-gov/west/configurations/ ← Phase 2 (org-wide via sharing)
non-prod/lab-dev/east/configurations/ ← Phase 3 (lab environments)
```

The `prod/operations-gov` deployment uses `to_share_portfolios = true` with OU sharing, which would make the product available to all workload accounts in the org.
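
Because the product YAML hardcodes an account ID in the locked `LambdaFunctionArn`, each additional account needs its own value. A minimal sketch of building that ARN follows, assuming the Lambda function name is identical in every account and that all targets are in the `aws-us-gov` partition; the function name `lambda_arn` is hypothetical.

```python
def lambda_arn(
    region: str,
    account_id: str,
    function_name: str,
    partition: str = "aws-us-gov",  # assumption: all target accounts are GovCloud
) -> str:
    """Build the Lambda function ARN to lock in a product YAML."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"account_id must be 12 digits, got {account_id!r}")
    return f"arn:{partition}:lambda:{region}:{account_id}:function:{function_name}"
```

For the current account, `lambda_arn("us-gov-west-1", "229685449397", "eks-terragrunt-repo-gen-template-automation")` reproduces the ARN from Phase 3 with the region filled in.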

---

## 4. Prerequisites & Dependencies

### Before integration, the following must exist in each target account:

| Prerequisite | How Created | Current State |
|-------------|-------------|---------------|
| Lambda function | `lambda-template-repo-generator/deploy/` | ✅ Deployed in csvd-dev-gov |
| Lambda execution role | `lambda-template-repo-generator/deploy/` | ✅ Deployed |
| ECR image | `packer-pipeline` → CodeBuild | ✅ Built and pushed |
| SC launch role | `lambda-template-repo-generator/deploy/` | ✅ `eks-terragrunt-sc-launch-role` exists |
| GitHub token in Secrets Manager | Manual / separate deploy | ✅ Exists at `/eks-cluster-deployment/github_token` |
| SSM parameters | `terraform-aws-template-automation` | ✅ Deployed |
| VPC/subnet access to GHE | Network team | ✅ Configured |

### Dependency order for new account deployment:
```
1. terraform-aws-template-automation (SSM params)
2. packer-pipeline build (Container image)
3. lambda-template-repo-generator (Lambda + IAM + ECR) ← keeps launch role
4. terraform-service-catalog-census (SC portfolio + product) ← NEW
```
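
The dependency order above can also be expressed as a graph and checked mechanically. This sketch assumes each step depends only on the one before it (a linear chain); the real dependencies between these repos may be looser, and the `DEPENDS_ON` mapping is illustrative.

```python
from graphlib import TopologicalSorter

# Assumption: each deployment depends only on the previous step in the list.
DEPENDS_ON = {
    "terraform-aws-template-automation": set(),
    "packer-pipeline": {"terraform-aws-template-automation"},
    "lambda-template-repo-generator": {"packer-pipeline"},
    "terraform-service-catalog-census": {"lambda-template-repo-generator"},
}


def deploy_order(graph: dict[str, set[str]]) -> list[str]:
    """Return a deployment order in which every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())
```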

---

## 5. Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Template constraint with hardcoded Lambda ARN | Breaks if Lambda ARN changes | ARN is deterministic (function name is fixed in Terraform) |
| Launch role doesn't exist in shared accounts | Product launch fails | Phase 1 uses Option A (role stays in Lambda deploy); Phase 2 adds CFN StackSet role |
| S3 tag-based SCP blocks template access | Template 403 during launch | Census `sc-product` module adds its own S3 bucket with proper tags |
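| Duplicate portfolio/product during migration | Users see two products | Delete the old portfolio/product (or import them into the census state); `terraform state rm` alone only removes them from Terraform state and leaves them visible in the console |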
| Census repo uses different S3 bucket | Old provisioned products reference old S3 URL | Existing provisioned products are unaffected; new launches use new bucket |

---

## 6. Files to Create in terraform-service-catalog-census

```
terraform-service-catalog-census/
├── templates/products/eks-terragrunt-repo/
│   └── 2-0-0.yaml ← Product CFN template
├── non-prod/csvd-dev/west/configurations/
│   ├── portfolios/eks-terragrunt.yaml.tftpl ← Portfolio definition
│   └── products/eks-terragrunt-repo/
│       └── EKS_REPO.yaml.tftpl ← Product definition
└── (future) templates/role-templates/
    └── eks-terragrunt-launch-role.yaml ← Launch role (Phase 2)
```
**Total: 3 new files** (Phase 1), 1 additional file (Phase 2)

---

## 7. Files to Modify in lambda-template-repo-generator

After census integration is live and validated:

| File | Change |
|------|--------|
| `deploy/service_catalog.tf` | Remove all SC resources (portfolio, product, constraints, S3 object) |
| `deploy/variables.tf` | Remove `create_service_catalog` and `service_catalog_config` variables |
| `deploy/terraform.tfvars` | Remove `create_service_catalog` and `service_catalog_config` blocks |
| `deploy/main.tf` | Remove SC-related outputs |
| Documentation (.md files) | Update deployment instructions to reference census repo |
**Keep:** Lambda, IAM execution role, SC launch role (until Phase 2), ECR, VPC config, SSM params

---

## 8. Validation Checklist

- [ ] Product template uploaded to census-managed S3 bucket
- [ ] Portfolio visible in Service Catalog console
- [ ] Product associated with portfolio
- [ ] Launch constraint attached (references `eks-terragrunt-sc-launch-role`)
- [ ] Template constraint locks `LambdaFunctionArn` parameter
- [ ] `scripts/test_service_catalog.py` passes against census-deployed product
- [ ] Old SC resources removed from `lambda-template-repo-generator` Terraform state
- [ ] No duplicate portfolios/products in console
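
The last checklist item can be checked from an API response instead of by eye. The sketch below works on the response shape of the Service Catalog `SearchProductsAsAdmin` API (a `ProductViewDetails` list whose entries carry a `ProductViewSummary` with a `Name`). It assumes the caller has already fetched the response, for example with `boto3.client("servicecatalog").search_products_as_admin()`, and merged any pages; the function name is hypothetical.

```python
from collections import Counter


def duplicate_product_names(search_response: dict) -> list[str]:
    """Return product names that appear more than once in the response."""
    names = [
        detail["ProductViewSummary"]["Name"]
        for detail in search_response.get("ProductViewDetails", [])
    ]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)
```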
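
---

## 9. Timeline Estimate

The phases below are rollout stages; the Section 3 phases each one covers are noted in the Work column.

| Phase | Work | Duration |
|-------|------|----------|
| Phase 1 | Create 3 files in census repo (Section 3, Phases 1-3), test in csvd-dev | 1 day |
| Phase 2 | Remove old SC resources from Lambda deploy (Section 3, Phase 5) | 0.5 day |
| Phase 3 | Add to prod/operations-gov for org sharing (Section 3, Phase 6) | 0.5 day |
| Phase 4 (optional) | Create CFN StackSet launch role template (Section 3, Phase 4, Option B) | 1 day |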
| **Total** | | **2-3 days** |

---

## Appendix A: Census Config Format Reference

### Portfolio YAML Schema

```yaml
<portfolio_key>:
  name: string # Display name
  description: string # Description
  provider_name: string # Provider name shown in console
  products: # List of product keys to associate
    - <product_key>
  user_roles: # IAM role ARN patterns for principal access
    - /path/pattern/*
  tags: {}
  associated_tag_options: {}
  share_ous: [] # OU names for cross-account sharing
```

### Product YAML Schema
```yaml
<product_key>:
  name: string # Product name in SC console
  description: string # Product description
  type: CLOUD_FORMATION_TEMPLATE # or EXTERNAL
  launch_role: string # IAM role NAME (not ARN) for launch constraint
  distributor: string # Shown in console
  template_constraints: # Parameter constraints
    Parameters:
      ParamName: locked-value
  rules: # CFN rules (validation)
    RuleName:
      Assertions: [...]
  versions: # Product versions
    - name: "2.0.0"
      file_path: /product-dir/version.yaml
  actions: [] # Service actions (optional)
```

### Product Template Location

Templates are static CFN YAML files at:

```
templates/products/<product-name>/<version>.yaml
```

The `sc-product` module uploads them to S3. The module creates its own S3 bucket, using the prefix specified in `terraform.tfvars`.
