# EKS Cluster Automation (ECA) — Demo Script

**Audience**: Platform Engineering, DevOps leads, team managers
**Duration**: ~20 minutes
**Goal**: Show that an engineer can go from zero to a fully configured, review-ready EKS cluster
GitHub repository in under 5 minutes by filling out a form in the AWS console.

---

## 0. Pre-Demo Setup (before the audience arrives)

- [ ] Log into the AWS console in the account where the SC portfolio is shared (not csvd-dev)
- [ ] Have the Service Catalog console open at the "EKS Terragrunt Repository Creator Portfolio"
- [ ] Have a second browser tab ready with GitHub Enterprise (`github.e.it.census.gov/SCT-Engineering`)
- [ ] Have a third tab ready with the CodeBuild console in csvd-dev (`us-gov-west-1`)
- [ ] Have `awscreds` loaded in a terminal for CLI fallback
- [ ] Pre-stage demo values (fill in the table below with real values for the demo cluster):

| Parameter | Demo Value |
|-----------|------------|
| Repository Name (`project_name`) | `demo-eks-cluster-XX` |
| EKS Cluster Name | *(leave blank — defaults to repo name)* |
| Environment | `dev` |
| AWS Region | `us-gov-west-1` |
| Account Name | `csvd-dev-ew` |
| AWS Account ID | `229685449397` |
| Environment Abbr | `dev` |
| VPC Name | `csvd-dev-ew-vpc-01` |
| VPC Domain Name | `dev.inf.csp1.census.gov` |
| Owning Team | `tf-module-admins` |
| Creator Username | *(your GHE username for admin access)* |

---

## 1. Opening — The Problem (2 min)

> "Before this system existed, setting up the infrastructure side of a new EKS cluster
> meant manually creating a GitHub repository, copying files from a reference template,
> filling in account IDs, VPC names, region values, and cluster-specific configuration
> into seven different Terragrunt HCL files — by hand. That's error-prone, it takes
> significant time from a senior engineer, and it has to happen before any infrastructure
> work can even start."

**Talking points:**
- Every EKS cluster needs a dedicated GitHub repository with a precise directory structure
- The repo contains Terragrunt wrappers for ~15 Terraform modules (networking, compute, IAM, observability)
- Values like cluster name, VPC name, account ID, and environment must be consistent across all files
- A single typo at this stage cascades into failed `terragrunt apply` runs later
- Teams were blocked waiting for platform engineers to set this up

---

## 2. The Solution — Service Catalog Self-Service (1 min)

> "What we've built is a Service Catalog product. A team lead fills out one form — the same
> kind of form used to request a database or an EC2 instance — and the system takes care of
> everything: creates the repository, populates all the configuration files with the right
> values, and opens a pull request. The engineer reviews it, merges, and they're ready to start
> deploying their cluster."

**Talking points:**
- Self-service: no platform engineer required to set up a new cluster repo
- Runs from any AWS account in the org — teams use their own console
- Input validation on the form catches typos before anything is created
- Outputs a direct link to the repository and the open PR

---

## 3. Architecture Overview (3 min)

*Draw or point to the ASCII diagram on screen:*

```
Any AWS Account (team's account) csvd-dev (229685449397)
───────────────────────────────── ──────────────────────────────────────
Service Catalog console
→ fill out form
→ click "Launch"
→ CloudFormation stack ───────────→ Lambda (cross-account invocation)
├─ reads GitHub PAT from Secrets Manager
├─ starts CodeBuild with cluster params as env vars
│ ↓
│ CodeBuild (terraform-eks-deployment)
│ ├─ installs Terraform from S3
│ ├─ installs Census CA cert
│ ├─ clones terraform-eks-deployment from GHE
│ └─ terraform apply
│ └─ CSVD/terraform-github-repo module
│ ├─ creates repo from template-eks-cluster
│ ├─ renders 7 HCL files + config.json
│ └─ opens PR (repo-init → main)
└─ returns repo URL + PR URL
← CFN stack outputs (repo + PR URLs) ──
← SC product status: AVAILABLE
```

**Talking points:**
- The product is shared at the **OU level** — any account in the org can provision it
- All compute runs **centrally in csvd-dev** — no Lambda or CodeBuild needed in the team's account
- CloudFormation natively supports cross-account Lambda invocation via the `ServiceToken` ARN
- The launch role (`r-ent-servicecatalog-eksterragrunt-launch-role`) is deployed to every account
automatically via a **CloudFormation StackSet** — zero manual setup per account
- The Lambda is a thin orchestrator — it just starts CodeBuild and polls for completion
- Actual repo creation is pure Terraform (`terraform-eks-deployment` workspace) — auditable, testable, repeatable
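
For context, the CodeBuild job's Terraform looks conceptually like the sketch below. This is a minimal illustration only: the `CSVD/terraform-github-repo` module is named in the diagram above, but the module source address and every argument shown here are assumptions, not its actual interface.

```hcl
# Sketch only: argument names and the module source path are illustrative,
# not the real interface of CSVD/terraform-github-repo.
provider "github" {
  base_url = "https://github.e.it.census.gov/"  # GitHub Enterprise endpoint
  owner    = "SCT-Engineering"
  # The PAT is injected as GITHUB_TOKEN in the CodeBuild environment.
}

module "eks_cluster_repo" {
  source = "git::https://github.e.it.census.gov/CSVD/terraform-github-repo.git"  # illustrative

  name                = var.name                # repository name from the SC form
  template_repository = "template-eks-cluster"  # repo is created from this template
  files               = local.rendered_files    # the 7 HCL files + config.json
  owning_team         = var.owning_team         # team granted access to the repo
  pull_request_branch = "repo-init"             # PR opened from repo-init into main
}
```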

---

## 4. Live Demo — Provisioning the Product (8 min)

### Step 1: Open Service Catalog
*Navigate to Service Catalog → Products list (or the portfolio directly)*

> "Here's the EKS Terragrunt Repository Creator product. This is available to any team
> with the inf-admin SSO role in any org account."
### Step 2: Launch the product
*Click "Launch product" → enter a name for the provisioned product*

> "I'll give this provisioned product an instance name — this is just the SC record name,
> not the repo name."
### Step 3: Fill in the parameters
*Walk through the form sections:*

**Cluster Configuration:**
> "Repository name becomes the cluster name by default. We use lowercase hyphens only
> because this name flows into Kubernetes resource names. Environment is dev — this
> controls which Terragrunt environment directory gets created."

**Account Configuration:**
> "These three fields — account name, account ID, and environment abbreviation — get
> stamped into multiple HCL files. In the old world, you'd copy-paste these by hand
> into each file. Here, you type them once."

**VPC Configuration:**
> "VPC Name and VPC Domain Name. Note: this is the VPC name string, not a VPC ID.
> The Terragrunt configuration uses names for readability and to avoid account-specific
> IDs hardcoded in source control."

**Contact & FinOps:**
> "Team information and cost allocation tags. These get embedded in the cluster
> configuration and inherited by all Terraform runs against this cluster."

**Creator Username (optional):**
> "If you provide your GitHub username here, you'll get direct admin access to the
> created repo in addition to the owning team."
### Step 4: Launch and watch
*Click "Launch" → switch to the CloudFormation tab*

> "CloudFormation creates a stack with a single Custom Resource. That Custom Resource
> invokes our Lambda function in csvd-dev, cross-account."

*Switch to the CodeBuild console tab in csvd-dev*

> "We can see the CodeBuild build starting. The Lambda passed all the cluster parameters
> as environment variables — TF_VAR_name, TF_VAR_region, TF_VAR_cluster_config as JSON,
> and so on. CodeBuild is now downloading Terraform from our internal S3 assets bucket,
> installing the Census CA cert so it can reach GitHub Enterprise, and running
> terraform init followed by terraform apply."
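
For orientation, here is roughly how those environment variables surface inside the Terraform run. This is a hedged sketch: the variable names follow the `TF_VAR_*` names quoted above, but the real workspace may type and decode them differently.

```hcl
# TF_VAR_name, TF_VAR_region, and TF_VAR_cluster_config set by the Lambda map
# onto these variable declarations inside the CodeBuild Terraform run.
variable "name" {
  description = "Repository / cluster name from the Service Catalog form"
  type        = string
}

variable "region" {
  description = "AWS region, e.g. us-gov-west-1"
  type        = string
}

variable "cluster_config" {
  description = "Cluster metadata passed by the Lambda as a JSON string"
  type        = string
}

locals {
  # Decode the JSON blob once so templates can reference individual fields.
  cluster_config = jsondecode(var.cluster_config)
}
```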

*[Wait ~2-3 minutes for CodeBuild to complete, or use a pre-completed build if time is short]*

### Step 5: Show the results
*Switch to GitHub Enterprise → SCT-Engineering org*

> "The repository has been created from the `template-eks-cluster` template.
> Let's look at what's in it."

**Walk through the generated files:**

| File | What to say |
|------|-------------|
| `config.json` | "Top-level configuration file — all cluster metadata in one place. This is what the Terragrunt modules read to self-configure." |
| `root.hcl` | "Root Terragrunt config — sets the remote state bucket prefix and environment." |
| `<env>/account.hcl` | "Account-level config — account name, ID, and profile. Shared by all clusters in this account." |
| `<env>/<region>/region.hcl` | "Region-level config — inherited by all VPCs and clusters in this region." |
| `<env>/<region>/<vpc>/vpc.hcl` | "VPC-level config — domain name and routing configuration for this cluster's network." |
| `<env>/<region>/<vpc>/<cluster>/cluster.hcl` | "The leaf node — cluster-specific values: name, node group sizes, mailing list, FinOps tags." |
| `README.md` | "Pre-populated with cluster name, environment, and links to relevant runbooks." |

> "Every value in every file was rendered from the form inputs. No copy-paste, no manual edits."

*Click to the "Pull requests" tab*

> "And there's the PR: `repo-init` into `main`. The engineer reviews this — checks the
> values look right, confirms the directory structure is correct — then merges. At that
> point their repository is production-ready and they can start running Terragrunt to
> deploy the actual AWS infrastructure."
### Step 6: Back to Service Catalog
*Switch back to the SC console*

> "The provisioned product shows AVAILABLE. The outputs are the repository URL and PR URL
> — so a team lead provisioning this from, say, a ci account or their sandbox account,
> gets these links back directly in the console without ever touching GitHub."
---

## 5. Infrastructure as Code Story (3 min)

> "The product itself is entirely defined as code — nothing was clicked into existence."

**Three repos, three layers:**

| Layer | Repo | What it deploys |
|-------|------|-----------------|
| Compute (prerequisite) | `lambda-template-repo-generator` | Lambda function, CodeBuild project, IAM roles, ECR image via Packer pipeline |
| SC product + portfolio | `terraform-service-catalog-census` | SC portfolio, product, launch role StackSet, S3 template object |
| Runtime logic | `terraform-eks-deployment` | The Terraform that actually creates the GitHub repo + HCL files |

**Talking points:**
- `lambda-template-repo-generator/deploy/` is a standard Terraform root — `tf apply` deploys the Lambda
- `terraform-service-catalog-census` is a Terragrunt monorepo — the EKS product is one of ~6 SC products managed there
- The CodeBuild buildspec is stored in `terraform-eks-deployment/buildspec.yml` and inlined into the
  CodeBuild project at `tf apply` time — so updating the build steps is a normal PR workflow (see the
  sketch after this list)
- The launch role (`eks-terragrunt-launch-role.yaml`) is a CloudFormation template deployed via
StackSet — so it exists in every shared account automatically when a new account joins the OU
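
A hedged sketch of that inlining pattern is shown below: `file()` reads the checked-in buildspec at apply time, so the CodeBuild project always carries the current build steps. The resource shown is generic; the actual definition in `lambda-template-repo-generator/deploy/` (role, image, and file path) will differ.

```hcl
# Illustrative only: the real project definition, IAM role, and custom ECR image
# live in lambda-template-repo-generator/deploy/.
resource "aws_codebuild_project" "eks_terragrunt_repo_creator" {
  name         = "eks-terragrunt-repo-creator"
  service_role = aws_iam_role.codebuild.arn  # assumed to be defined elsewhere in the root

  source {
    type      = "NO_SOURCE"
    # Updating buildspec.yml in terraform-eks-deployment and re-applying this root
    # is how new build steps roll out.
    buildspec = file("${path.module}/buildspec.yml")  # illustrative path
  }

  artifacts {
    type = "NO_ARTIFACTS"
  }

  environment {
    compute_type = "BUILD_GENERAL1_SMALL"
    type         = "LINUX_CONTAINER"
    image        = "aws/codebuild/amazonlinux2-x86_64-standard:5.0"  # real project uses a custom ECR image
  }
}
```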

---

## 6. Key Design Decisions (2 min — if asked)

**Why CodeBuild instead of Lambda for repo creation?**
> "The CSVD Terraform GitHub module handles all the complexity: branch protection rules,
> team access, file rendering, PR creation. Rewriting that in Python would be significant
> additional maintenance burden. CodeBuild lets us run vanilla Terraform — same code,
> same modules, same provider versions as any other Terraform workspace."

**Why is the Lambda centralized in csvd-dev instead of per-account?**
> "The Lambda makes no AWS API calls in the provisioner's account — it only talks to
> GitHub Enterprise and CodeBuild, both of which are in csvd-dev. CloudFormation's
> `ServiceToken` mechanism supports cross-account Lambda invocation natively. Deploying
> the Lambda to every account would require ECR replication, per-account CodeBuild,
> per-account Secrets Manager secrets — real complexity with no benefit."

**Why Service Catalog instead of a plain API or CLI tool?**
> "Service Catalog gives us SSO-integrated access control, the provisioned product
> lifecycle (create/update/terminate), and a UI that non-engineers can use. It also
> lets the platform team define exactly what inputs are allowed — the form parameters
> are validated by CloudFormation before anything runs."
---

## 7. Cleanup / Termination

> "If the provisioned product is terminated in Service Catalog, CloudFormation deletes
> the stack. The GitHub repository itself is not deleted — archive-on-destroy is disabled.
> This is intentional: you don't want infrastructure code deleted automatically."
---

## 8. Q&A Prep

| Likely question | Answer |
|----------------|--------|
| Can I update the repository after it's created? | Not through SC — that's intentional. After provisioning, the repo is owned by the team. SC is for initial creation only. |
| What happens if CodeBuild fails? | Lambda returns FAILED to CloudFormation, CFN rolls back the stack, SC shows the product as FAILED with the error message. |
| Can we add more HCL files to what gets generated? | Yes — edit `terraform-eks-deployment/main.tf` (the `rendered_files` locals block). It's all Terraform `templatefile` calls; a sketch follows this table. |
| How are GitHub tokens managed? | Two tokens in Secrets Manager: an App installation token for the Lambda's Python API calls, and a PAT for the Terraform GitHub provider (passed to CodeBuild). |
| How does this scale to many accounts? | The StackSet auto-deploys the launch role to every account that joins the OU. No manual setup per account. |
| Can we version the product? | Yes — bump `product_version` in `configurations/products/eks-terragrunt-repo/EKS_REPO.yaml.tftpl` and add a new CFN template. Old provisioned products keep their version. |
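
A hedged sketch of the kind of `rendered_files` block that answer refers to is below. The template file names, paths, and keys are assumptions (it reuses the variable names from the earlier section 4 sketch); the actual structure in `terraform-eks-deployment/main.tf` may differ.

```hcl
# Illustrative rendered_files map: repository path => rendered file content.
locals {
  rendered_files = {
    "config.json" = templatefile("${path.module}/templates/config.json.tftpl", {
      cluster_name = var.name
      environment  = local.cluster_config.environment
    })

    "root.hcl" = templatefile("${path.module}/templates/root.hcl.tftpl", {
      account_id = local.cluster_config.account_id
      region     = var.region
    })

    # Adding another generated file is one more entry here plus its template.
    "docs/RUNBOOK.md" = templatefile("${path.module}/templates/runbook.md.tftpl", {
      cluster_name = var.name
    })
  }
}
```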

---

## Appendix: Useful URLs for the Demo

| Resource | URL |
|----------|-----|
| SC console | AWS console → Service Catalog → Products |
| CodeBuild builds | AWS console → CodeBuild → Build history → `eks-terragrunt-repo-creator` |
| Lambda logs | CloudWatch → Log groups → `/aws/lambda/eks-terragrunt-repo-gen-template-automation` |
| GHE org | `https://github.e.it.census.gov/SCT-Engineering` |
| template-eks-cluster | `https://github.e.it.census.gov/SCT-Engineering/template-eks-cluster` |
