From edfe5d54641a35e8dc6f86442e642d64263e435c Mon Sep 17 00:00:00 2001
From: Dave Arnold
Date: Mon, 20 Apr 2026 17:38:28 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20add=20ADR=200001=20=E2=80=94=20all=20ge?=
 =?UTF-8?q?nerated=20cluster=20repo=20files=20versioned=20in=20terraform-e?=
 =?UTF-8?q?ks-deployment?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../0001-generated-file-source-of-truth.md    | 134 ++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100644 docs/adr/0001-generated-file-source-of-truth.md

diff --git a/docs/adr/0001-generated-file-source-of-truth.md b/docs/adr/0001-generated-file-source-of-truth.md
new file mode 100644
index 0000000..5e35209
--- /dev/null
+++ b/docs/adr/0001-generated-file-source-of-truth.md
@@ -0,0 +1,134 @@
+# ADR 0001: All Generated Cluster Repository Files Must Be Versioned in terraform-eks-deployment
+**Date:** 2026-04-20
+**Status:** Proposed
+**Deciders:** arnol377, morga471
+
+---
+
+## Context
+
+The EKS Cluster Automation (ECA) system generates new EKS cluster repositories by running
+`terraform apply` inside a CodeBuild project (`eks-terragrunt-repo-creator`). The build
+checks out a pinned commit of `terraform-eks-deployment` (via `REPO_BRANCH` in `buildspec.yml`)
+and applies it, which calls `CSVD/terraform-github-repo` to create the GitHub repo and
+commit all generated files via `managed_extra_files`.
+
+The files written into a generated cluster repo fall into two categories:
+
+1. **Rendered config files** — `_envcommon/default-versions.hcl`, `_envcommon/common-variables.hcl`,
+   `account.hcl`, `region.hcl`, `vpc.hcl`, `cluster.hcl` — rendered from Go templates
+   (`*.tf.tpl`) committed inside `terraform-eks-deployment/templates/`.
+
+2. **Terragrunt module entrypoints** — `eks/terragrunt.hcl`, `eks-config/terragrunt.hcl`,
+   `eks-dns/terragrunt.hcl`, and all other `eks-*/terragrunt.hcl` files — one per
+   Terragrunt module in the cluster's run-all graph.
+
+Historically, the second category was provided by cloning `template-eks-cluster` as a
+GitHub repo template. The template contained placeholder directory paths
+(`environment/region/vpc/cluster/eks-*/`) that were supposed to be renamed to real computed
+paths after clone. That renaming was never implemented, producing broken repos with literal
+`environment/region/vpc/cluster` in all paths.
+
+PR #16 (`test_cluster` → `main`) correctly eliminates the GitHub template feature
+(`template_repo = null`) but proposes reading the `eks-*/terragrunt.hcl` files live from
+`template-eks-cluster:main` at Terraform plan time via `data.github_repository_file`.
+
+This ADR records the decision about where those files should live and why.
+
+---
+
+## Decision
+
+We will commit all `eks-*/terragrunt.hcl` template files directly into
+`terraform-eks-deployment/templates/eks-modules/` and write them into generated repos
+via `managed_extra_files`, alongside the existing rendered config files.
+
+The `template-eks-cluster` GitHub repo will no longer be used as a source of file content
+in the automation path. The GitHub template feature (`template_repo`) will remain `null`.
+
+---
+
+## Alternatives Considered
+
+### Option A: Read eks-module files live from `template-eks-cluster` at plan time (PR #16 approach)
+
+`data.github_repository_file` datasources fetch each `eks-*/terragrunt.hcl` from
+`template-eks-cluster:main` during `terraform plan`. They are passed into
+`managed_extra_files` alongside the rendered config files.
+
+**Rejected because:**
+
+- **Internal consistency cannot be guaranteed.** The rendered config files
+  (`_envcommon/default-versions.hcl`, `_envcommon/common-variables.hcl`) are generated
+  from templates in `terraform-eks-deployment`. The eks-module files are fetched live from
+  a separate repo at a different, independently-advancing ref. A change to
+  `eks-karpenter/terragrunt.hcl` in `template-eks-cluster` that references a new variable
+  not yet present in `default-versions.hcl` will flow into new repos silently, producing
+  files that are internally inconsistent and will fail when terragrunt is run.
+
+- **Partial updates are possible.** PR #16's drift-detection update mode only re-commits
+  files whose content changed. A template update that touches `eks-karpenter/terragrunt.hcl`
+  but not `default-versions.hcl` could produce a cluster repo where those two files are
+  at different effective versions.
+
+- **Plan-time API coupling increases fragility.** Every `terraform plan` makes one GitHub
+  API call per eks-module file (currently 14 calls). If the GHE endpoint is slow or the
+  token lacks access, the plan fails regardless of whether the user intends to touch those
+  files.
+
+- **`REPO_BRANCH` pinning is undermined.** CodeBuild pins `terraform-eks-deployment` to a
+  tested commit via `REPO_BRANCH`. This guarantees a known, reproducible set of Terraform
+  logic and defaults. Pulling supporting files from a separately-versioned repo at runtime
+  breaks that reproducibility guarantee — the effective artifact being applied is no longer
+  fully described by a single commit.
+
+### Option B: Keep `template-eks-cluster` as a GitHub repo template (previous approach)
+
+Use the GitHub template feature to seed new repos with `eks-*/terragrunt.hcl` files and
+then rename the placeholder paths via a post-apply script.
+
+**Rejected because:**
+
+- Placeholder paths (`environment/region/vpc/cluster/`) land in the generated repo and
+  cannot be easily renamed after the fact via standard Terraform resources.
+- Requires an out-of-band post-apply step (script or `null_resource`) that runs outside
+  Terraform's state model.
+- The template repo still diverges from `terraform-eks-deployment` over time (same
+  consistency problem as Option A).
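+
+For reference, the plan-time lookup rejected under Option A might look roughly like the
+sketch below. It is illustrative only and is not the code from PR #16: the module list,
+the resource label `eks_module`, and the local names are assumptions. It shows why each
+plan depends on one remote read per file from a moving ref.
+
+```hcl
+# Hypothetical sketch of the rejected Option A; names are illustrative.
+locals {
+  # Assumed subset of the eks-module directories.
+  eks_modules = ["eks", "eks-config", "eks-dns"]
+}
+
+# One data source instance, and therefore one GitHub API read, per module
+# file. Each is resolved again on every `terraform plan`.
+data "github_repository_file" "eks_module" {
+  for_each   = toset(local.eks_modules)
+  repository = "template-eks-cluster"
+  branch     = "main" # moving ref that REPO_BRANCH does not pin
+  file       = "${each.key}/terragrunt.hcl"
+}
+
+locals {
+  # Map of file path => content as fetched at plan time.
+  live_module_files = {
+    for name, f in data.github_repository_file.eks_module :
+    "${name}/terragrunt.hcl" => f.content
+  }
+}
+```
+
+Because `branch` points at `main`, two plans run from the same `terraform-eks-deployment`
+commit can see different file content.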
+
+---
+
+## Consequences
+
+**Positive:**
+
+- A single commit of `terraform-eks-deployment` fully describes all files that will be
+  written into a generated cluster repo. Pinning `REPO_BRANCH` in `buildspec.yml` is
+  sufficient to produce a fully reproducible, internally consistent artifact.
+- When a new eks-module version or a new variable is added, a single PR to
+  `terraform-eks-deployment` updates both the `eks-*/terragrunt.hcl` template and the
+  corresponding `default-versions.hcl` template atomically. They cannot diverge.
+- No live API calls at plan time for file content. Plan performance and reliability are
+  not affected by the availability of `template-eks-cluster`.
+- The GitHub template feature (`template_repo`) is not used, removing a dependency on a
+  separately-maintained repo and on GitHub's template clone behavior.
+
+**Negative:**
+
+- `template-eks-cluster` and `terraform-eks-deployment/templates/eks-modules/` must be
+  kept manually in sync if humans use the template repo as a reference. Mitigation: add a
+  README to `template-eks-cluster` noting that it is no longer the automation source of
+  truth and pointing to `terraform-eks-deployment`.
+- Adding a new eks-module requires a PR to `terraform-eks-deployment` rather than just
+  adding a directory to `template-eks-cluster`. This is the desired behavior — changes
+  go through review — but is a minor workflow difference.
+
+**Neutral:**
+
+- `template-eks-cluster` can be archived or retained as a human-readable reference. It
+  is not deleted because it may still be useful for onboarding documentation.
+- The `data.github_repository_file` approach in PR #16 remains valid for a future
+  *update* workflow (deliberately syncing template changes into existing cluster repos),
+  as long as that workflow operates on the `templates/eks-modules/` copy in
+  `terraform-eks-deployment` rather than `template-eks-cluster:main`.
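+
+A minimal sketch of how the decision could be wired up, under stated assumptions: the
+directory layout `templates/eks-modules/<module>/terragrunt.hcl`, the local names, and
+the map shape (repo path => file content) for `managed_extra_files` are all hypothetical
+and should be checked against the actual `CSVD/terraform-github-repo` interface.
+
+```hcl
+locals {
+  # Assumed layout: templates/eks-modules/<module>/terragrunt.hcl
+  eks_module_dir = "${path.module}/templates/eks-modules"
+
+  # Read every module entrypoint from the checked-out commit itself.
+  # No remote calls are made, so the content is fixed by REPO_BRANCH.
+  eks_module_files = {
+    for rel in fileset(local.eks_module_dir, "*/terragrunt.hcl") :
+    rel => file("${local.eks_module_dir}/${rel}")
+  }
+
+  # Hypothetical placeholder for the already-rendered config files
+  # (default-versions.hcl, account.hcl, and so on).
+  rendered_config_files = {}
+
+  # Assumed shape for managed_extra_files: map of repo path => content.
+  # The real computed path prefix for the cluster would still need to be
+  # prepended to each key.
+  all_generated_files = merge(local.rendered_config_files, local.eks_module_files)
+}
+```
+
+With this shape, adding a module means adding a directory under `templates/eks-modules/`
+in the same PR that updates the `default-versions.hcl` template.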