
Update Workflow for existing clusters #16

Merged: 29 commits, Apr 21, 2026

morga471 (Collaborator) commented Mar 19, 2026

Summary

Implements a repository_mode = "update" workflow so that an existing EKS cluster repo on GitHub Enterprise can have its managed HCL files refreshed via a Terraform apply, without creating a new repository.


What changed

Core workflow: repository_mode

Replaces the old create_repository = bool variable with a validated string enum:

| Value | Behavior |
|---|---|
| "create" | Creates a new GHE repo and opens a PR on branch new/<name> |
| "update" | Manages an existing repo and opens a PR on branch update/<name> |

A terraform_data precondition guards against accidentally running create against a repo that already exists.
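A minimal sketch of the enum validation and the create-mode guard, assuming Terraform 1.4+ (for terraform_data) and the integrations/github provider. The inputs var.github_owner and var.repository_name are hypothetical, and detecting an existing repo through the github_repositories search data source is an assumption for illustration, not necessarily how this module does it.

```hcl
variable "repository_mode" {
  type        = string
  description = "\"create\" provisions a new GHE repo; \"update\" manages an existing one."

  validation {
    condition     = contains(["create", "update"], var.repository_mode)
    error_message = "repository_mode must be \"create\" or \"update\"."
  }
}

# Hypothetical inputs used only by this sketch.
variable "github_owner" {
  type = string
}

variable "repository_name" {
  type = string
}

# Assumption: existence is detected via the provider's repository search.
data "github_repositories" "same_name" {
  query = "org:${var.github_owner} ${var.repository_name} in:name"
}

locals {
  # Search matches substrings, so require an exact name match.
  repository_exists = contains(data.github_repositories.same_name.names, var.repository_name)

  # PR branch naming from the table: new/<name> or update/<name>.
  pr_branch = "${var.repository_mode == "create" ? "new" : "update"}/${var.repository_name}"
}

# Fails the plan if create mode would target a repo that already exists.
resource "terraform_data" "repository_mode_guard" {
  lifecycle {
    precondition {
      condition     = !(var.repository_mode == "create" && local.repository_exists)
      error_message = "Repository ${var.repository_name} already exists; set repository_mode = \"update\"."
    }
  }
}
```

Search results can lag behind newly created repositories, which is one reason a direct lookup might be preferred in practice.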

Local template management (no more template-eks-cluster dependency)

All Terragrunt HCL templates are now managed directly in templates/eks-modules/ in this repo, consistent with ADR #1 (#18). At plan/apply time, Terraform uses fileset() to discover every file in that directory and maps each one to its target path in the generated repo:

```
templates/eks-modules/eks-karpenter.terragrunt.hcl
  → <environment>/<region>/<vpc_name>/<cluster_name>/eks-karpenter/terragrunt.hcl
```
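A minimal sketch of the discovery-and-mapping step, assuming every template follows the <module>.terragrunt.hcl naming shown above and is copied verbatim with file(). var.vpc_name and var.cluster_name are assumed input names; the module may source these values elsewhere (for example from cluster_config).

```hcl
locals {
  eks_template_dir = "${path.module}/templates/eks-modules"

  # Assumed inputs; the real module may read these from cluster_config.
  cluster_repo_root = "${var.environment}/${var.region}/${var.vpc_name}/${var.cluster_name}"

  # templates/eks-modules/<module>.terragrunt.hcl
  #   => <cluster_repo_root>/<module>/terragrunt.hcl
  eks_module_files = {
    for f in fileset(local.eks_template_dir, "*.terragrunt.hcl") :
    "${local.cluster_repo_root}/${trimsuffix(f, ".terragrunt.hcl")}/terragrunt.hcl" => file("${local.eks_template_dir}/${f}")
  }
}
```

Because the map is keyed by target path, adding a module is just a matter of committing a new template file; no Terraform change is needed for it to be discovered.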

Module enablement is controlled by template_enabled_modules (a map(bool)). Core modules (eks, eks-config, eks-karpenter, eks-istio, eks-dns) are always included. Optional modules take their defaults from the variable's default value, and callers can override them per cluster:

```hcl
template_enabled_modules = {
  eks-gatekeeper = false
  eks-grafana    = false
}
```
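One way to apply the enablement rule directly in the fileset loop, sketched under the same <module>.terragrunt.hcl naming assumption. The local name effective_template_enabled_modules matches the commit message later in this PR, but its exact shape here is an assumption.

```hcl
locals {
  # Core modules are rendered regardless of template_enabled_modules.
  core_eks_modules = ["eks", "eks-config", "eks-karpenter", "eks-istio", "eks-dns"]

  # Core modules plus every optional module whose flag is true.
  effective_template_enabled_modules = setunion(
    local.core_eks_modules,
    [for name, enabled in var.template_enabled_modules : name if enabled],
  )

  # Keep only templates whose module name is in the enabled set.
  enabled_eks_templates = [
    for f in fileset("${path.module}/templates/eks-modules", "*.terragrunt.hcl") : f
    if contains(local.effective_template_enabled_modules, trimsuffix(f, ".terragrunt.hcl"))
  ]
}
```

With the override above, eks-gatekeeper and eks-grafana are left out of the set, while the five core modules stay in even if a caller sets one of them to false.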

Idempotent file diffing (update mode only)

In update mode, Terraform reads the existing files on the source branch via data.github_repository_file and compares them against the desired content. Only files that are missing or have changed are written, avoiding spurious commits on every apply.
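A minimal sketch of the update-mode diff, assuming local.desired_managed_files_by_path (named in the commit message below) maps repo-relative paths to desired content, and that var.repository_name and var.source_branch are hypothetical inputs. It also assumes the data source yields an empty result rather than an error for a path that does not exist yet; verify that against the integrations/github provider version in use.

```hcl
# Read the current version of each managed file, but only in update mode.
data "github_repository_file" "existing" {
  for_each = {
    for path, content in local.desired_managed_files_by_path : path => content
    if var.repository_mode == "update"
  }

  repository = var.repository_name # hypothetical input
  branch     = var.source_branch   # hypothetical input
  file       = each.key
}

locals {
  # Create mode writes everything. Update mode writes a file only when it is
  # missing or its current content differs from the desired content. try()
  # turns a missing instance or attribute into null, which never equals a string.
  files_to_write = {
    for path, content in local.desired_managed_files_by_path : path => content
    if var.repository_mode != "update" || try(data.github_repository_file.existing[path].content, null) != content
  }
}
```

When nothing has changed, files_to_write is empty, so a re-apply has nothing to commit.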

Dynamic path resolution

Fixed hardcoded "environment/..." path prefixes in rendered_files. All generated file paths now correctly use ${var.environment}/${var.region}/....

New _envcommon/prefixes.hcl generated file

Added templates/prefixes.hcl.tf.tpl — generates a prefixes.hcl file alongside common-variables.hcl and default-versions.hcl, containing the standard Terragrunt naming-prefix map used across all EKS cluster repos.
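A sketch of how the three _envcommon files might be rendered side by side, assuming each template receives the same variable map. local.envcommon_template_vars is a hypothetical placeholder; only the template and output file names come from this PR, and the prefix map itself is not reproduced here.

```hcl
locals {
  # Hypothetical: envcommon_template_vars stands in for whatever inputs the
  # real templates receive.
  envcommon_files = {
    for name in ["common-variables", "default-versions", "prefixes"] :
    "_envcommon/${name}.hcl" => templatefile("${path.module}/templates/${name}.hcl.tf.tpl", local.envcommon_template_vars)
  }
}
```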

Per-cluster FinOps override

cluster_config gains optional finops_project_name, finops_project_number, finops_project_role fields. These take precedence over the global finops variable via coalesce() in defaults.tf.
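A sketch of the extended type using optional() attributes (Terraform 1.3+). The finops fields are typed as strings here as an assumption, and the other cluster_config attributes are omitted.

```hcl
variable "cluster_config" {
  type = object({
    organization = string # written out as the CostAllocation tag

    # Per-cluster FinOps overrides; null means "fall back to var.finops".
    finops_project_name   = optional(string)
    finops_project_number = optional(string)
    finops_project_role   = optional(string)

    # ...remaining cluster_config attributes omitted from this sketch
  })
}
```

A cluster that leaves all three fields unset inherits the global finops values.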

CostAllocation tag plumbed through

var.cluster_config.organization is now written as CostAllocation in both config.json and cluster.hcl.tf.tpl.

Version bumps

| Component | Before | After |
|---|---|---|
| EKS cluster version | 1.31 | 1.34 |
| terraform-aws-eks module | 20.33.1 | 21.11.1 |
| AWS provider | 5.84.0 | 6.0 |
| Karpenter | 1.3.1 | 1.8.5 |
| Kiali operator | 2.2.0 | 2.21.0 |
| Istio | 1.25.0 | 1.28.3 |
| Gatekeeper | 3.2.1 / 0.1.53 | 4.4.0 / 0.1.60 |
| Keycloak chart | 24.4.11 | 7.0.1 |
| Loki chart | 6.27.0 | 6.49.0 |
| Prometheus chart | 27.5.1 | 28.6.0 |
| Tempo chart | 1.18.2 | 1.24.3 |

New version objects

Added first-class version blocks for cribl, otel (6 sub-fields), and postgresql.

Removed components

cert_manager, metrics_server, and k8s_dashboard removed from variables, namespaces, base_namespaces, and default_versions. Namespace names simplified: aoperator → operator, atelemetry → telemetry.

clusters/ directory

Five working cluster configurations added under clusters/ demonstrating the update workflow against real cluster repos (adsd-tools-dev, csvd-dev-mcm, csvd-lab-dja, csvd-lab-mcm, csvd-mcm-common).

Module source

Updated from the old SSH URL (git::git@...) to an HTTPS URL pointing to CSVD/terraform-github-repo?ref=main.


Merge conflict resolution

This branch was rebased onto main (which had diverged on locals.tf, main.tf, and both template files). In all four conflicts the PR-16 version was kept — it is a strict superset of what was on main.


Testing

  • terraform validate passes against the module root.
  • Update workflow tested via clusters/csvd-dev-mcm/main.tf against the live csvd-dev-mcm repo in SCT-Engineering.

Fixes added during E2E testing (2026-04-21)

defaults.tf — optional finops fields safe when all values are empty

  • coalesce() errors when all arguments are null or "" — replaced with try(coalesce(...), "") for finops_project_name, finops_project_number, finops_project_role
  • Caught during first E2E run; pushed to both test_cluster and main
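The fix above, sketched for all three fields. var.finops and its project_* attribute names are assumptions about the global variable's shape.

```hcl
locals {
  # coalesce() fails when every argument is null or "", so wrap it in try()
  # and fall back to "". The per-cluster value wins when it is set.
  finops_project_name   = try(coalesce(var.cluster_config.finops_project_name, var.finops.project_name), "")
  finops_project_number = try(coalesce(var.cluster_config.finops_project_number, var.finops.project_number), "")
  finops_project_role   = try(coalesce(var.cluster_config.finops_project_role, var.finops.project_role), "")
}
```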

morga471 self-assigned this Mar 19, 2026
arnol377 (Collaborator) commented:

Architecture note — data.github_repository_file vs committed templates

The approach of reading eks-*/terragrunt.hcl files live from template-eks-cluster:main at plan time creates a consistency risk: the fetched eks-module files and the _envcommon/ files rendered from templates in this repo can silently diverge (different module versions, different variable sets) when either repo is updated independently.

The REPO_BRANCH pin in buildspec.yml is intended to make a single commit of terraform-eks-deployment the complete, reproducible source of truth for everything written into a generated cluster repo. Splitting file ownership across two repos at independent refs undermines that guarantee.

PR #18 adds an ADR that documents this decision and the full set of alternatives considered.

The actionable change needed in this PR is: instead of data.github_repository_file reading from template-eks-cluster, commit the eks-*/terragrunt.hcl files to terraform-eks-deployment/templates/eks-modules/ and wire them into managed_extra_files here alongside the rendered config files.
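A minimal sketch of the requested wiring, assuming local.rendered_files holds the files rendered from templates/*.tf.tpl and local.eks_module_files holds the committed eks-modules templates keyed by target path. Both names appear elsewhere in this PR, but their exact shapes are assumptions.

```hcl
locals {
  # Every file written to a generated cluster repo comes from a single commit
  # of this repository: rendered config files plus the committed eks-modules
  # templates. No data source reads from template-eks-cluster.
  managed_extra_files = merge(local.rendered_files, local.eks_module_files)
}
```

merge() lets later arguments win on duplicate keys; as long as the two sets use disjoint paths, argument order does not matter.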

Dave Arnold added 3 commits April 21, 2026 14:07
# Conflicts:
#	locals.tf
#	main.tf
#	templates/common-variables.hcl.tf.tpl
#	templates/default-versions.hcl.tf.tpl
…es/eks-modules

Remove the data.github_repository_file / data.github_tree approach that read
cluster-level terragrunt files from the template-eks-cluster repo at runtime.
All HCL templates are now managed locally in templates/eks-modules/ inside this
module repo, consistent with the ADR merged in #18.

Changes:
- Remove effective_template_enabled_modules and template_cluster_sync_files locals
- Remove data.github_repository_file.template_cluster_files data source
- Remove template_repo_name, template_repo_ref, template_cluster_file_paths variables
- Move effective_template_enabled_modules into the eks_module_files locals block
  and apply enablement filter directly to the fileset loop
- Update desired_managed_files_by_path to use local.eks_module_files
- Update template_enabled_modules variable description to reflect local template usage
arnol377 (Collaborator) commented:

lgtm

morga471 requested a review from arnol377 April 21, 2026 19:39
arnol377 merged commit f0f7426 into main Apr 21, 2026
1 check passed
arnol377 deleted the test_cluster branch April 21, 2026 19:41