From b14a0849b324c20971bebaee153a208c87a6a2da Mon Sep 17 00:00:00 2001
From: Dave Arnold
Date: Fri, 22 May 2026 14:01:10 -0400
Subject: [PATCH] adding supporting documentation for AWS Account Bootstrapping and how it could either support or be changed to work better with this project

---
 docs/account-bootstrap-analysis.md | 826 +++++++++++++++++++++++++++++
 1 file changed, 826 insertions(+)
 create mode 100644 docs/account-bootstrap-analysis.md

diff --git a/docs/account-bootstrap-analysis.md b/docs/account-bootstrap-analysis.md
new file mode 100644
index 0000000..e2bffb4
--- /dev/null
+++ b/docs/account-bootstrap-analysis.md
@@ -0,0 +1,826 @@
+# Account Repo Analysis & Bootstrap Automation Proposal
+
+**Date:** 2026-05-22
+**Status:** Proposed
+**Author:** AI analysis of `account-repos` workspace
+**Audience:** Platform Engineering / SCT-Engineering
+
+---
+
+## 1. Executive Summary
+
+This document records a systematic analysis of the ~100 AWS account repositories
+cloned under `~/git/account-repos` and maps the common structural elements into
+a proposed series of **sc-lambda-ghactions workspaces** (Service Catalog products
+backed by template repos) that can automate the full account bootstrap lifecycle.
+
+The analysis found that every account repo — regardless of account type, partition
+(GovCloud vs commercial EW), or team ownership — follows a strictly ordered,
+repeatable sequence of Terraform workspaces. The content of each workspace is
+highly parameterized but structurally identical across accounts. This makes the
+entire bootstrap sequence a strong candidate for sc-lambda-ghactions automation.
+
+A key finding is that the **git-secret / GPG credential system** is the primary
+architectural blocker for headless automation of the `common/` layer (IAM foundation).
+ADR-002's Vault AWS Secrets Engine, extended to cover Vault KV for provider
+credentials, directly unblocks this. Section 6 covers this in detail.
+
+Where full automation is not possible, this document states so explicitly and
+explains why.
+
+---
+
+## 2. Account Repo Structure — Universal Elements
+
+Every account directory under `account-repos/` contains the following
+top-level items. Apart from the entries annotated below as conditional
+(`applications/`, `edl-automation/`, `TOP`), no account was missing any of these.
+
+### 2.1 Top-Level Layout
+
+```
+{account-id}-{account-alias}/
+├── applications/            # app-workspace scaffold (most accounts)
+│   └── structure/           # mirrors common/ infrastructure/ vpc/ as templates
+├── common/                  # IAM: policies, roles, groups, users, SAML, LDAP
+├── credentials.d/           # per-region AWS credential .tf files
+├── edl-automation/          # EDL-specific automation (EDL accounts only)
+├── includes.d/              # shared variable definitions (tags)
+├── infrastructure/          # TF state backend, S3 logs, CloudTrail, Config
+├── init/                    # git repo setup, git-secret, GPG key
+│   ├── git-secret/          # team-member GPG public keys (.gpg.asc)
+│   ├── git-setup/           # IaC to create/configure the GitHub repo
+│   └── gpg-setup/           # account-specific GPG key generation
+├── provider_configs.d/      # provider secrets: GitHub, LDAP, Infoblox, DNS
+├── variables.d/             # variables.common.tf, variables.tfstate.tf, per-region .tfvars
+├── vpc/                     # VPC resources per region
+├── INF.SETUP.md             # step-by-step human bootstrap guide
+├── README.md
+├── TOP                      # high-level apply-phase sequence (non-apps repos)
+├── outputs.common.tf
+├── region.tf                # locals { region = var.region }
+└── tf-run.data              # orchestration phases (TAG/COMMENT directives)
+```
+
+### 2.2 Within `common/`
+
+```
+common/
+├── INF.account-info.tf            # module "account_settings" — alias, password policy
+├── INF.general-policies.tf        # managed + custom IAM policies
+├── INF.saml.tf                    # IAM SAML provider
+├── INF.ldap-ou-create.tf          # base LDAP OU for the account
+├── INF.role.inf-cloud-admin.tf
+├── INF.group.inf-cloud-admin.tf
+├── INF.role.inf-network-admin.tf
+├── INF.role.inf-flowlogs.tf
+├── INF.group.inf-ip-restriction.tf
+├── INF.remote-roles.tf            # additional SAML roles
+├── INF.admin-user.{username}.tf   # one file per admin user (variable count)
+├── inf-cloud-admin.users.tf
+├── INF.service.cloudforms.tf
+├── sso/                           # per-SSO-permission-set subdirectories
+├── remote_state.backend.tf        # symlink (→ .s3 or .none depending on state)
+├── remote_state.common.tf
+├── outputs.common.tf
+├── region.tf
+└── versions.tf
+```
+
+### 2.3 Within `infrastructure/`
+
+```
+infrastructure/
+├── INF.tfstate.tf                 # S3 bucket + DynamoDB for TF state
+├── east/                          # per-region workspace
+│   ├── INF.cloudtrail.tf
+│   ├── INF.config.tf
+│   ├── INF.s3-access-logs.tf
+│   ├── INF.s3-flow-logs.tf
+│   ├── INF.object-logs.tf
+│   ├── INF.dynamic-route53.tf
+│   ├── INF.ses-domain.tf
+│   ├── INF.preload-kms.tf
+│   ├── INF.splunk-description.tf
+│   ├── locals.tf
+│   ├── region.tf
+│   ├── remote_state.backend.tf
+│   └── versions.tf
+└── west/                          # same structure as east/
+```
+
+### 2.4 Provider Configurations (`provider_configs.d/`)
+
+Present in every account without exception, other than the Infoblox files,
+which exist in EW accounts only:
+
+| File | Provider |
+|---|---|
+| `provider.github.tf` | GitHub Enterprise |
+| `provider.github.variables.tf` | GitHub provider variables |
+| `provider.github.auto.tfvars.secret` | GitHub PAT (git-secret encrypted) |
+| `provider.ldap.tf` | LDAP (legacy) |
+| `provider.ldap.variables.tf` | LDAP variables |
+| `provider.ldap.auto.tfvars.secret` | LDAP bind password (git-secret encrypted) |
+| `provider.ldap_new.tf` | LDAP (new provider) |
+| `provider.ldap_new.variables.tf` | LDAP new variables |
+| `provider.ldap_new.auto.tfvars.secret` | LDAP new bind password |
+| `provider.dns.tf` | DNS (Infoblox/Route53) |
+| `provider.infoblox.tf` | Infoblox (EW accounts only) |
+| `provider.infoblox.variables.tf` | Infoblox variables (EW only) |
+| `provider.infoblox.auto.tfvars.secret` | Infoblox creds (EW only) |
+| `tf-run.data` | Phase orchestration for this layer |
+
+### 2.5 Variable Files (`variables.d/`)
+
+| File | Purpose |
+|---|---|
+| `variables.common.tf` | Variable declarations for all common inputs |
+| `variables.tfstate.tf` | Variable declarations for state backend |
+| `{region}.variables.common.auto.tfvars` | Per-region values (account_id, alias, region) |
+
+---
+
+## 3. Bootstrap Phase Sequence
+
+All account repos share the same bootstrap execution order, controlled by `TOP`
+and `tf-run.data`. The sequence is:
+
+```
+Phase 0: MANUAL                      — AWS account creation, initial bootstrap IAM user
+Phase 1: init/                       — GPG key, git-secret, GitHub repo
+Phase 2: provider_configs.d/         — provider secret initialization
+Phase 3: infrastructure/ (partial)   — TF state backend (S3 + DynamoDB)
+Phase 4: infrastructure/{region}/    — S3 access log buckets per region
+Phase 5: common/                     — IAM foundation (see ordered sub-steps below)
+Phase 6: infrastructure/ (finalize)  — flow log buckets, object logging, etc.
+Phase 7: infrastructure/{region}/    — CloudTrail, Config, SES, Route53 per region
+Phase 8: vpc/                        — VPC per region
+Phase 9: applications/structure/     — (if applicable) app workspace scaffold
+```
+
+### Phase 5 Sub-Steps (ordered dependencies)
+
+```
+5.1  general            — managed_policies, custom_policies, custom_policy_documents
+5.2  account_settings   — account alias + IAM password policy
+5.3  saml               — IAM SAML provider
+5.4  ldap_ou            — base LDAP OU (prerequisite for all LDAP objects)
+5.5  role inf-cloud-admin + group inf-cloud-admin (apply twice: create file, then LDAP object)
+5.6  role inf-network-admin + group inf-network-admin
+5.7  role inf-flowlogs
+5.8  group inf-ip-restriction
+5.9  splunk_user
+5.10 service_cloudforms
+5.11 admin user accounts (one tf file per user; parallel-safe within the group)
+5.12 other SAML roles (remote roles — apply twice each)
+```
+
+---
+
+## 4. Account Type Variations
+
+### 4.1 Partition
+
+The authoritative discriminator is the `aws_environment` variable in each account's `variables.d/*.variables.common.auto.tfvars`:
+
+| `aws_environment` value | Meaning | `credentials.d/` contents | Primary region |
+|---|---|---|---|
+| `"gov"` | AWS GovCloud (US) | `us-gov-east-1.credentials.tf`, `us-gov-west-1.credentials.tf` (2 files) | `us-gov-east-1` |
+| `"ew"` | East-West commercial network zone | 17 commercial-region files (one account, `ent-ew-sectools-prod`, has 30 as newer regions were added) | `us-east-1` |
+
+**The `-gov` vs `-ew` name suffix does not reliably map to partition alone.** Examples:
+- `ent-gov-operations-prod` → `aws_environment = "gov"` (GovCloud despite `-prod` suffix)
+- `csvd-dev-ew` → has commercial-region credentials but is the GovCloud-linked commercial account for `csvd-dev-gov`, not a standalone commercial workload account
+- `do2-prod` (no `-ew` suffix) → `aws_environment = "ew"`
+
+The `-ew` suffix, when present alongside a corresponding `-gov` account, designates the **GovCloud linked commercial account** — the pairing required by AWS for every GovCloud account. These accounts carry the standard 17-region commercial credential set but serve a different operational role than standalone commercial workload accounts.
+
+**Consequence for bootstrap automation:** `aws_environment` is the correct input field to use, not a derived partition value from the alias name. It must be supplied explicitly in the SC form.
+
+**Infoblox provider:** present in `provider_configs.d/` for `aws_environment = "ew"` accounts only.
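
The rule in section 4.1 is small enough to sketch directly. The helper below is illustrative Python, not code from any account repo: `credential_files` and `includes_infoblox` are hypothetical names, and because this document does not enumerate the 17 commercial regions, the caller has to supply them for `ew` accounts.

```python
"""Sketch: derive the credentials.d/ file set from aws_environment.

Assumptions: function names are hypothetical. The commercial region list is
not enumerated in this document, so it is a required argument for "ew".
"""

GOV_REGIONS = ("us-gov-east-1", "us-gov-west-1")


def credential_files(aws_environment, commercial_regions=()):
    """Return the credentials.d/ file names for one account."""
    if aws_environment == "gov":
        regions = GOV_REGIONS
    elif aws_environment == "ew":
        if not commercial_regions:
            raise ValueError("ew accounts need an explicit commercial region list")
        regions = tuple(commercial_regions)
    else:
        raise ValueError(f"unknown aws_environment: {aws_environment!r}")
    return [f"{region}.credentials.tf" for region in regions]


def includes_infoblox(aws_environment):
    """Infoblox provider files are rendered for "ew" accounts only."""
    return aws_environment == "ew"
```

Note that the account alias never enters the decision, which matches the finding that names such as `do2-prod` or `csvd-dev-ew` are not reliable partition indicators.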
+
+### 4.2 Program/Team
+
+| Pattern | Directories added | Notes |
+|---|---|---|
+| `edl-*` | `edl-automation/` | EDL-specific automation harness |
+| `ent-gov-network-*` | `vpc-shared/` instead of `applications/` | Network accounts share VPC |
+| `_apps-{stack}` | Separate repo per application stack; `SUBMODULE` file replaces `TOP` | Each stack is its own GitHub repo |
+
+### 4.3 `_apps-*` Repos
+
+Accounts with application stacks have companion repos following the naming
+convention `{account-id}-{alias}_apps-{stack-name}`. These share the same
+`common/`, `credentials.d/`, `infrastructure/`, `provider_configs.d/` scaffolding
+as the base account repo but contain a `SUBMODULE` orchestration file rather than
+`TOP`. They are registered as Git submodules in the base account repo.
+
+---
+
+## 5. Key Inputs Required Per New Account
+
+These are the minimum parameterized values needed to generate a complete account
+repo from scratch:
+
+| Input | Examples | Notes |
+|---|---|---|
+| `account_id` | `001476713248` | 12-digit AWS account number |
+| `account_alias` | `edl-core-dev-gov` | Used in all resource naming |
+| `aws_environment` | `gov` or `ew` | Sourced directly from the account's tfvars; controls region set + Infoblox presence. Do not derive from the alias name. |
+| `primary_region` | `us-gov-east-1` or `us-east-1` | Drives first-region workspace |
+| `secondary_region` | `us-gov-west-1` or `us-west-2` | Drives second-region workspace |
+| `program` | `edl`, `ent`, `ma`, `lab`, etc. | Controls edl-automation inclusion |
+| `environment` | `dev`, `nonprod`, `prod`, `common` | Tags and policy scoping |
+| `admin_users` | `[badra001, dwara001, ...]` | Generates `INF.admin-user.*.tf` files |
+| `team_gpg_keys` | Map of username → GPG public key | Populates `init/git-secret/` |
+| `github_org` | `SCT-Engineering` or specific org | For `init/git-setup/` |
+| `github_repo_name` | `{account_id}-{alias}` | Usually derived from above |
+| `tfstate_bucket` | `inf-tfstate-{account_id}` | S3 bucket for remote state |
+| `app_stacks` | `[]` or `[adsd-eks, tco-imds]` | Whether to create `_apps-*` repos |
+| `include_edl_automation` | `true/false` | EDL accounts only |
+| `include_vpc_shared` | `true/false` | Network accounts only |
+
+---
+
+## 6. GPG Keys and git-secret — What They Actually Protect
+
+Understanding the GPG/git-secret system precisely is critical to assessing what
+can be automated and what ADR-002 (Vault) can eliminate.
+
+### 6.1 Two Distinct GPG Key Systems
+
+There are **two separate GPG key concepts** in every account repo, serving
+different purposes:
+
+#### Account-specific GPG keypair (`init/gpg-setup/`)
+
+`tf apply` in `init/gpg-setup/` generates a unique GPG keypair for the account
+(e.g. `tf-001476713248-edl-core-dev-gov`). Its purpose, from `INF.gpg-setup.md`:
+
+> "This key is used for encrypting specific resource values, such as **IAM
+> passwords** or **IAM access keys**."
+
+It is **not** used to protect provider credentials. It encrypts the IAM console
+passwords and AWS access keys that Terraform generates for admin users (module
+`admin_{username}`) so those sensitive values can be committed to the repo and
+distributed out-of-band without appearing in plaintext. The key artifacts are:
+
+| File | Contents | How stored |
+|---|---|---|
+| `tf-{account}.gpg.b64` | Public key (base64) | Plaintext in git; symlinked at `TOP/init/tf-gpg-key.b64` |
+| `tf-{account}.gpg.asc` | Public key (ascii-armored) | Plaintext in git |
+| `tf-{account}.gpg.secret-key.secret` | **Private key** | Encrypted by git-secret |
+
+The private key is itself protected by git-secret — you need a team member's
+personal GPG key to retrieve it.
+
+#### Team member GPG public keys (`init/git-secret/*.gpg.asc`)
+
+One file per engineer, sourced from `terraform/support/keys/gpg-public-keys`.
+These are the **recipients** for git-secret's multi-key encryption. Anyone whose
+key is in this directory can run `git secret reveal` to decrypt the protected
+files in the repo. Adding or removing an engineer requires: importing their key,
+running `git-secret tell $EMAIL`, running `git-secret hide` (re-encrypts all
+files for all current recipients), and committing the result.
+
+### 6.2 What git-secret Actually Encrypts
+
+The `.secret` extension marks a git-secret encrypted file. The plaintext
+counterpart (without `.secret`) is gitignored. Encrypted files found across all
+account repos:
+
+| Encrypted file | Plaintext contains | Used by |
+|---|---|---|
+| `provider_configs.d/provider.github.auto.tfvars.secret` | `github_token`, `github_org`, `github_url` | All TF workspaces using GitHub provider |
+| `provider_configs.d/provider.ldap.auto.tfvars.secret` | `ldap_user`, `ldap_password` | All TF workspaces creating LDAP objects |
+| `provider_configs.d/provider.ldap_new.auto.tfvars.secret` | `ldap_user`, `ldap_password` (new provider) | Same |
+| `provider_configs.d/provider.infoblox.auto.tfvars.secret` | Infoblox API credentials | EW accounts only |
+| `init/gpg-setup/tf-{account}.gpg.secret-key.secret` | Account GPG private key | Local operators who need to decrypt IAM passwords |
+| `vpc/{region}/.../access_key.yml.secret` | IAM access keys for service accounts | Where AWS access keys are stored in VPC workloads |
+
+### 6.3 Why This Blocks CodeBuild Automation Today
+
+`git secret reveal` requires a GPG private key present in the local keychain.
+CodeBuild has no such key — it was never designed to be a git-secret recipient.
+As a result:
+
+- **Any TF workspace that sources the GitHub or LDAP provider** (i.e., `common/`,
+  any SSO workspace, any workspace in `provider_configs.d/`) **cannot be run by
+  the executor in its current form.** The `.auto.tfvars.secret` files would not
+  be decrypted, the provider would have empty credentials, and the apply would fail.
+
+- The executor **can** run workspaces that use only the AWS provider (e.g.,
+  `infrastructure/`, `vpc/`) because those rely on STS credentials injected via
+  environment variables, not on git-secret-managed files.
+
+This is the core automation gap. It is not just a manual step — it is an
+architectural incompatibility between git-secret and headless automation.
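
The split described in section 6.3 can be expressed as a simple check on the providers a workspace sources. The following Python sketch is illustrative only: the per-workspace provider sets are assumptions inferred from sections 2.4 and 6.3 rather than read from any repo, and the function names are hypothetical.

```python
"""Sketch: which workspaces can a headless executor run today?

Assumption: the provider sets per workspace below are illustrative, inferred
from sections 2.4 and 6.3 of this document; they are not read from any repo.
"""

# Providers whose credentials live in git-secret encrypted
# *.auto.tfvars.secret files under provider_configs.d/.
GIT_SECRET_PROVIDERS = frozenset({"github", "ldap", "ldap_new", "infoblox"})


def blocked_providers(providers):
    """Return the providers that would need `git secret reveal` first."""
    return sorted(set(providers) & GIT_SECRET_PROVIDERS)


def runnable_headless(providers):
    """True when no provider depends on git-secret-managed credentials."""
    return not blocked_providers(providers)


# Hypothetical provider usage per workspace, for illustration.
WORKSPACE_PROVIDERS = {
    "infrastructure": {"aws"},
    "vpc": {"aws"},
    "common": {"aws", "github", "ldap"},
}

for name, providers in sorted(WORKSPACE_PROVIDERS.items()):
    blocked = blocked_providers(providers)
    status = "runnable" if not blocked else "blocked by " + ", ".join(blocked)
    print(f"{name}: {status}")
```

Under these assumed provider sets, only `common` is reported as blocked, which is the gap that the Vault discussion in the rest of section 6 addresses.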
+
+### 6.4 What Vault Can Replace
+
+ADR-002 is framed around AWS credential issuance (replacing static
+`sts:AssumeRole`), but the Vault KV Secrets Engine can readily be extended to
+replace git-secret for all provider credentials as well. The mapping is direct:
+
+| Today (git-secret) | With Vault KV |
+|---|---|
+| `provider.github.auto.tfvars.secret` → `git secret reveal` | `vault kv get secret/accounts/{alias}/github` → write `.auto.tfvars` at build time |
+| `provider.ldap.auto.tfvars.secret` → `git secret reveal` | `vault kv get secret/accounts/{alias}/ldap` → write `.auto.tfvars` at build time |
+| `provider.infoblox.auto.tfvars.secret` → `git secret reveal` | `vault kv get secret/accounts/{alias}/infoblox` → write `.auto.tfvars` at build time |
+| Account GPG private key → `git secret reveal` | `vault kv get secret/accounts/{alias}/gpg-private-key` → decrypt IAM passwords at build time |
+
+The executor buildspec would add a `vault kv get` call per needed provider before
+running `tf-init`/`tf-run`, injecting the plaintext credentials as temporary
+files that are never committed. This replaces the entire `git secret reveal`
+ceremony and eliminates the need for any team member to maintain GPG keys in a
+git repo.
+
+### 6.5 What Vault Cannot Eliminate
+
+Even with Vault managing all secrets, two manual steps survive:
+
+1. **Account-specific GPG keypair generation (M4):** The `init/gpg-setup/`
+   module still generates a keypair used to encrypt IAM passwords that Terraform
+   outputs. If the Terraform `admin-user` module is redesigned to deliver
+   passwords via Vault KV (i.e.,
+   `vault kv put secret/accounts/{alias}/users/{username}/password $(tf output password)`)
+   rather than GPG-encrypted files in the repo, this step becomes unnecessary.
+   This is an account-module change, not a sc-lambda-ghactions change.
+
+2. **Bootstrapping Vault itself with the first account credential:** The very
+   first time a new account is bootstrapped, Vault does not yet have that
+   account's LDAP password or GitHub PAT. An operator must do a one-time
+   `vault kv put` for each credential. This is a single 3-command operation per
+   credential per account — far simpler than the full git-secret ceremony — and
+   can be performed by a central platform team without any access to the target
+   account's GPG keychain.
+
+### 6.6 Vault Scope Expansion Summary
+
+If ADR-002 is implemented and extended to cover provider credentials via Vault KV
+(M-numbers refer to the manual steps in section 8):
+
+| Manual step | Current status | With Vault |
+|---|---|---|
+| M4 — GPG keypair generation | Required per account | Eliminated if admin-user module writes passwords to Vault KV |
+| M5 — Team member GPG key collection | Required per account per new team member | Eliminated — no git-secret recipients needed |
+| M6 — `*.auto.tfvars.secret` encryption | Required per credential per account | Replaced by one `vault kv put` per credential (one-time, central team) |
+| LDAP objects in `common/` (gated by M6) | Currently blocked for CodeBuild | Unblocked — executor reads LDAP credentials from Vault at build time |
+
+The practical effect: implementing Vault KV for provider credentials **unlocks
+full automation of `common/`** — the largest and most complex bootstrap workspace
+— which is currently the hardest manual phase.
+
+---
+
+## 7. Proposed sc-lambda-ghactions Workspace Series
+
+The following describes each proposed Service Catalog product / template repo
+needed to automate the bootstrap sequence. They map directly onto the phase
+sequence in section 3.
+
+Each workspace corresponds to one sc-lambda-ghactions **Proposer invocation**
+(one PR, one executor run). They are ordered by dependency.
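
Several of the workspaces that follow depend on the build-time credential step from section 6.4, so a minimal sketch of it may be useful here. This is not the executor's actual code. It assumes the input is the JSON printed by `vault kv get -format=json <path>` against a KV version 2 mount (where the secret's key/value pairs sit under `data.data`), and the function names are hypothetical.

```python
"""Sketch: turn a Vault KV payload into a temporary .auto.tfvars file.

Assumptions: the payload is the JSON printed by
`vault kv get -format=json <path>` for a KV version 2 mount, where the
key/value pairs sit under data.data. Function names are hypothetical.
"""
import json
import os


def render_auto_tfvars(secret):
    """Render string key/value pairs as HCL assignments, one per line."""
    lines = []
    for key in sorted(secret):
        # json.dumps supplies the quoting and backslash escapes; this is
        # sufficient for printable credential strings. "${" and "%{" are
        # doubled so HCL treats them as literal text, not template syntax.
        value = json.dumps(str(secret[key]), ensure_ascii=False)
        value = value.replace("${", "$${").replace("%{", "%%{")
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


def write_auto_tfvars(vault_json, path):
    """Write the rendered file, requesting owner-only permissions."""
    secret = json.loads(vault_json)["data"]["data"]
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(render_auto_tfvars(secret))
```

In a buildspec this would run once per needed provider before `tf-init`, writing for example `provider_configs.d/provider.ldap.auto.tfvars` into the build container's working copy, which is discarded after the run.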
+
+---
+
+### Workspace 0: `bootstrap-account-repo`
+
+**Template repo:** `template-bootstrap-account-repo`
+**Layer:** `init` (special — not a standard TF layer; this creates the repo itself)
+**Purpose:** Create the GitHub repo, set branch protections, configure teams,
+write the top-level scaffold files (`TOP`, `tf-run.data`, `region.tf`,
+`outputs.common.tf`, `README.md`, `INF.SETUP.md`).
+
+**Inputs:**
+- `account_id`, `account_alias`, `aws_environment`, `program`, `environment`
+- `primary_region`, `secondary_region`
+- `github_org`, `github_teams` (list with permissions)
+- `admin_users` (for `INF.SETUP.md` generation)
+
+**Rendered outputs:**
+- `init/git-setup/INF.repo-setup.tf` — GitHub repo resource + team membership
+- `TOP` — apply phase sequence file
+- `tf-run.data` — orchestration file
+- `README.md`, `INF.SETUP.md` — human documentation
+
+**What the executor does:** `tf apply` in `init/git-setup/` creates the GitHub
+repo via the GitHub Terraform provider.
+
+> ⚠️ **Cannot be automated:** GPG key generation for `init/gpg-setup/` requires
+> a human with GnuPG to generate the account-specific keypair, encrypt it, and
+> commit the `.gpg.asc` and `.gpg.b64` artifacts. The private key must be
+> distributed out-of-band to account operators. This step remains manual.
+
+> ⚠️ **Cannot be automated:** Each team member's GPG public key
+> (`init/git-secret/{username}.gpg.asc`) must be provided by the individual.
+> The proposer can render the `git-secret` setup script, but the keys themselves
+> must come from a known-good source (e.g., a team keyring registry). If such a
+> registry exists in Secrets Manager or SSM, this can be automated; otherwise it
+> remains manual.
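
The Workspace 0 inputs, together with the derived names listed in section 5, could be validated before any PR is proposed. The sketch below is a hypothetical model, not the Proposer's actual input handling: the class name and field selection are assumptions, and only the naming patterns (`{account_id}-{alias}`, `inf-tfstate-{account_id}`) come from this document.

```python
"""Sketch: validate bootstrap inputs and derive the names from section 5.

Assumption: the class and its field selection are hypothetical; only the
naming patterns come from this document.
"""
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BootstrapInputs:
    account_id: str
    account_alias: str
    aws_environment: str

    def __post_init__(self):
        # Keep the ID a string: leading zeros are significant
        # (for example 001476713248).
        if not re.fullmatch(r"[0-9]{12}", self.account_id):
            raise ValueError("account_id must be a 12-digit string")
        if self.aws_environment not in ("gov", "ew"):
            raise ValueError("aws_environment must be 'gov' or 'ew'")

    @property
    def github_repo_name(self):
        return f"{self.account_id}-{self.account_alias}"

    @property
    def tfstate_bucket(self):
        return f"inf-tfstate-{self.account_id}"
```

Treating `account_id` as a string rather than an integer matters for YAML and form inputs, where an unquoted `001476713248` can silently lose its leading zeros.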
+
+---
+
+### Workspace 1: `bootstrap-provider-configs`
+
+**Template repo:** `template-provider-configs`
+**Layer:** `provider_configs.d`
+**Region dir:** `global` (no region scoping for this layer)
+**Purpose:** Render all provider configuration `.tf` files and stub out the
+encrypted secret placeholders. Sets up the GitHub, LDAP, LDAP-new, DNS, and
+(if `aws_environment = "ew"`) Infoblox providers.
+
+**Inputs:** `account_id`, `account_alias`, `aws_environment`
+
+**Rendered outputs:**
+- `provider_configs.d/provider.github.tf`
+- `provider_configs.d/provider.ldap.tf`, `provider.ldap_new.tf`
+- `provider_configs.d/provider.dns.tf`
+- `provider_configs.d/provider.infoblox.tf` (EW only)
+- `provider_configs.d/tf-run.data`
+- All `variables.tf` counterparts
+
+> ⚠️ **Cannot be automated without Vault KV (see section 6):** The `*.auto.tfvars.secret`
+> files (GitHub PAT, LDAP bind password, Infoblox credentials) are git-secret encrypted
+> files that CodeBuild cannot decrypt. With ADR-002 extended to Vault KV, the executor
+> would instead call `vault kv get secret/accounts/{alias}/{provider}` at build time and
+> write the credentials as temporary files before `tf-init` — replacing git-secret entirely.
+> Until that is implemented, an operator must encrypt the credentials with
+> `git-secret hide` once per account; once it is, a one-time `vault kv put` per
+> credential takes its place. This is also what gates full automation of `common/`
+> (Workspace 6), which uses both the LDAP and GitHub providers.
+
+---
+
+### Workspace 2: `bootstrap-credentials`
+
+**Template repo:** `template-credentials`
+**Layer:** `credentials.d`
+**Region dir:** `global`
+**Purpose:** Generate the per-region AWS credential provider `.tf` files.
+
+**Inputs:** `account_id`, `account_alias`, `aws_environment` (`gov` or `ew`)
+- `gov`: generates `us-gov-east-1.credentials.tf`, `us-gov-west-1.credentials.tf`
+- `ew`: generates all 17 (or 30 for newer builds) commercial-region credential files
+
+**Rendered outputs:** `credentials.d/{region}.credentials.tf` for each region
+
+**What the executor does:** No TF apply needed; this is a file generation step.
+The executor can be set to `DRY_RUN=true` for this workspace — the files just
+need to be committed.
+
+---
+
+### Workspace 3: `bootstrap-variables`
+
+**Template repo:** `template-variables`
+**Layer:** `variables.d`
+**Region dir:** `global`
+**Purpose:** Generate `variables.common.tf`, `variables.tfstate.tf`, and per-region
+`{region}.variables.common.auto.tfvars` files.
+
+**Inputs:** `account_id`, `account_alias`, `aws_environment`, `primary_region`,
+`secondary_region`, `environment`, `program`, `tfstate_bucket`
+
+**Rendered outputs:**
+- `variables.d/variables.common.tf`
+- `variables.d/variables.tfstate.tf`
+- `variables.d/{primary_region}.variables.common.auto.tfvars`
+- `variables.d/{secondary_region}.variables.common.auto.tfvars`
+- `includes.d/variables.account_tags.tf`
+- `includes.d/variables.application_tags.tf`
+- `includes.d/variables.infrastructure_tags.tf`
+
+**What the executor does:** `DRY_RUN=true` — file generation only.
+
+---
+
+### Workspace 4: `bootstrap-infrastructure-tfstate`
+
+**Template repo:** `template-infrastructure-tfstate`
+**Layer:** `infrastructure`
+**Region dir:** `global`
+**Purpose:** Bootstrap the Terraform state backend: S3 bucket + DynamoDB table.
+This is the first workspace that touches real AWS infrastructure.
+
+**Prerequisite:** AWS bootstrap IAM user created manually (see section 8,
+manual step M2). This executor run assumes the bootstrap user's credentials.
+
+**Inputs:** `account_id`, `account_alias`, `aws_environment`, `primary_region`,
+`tfstate_bucket`
+
+**Rendered outputs:**
+- `infrastructure/INF.tfstate.tf`
+- `infrastructure/remote_state.backend.tf` (→ `.none` initially; executor re-links to `.s3`)
+- `infrastructure/tf-run.data`
+- `infrastructure/region.tf`, `versions.tf`
+
+**What the executor does:**
+1. `tf apply -target=module.tfstate` — creates S3 + DynamoDB
+2. Re-links `remote_state.backend.tf` → `.s3` (commits back to `main`)
+
+> ⚠️ **Cannot be automated:** Initial application of this workspace requires the
+> `bootstrap` IAM user's credentials, which exist only in AWS Console and must
+> be manually provided to CodeBuild (e.g., via SSM SecureString or injected as
+> build-time overrides). One approach: after the account is created, an operator
+> stores the bootstrap credentials in Secrets Manager under a known path, the
+> executor reads them, applies, then the `common/` phase rotates to real users.
+> This is architecturally possible but requires an agreed credential handoff
+> convention not yet established.
+
+---
+
+### Workspace 5: `bootstrap-infrastructure-regional-logs`
+
+**Template repo:** `template-infrastructure-regional-logs`
+**Layer:** `infrastructure`
+**Region dir:** `{primary_region}` then `{secondary_region}` (two separate runs)
+**Purpose:** Create the `inf-logs-{account}-{region}` S3 access log bucket in
+each region. Required before any ALB, S3 bucket, or object-log resources can be
+configured.
+
+**Inputs:** `account_id`, `account_alias`, `aws_environment`, target region
+
+**Rendered outputs:**
+- `infrastructure/{region}/INF.s3-access-logs.tf`
+- `infrastructure/{region}/region.tf`, `remote_state.backend.tf`, `versions.tf`
+
+**What the executor does:** `tf apply -target=module.logs` per region.
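
Looking back at Workspace 4, its second executor step (re-linking `remote_state.backend.tf` once the S3 backend exists) is the one file operation in the state bootstrap that is easy to get wrong. A minimal Python sketch follows. The variant file names `remote_state.backend.tf.none` and `remote_state.backend.tf.s3` are assumptions, since this document only says the symlink points at `.s3` or `.none`, and `relink_backend` is a hypothetical helper.

```python
"""Sketch: switch remote_state.backend.tf from the local to the S3 variant.

Assumption: the variant file names (remote_state.backend.tf.none and
remote_state.backend.tf.s3) are hypothetical; this document only says the
symlink points at ".s3" or ".none". POSIX symlink semantics are assumed.
"""
import os


def relink_backend(workspace_dir, variant):
    """Repoint the remote_state.backend.tf symlink in a single rename."""
    link = os.path.join(workspace_dir, "remote_state.backend.tf")
    target = f"remote_state.backend.tf.{variant}"
    if not os.path.isfile(os.path.join(workspace_dir, target)):
        raise FileNotFoundError(target)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)  # relative target keeps the repo relocatable
    os.replace(tmp, link)    # the rename swaps the link without a gap
    return target
```

Building the new link under a temporary name and renaming it over the old one avoids a window in which the workspace has no backend file at all, which would matter if a second run started mid-switch.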
+
+---
+
+### Workspace 6: `bootstrap-common`
+
+**Template repo:** `template-common`
+**Layer:** `common`
+**Region dir:** `global` (common layer has no per-region split)
+**Purpose:** Render and apply all IAM foundation resources in dependency order.
+
+This is the largest and most complex bootstrap workspace. Because of the
+ordered dependency chain within `common/` (see section 3, Phase 5 sub-steps),
+the executor must respect `TF_RUN_START_TAG` to resume from a given step.
+
+**Inputs:**
+- `account_id`, `account_alias`, `aws_environment`, `environment`, `program`
+- `admin_users` list (one `INF.admin-user.{username}.tf` per entry)
+- `saml_provider_metadata` (SAML XML metadata from identity provider)
+- `ldap_base_dn`, `ldap_account_ou`
+
+**Rendered outputs:**
+- `common/INF.account-info.tf`
+- `common/INF.general-policies.tf`
+- `common/INF.saml.tf`
+- `common/INF.ldap-ou-create.tf`
+- `common/INF.role.inf-cloud-admin.tf`
+- `common/INF.group.inf-cloud-admin.tf`
+- `common/INF.role.inf-network-admin.tf`
+- `common/INF.role.inf-flowlogs.tf`
+- `common/INF.group.inf-ip-restriction.tf`
+- `common/INF.service.cloudforms.tf`
+- `common/INF.admin-user.{username}.tf` for each user in `admin_users`
+- `common/inf-cloud-admin.users.tf`
+- `common/remote_state.backend.tf`, `remote_state.common.tf`
+- `common/outputs.common.tf`, `region.tf`, `versions.tf`
+- `common/tf-run.data` (ordered TAG sequence matching section 3 Phase 5)
+
+**What the executor does:**
+Runs `tf-run apply` which walks the `TAG` sequence in `common/tf-run.data`.
+The `TF_RUN_START_TAG` env var allows resuming after a partial failure.
+
+> ⚠️ **Partially automatable — SAML metadata:** The SAML provider metadata XML
+> must be obtained from the identity provider (e.g., Okta or ADFS) and passed
+> as an input. If the IdP is Okta and an API exists, this can be automated. If
+> the metadata is managed manually, it must be provided by an operator.
+
+> ⚠️ **Partially automatable — LDAP:** The `INF.ldap-ou-create.tf` module
+> requires LDAP bind credentials in `provider_configs.d/` to be decryptable at
+> runtime. Until ADR-002 is implemented and extended to Vault KV (section 6.4),
+> these credentials must already be git-secret encrypted and present in the
+> repo from Workspace 1, and the run must take place where a git-secret
+> recipient key is available (section 6.3). If both hold, LDAP steps are fully
+> automated. If not, they require manual intervention.
+
+> ⚠️ **Two-pass apply for SAML roles:** Each `INF.role.*.tf` that creates LDAP
+> objects requires two sequential `tf apply` calls (first creates a local file,
+> second creates the LDAP object). The executor's `tf-run.data` TAG sequence
+> handles this natively — no special tooling needed — but the operator must
+> ensure the `common/tf-run.data` TAG ordering encodes the two-pass pattern.
+
+---
+
+### Workspace 7: `bootstrap-infrastructure-finalize`
+
+**Template repo:** `template-infrastructure-finalize`
+**Layer:** `infrastructure`
+**Region dir:** `{primary_region}` and `{secondary_region}` (two runs)
+**Purpose:** Apply the remaining infrastructure resources after `common/` is
+complete (which provides the SAML roles required for flow-log and object-log
+bucket policies).
+
+**Inputs:** `account_id`, `account_alias`, `aws_environment`, region
+
+**Rendered outputs per region:**
+- `infrastructure/{region}/INF.s3-flow-logs.tf`
+- `infrastructure/{region}/INF.object-logs.tf`
+- `infrastructure/{region}/INF.cloudtrail.tf`
+- `infrastructure/{region}/INF.config.tf`
+- `infrastructure/{region}/INF.dynamic-route53.tf`
+- `infrastructure/{region}/INF.ses-domain.tf`
+- `infrastructure/{region}/INF.preload-kms.tf`
+- `infrastructure/{region}/INF.splunk-description.tf`
+- `infrastructure/{region}/locals.tf`
+
+**What the executor does:** `tf-run apply` walking the TAG sequence in the
+regional `tf-run.data`.
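
Both Workspace 6 and Workspace 7 rely on `tf-run` walking an ordered TAG sequence and on `TF_RUN_START_TAG` for resuming. The Python sketch below models that resume behaviour as this document describes it. It is not the real `tf-run` implementation, `tags_to_run` is a hypothetical name, and the tag labels are borrowed from the Phase 5 sub-steps purely for illustration.

```python
"""Sketch: resume an ordered TAG sequence from a start tag.

Assumption: this models the behaviour described in this document, not the
real tf-run implementation; the tag labels are illustrative.
"""


def tags_to_run(tags, start_tag=""):
    """Return the tags from start_tag onward; an empty start_tag runs all."""
    tags = list(tags)
    if not start_tag:
        return tags
    if start_tag not in tags:
        # Failing loudly is safer than silently re-running everything.
        raise ValueError(f"unknown start tag: {start_tag!r}")
    return tags[tags.index(start_tag):]


COMMON_TAGS = ["general", "account_settings", "saml", "ldap_ou"]

# A run that failed during "saml" would be resumed from that tag.
print(tags_to_run(COMMON_TAGS, "saml"))
```

A two-pass step, such as the SAML roles that need two applies, would simply appear as two consecutive entries in the sequence, so resuming at the second entry skips the first pass.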
+
+---
+
+### Workspace 8: `bootstrap-vpc`
+
+**Template repo:** `template-vpc`
+**Layer:** `vpc`
+**Region dir:** `{primary_region}` and `{secondary_region}` (two runs)
+**Purpose:** Create the VPC and associated networking resources in each region.
+
+**Inputs:**
+- `account_id`, `account_alias`, `aws_environment`, region
+- `vpc_cidr`, `subnet_cidrs` (map of AZ → CIDR)
+- `vpc_name` (usually derived from account alias)
+- Network account ID for VPC sharing (if applicable)
+
+**Rendered outputs:**
+- `vpc/{region}/INF.vpc.tf`
+- `vpc/{region}/INF.subnets.tf`
+- `vpc/{region}/INF.tgw-attachment.tf` (if transit gateway)
+- `vpc/{region}/region.tf`, `remote_state.backend.tf`, `versions.tf`
+
+**What the executor does:** `tf-run apply` applying VPC resources.
+
+> ⚠️ **Cannot be fully automated:** VPC CIDR allocation must be coordinated with
+> the network team's IPAM system. The CIDRs cannot be derived automatically
+> without an IPAM API integration. An operator must supply them via SC form
+> inputs or the allocation must be read from an external registry (e.g., Infoblox
+> or an internal IPAM).
+
+---
+
+### Workspace 9 (optional): `bootstrap-applications-structure`
+
+**Template repo:** `template-applications-structure`
+**Layer:** `applications`
+**Region dir:** `structure`
+**Purpose:** Scaffold the `applications/structure/` directories that mirror
+`common/`, `infrastructure/`, and `vpc/` as templates for app teams. Only needed
+for accounts that will host application stacks.
+ +**Inputs:** `account_id`, `account_alias`, `aws_environment`, `primary_region`, +`secondary_region`, `app_stacks` (list of stack names for `_apps-*` repos) + +**Rendered outputs:** +- `applications/structure/common/`, `infrastructure/`, `vpc/` scaffold files +- Symlinks matching the base account repo pattern + +--- + +### Workspace 10 (optional): `bootstrap-apps-repo` + +**Template repo:** `template-apps-repo` +**Layer:** `init` (creates a new GitHub repo) +**Purpose:** Create and scaffold a `{account-id}-{alias}_apps-{stack-name}` repo +for each application stack, registering it as a submodule of the base account repo. + +Repeat once per stack name in `app_stacks`. + +--- + +## 8. Manual Steps That Cannot Be Automated + +The following steps are explicitly outside the scope of sc-lambda-ghactions +automation in its current form. Each has the reason stated. + +| # | Step | Why It Cannot Be Automated | Potential Future Path | +|---|---|---|---| +| M1 | AWS account creation | Account Vending Machine / AWS Organizations automation is out of scope for this system | Integrate with AWS Control Tower or an internal AVM product | +| M2 | `bootstrap` IAM user creation | Requires AWS Console + AdminAccess before any IaC exists | Control Tower / account vending pre-creates a bootstrap role | +| M3 | Bootstrap IAM credentials handoff | Access key + secret for the bootstrap user must be securely handed to the first executor run | Store in Secrets Manager during AVM; executor reads from known path | +| M4 | GPG keypair generation (`init/gpg-setup/`) | Generates keypair used to encrypt Terraform-created IAM passwords in the repo | Eliminated if admin-user module is updated to write passwords to Vault KV instead of GPG-encrypting them in the repo (see section 6.5) | +| M5 | Team member GPG public key collection | `git secret tell` requires each engineer's public key as a git-secret recipient | **Fully eliminated by ADR-002 + Vault KV** — no git-secret recipients needed when 
secrets live in Vault (see section 6.4) | +| M6 | `*.auto.tfvars.secret` encryption | git-secret requires the account GPG key on the operator's keychain; CodeBuild cannot decrypt these files | **Substantially eliminated by ADR-002 extended to Vault KV**: replaced by one `vault kv put` per credential; this also unblocks `common/` automation (see sections 6.3–6.4) | +| M7 | SAML provider metadata XML | Must be retrieved from the IdP (Okta, ADFS, etc.) | Automate if IdP has an API; otherwise operator pastes metadata into SC form | +| M8 | VPC CIDR allocation | CIDRs must come from an IPAM system | Automate via Infoblox API or internal IPAM product integration | +| M9 | `bootstrap` user rotation | After admin users are created, the bootstrap user's access key must be disabled and the TF import performed | Low complexity; could be a separate SC product (`import-bootstrap-user`) | +| M10 | Two-pass SAML role applies | LDAP objects require `tf apply` twice per role | Already handled by `tf-run.data` TAG sequence; not a manual step if executor is running cleanly | +| M11 | Initial `git checkout -b initial-setup` push | Per `common/INF.SETUP.md`, the first clean `git push` after `common/` is complete | Could be part of executor post-apply commit; low risk to automate | + +--- + +## 9. Mapping to sc-lambda-ghactions Concepts + +### 9.1 `.sc-automation.yml` per workspace + +Each workspace PR committed by the Proposer includes a `.sc-automation.yml` +written to the account repo root. Because multiple workspaces touch the same +repo, the convention must be either to write a workspace-specific YAML file or +to scope the file to the workspace layer using the `.sc-automation.yml` path +scoping already built into the webhook handler. 
+ +Proposed convention: +```yaml +# .sc-automation.yml written by template-bootstrap-common proposer +account_repo: 001476713248-edl-core-dev-gov +layer: common +region_dir: global +target_account_id: "001476713248" +dry_run: false +tf_run_start_tag: "" # set to a TAG label to resume from a partial failure +``` + +### 9.2 Cross-account IAM role + +Every PR-merge-triggered executor run for a target account requires +`sc-automation-codebuild-role` to exist in that account. During bootstrap, +this role does not yet exist at the time Workspace 4 runs. Two options: + +**Option A (recommended):** The AWS account vending process (Control Tower or +AVM) pre-creates `sc-automation-codebuild-role` as part of the account baseline +before any bootstrap workspace runs. This is the cleanest design. + +**Option B:** Workspace 4 (`bootstrap-infrastructure-tfstate`) runs in the target +account using the bootstrap user's static credentials injected via Secrets Manager +rather than `sts:AssumeRole`. Once `common/` has created the real admin users and +`sc-automation-codebuild-role` itself (which can be applied as a module in +`common/`), all subsequent workspaces use the standard assume-role path. + +### 9.3 Template repo versioning + +Each template repo (summarized in section 10) should be tagged (`v1.0.0`, `v1.1.0`, +etc.) and the CFN product template for that workspace should pin `template_repo_ref`. +This ensures that re-running a bootstrap workspace on an existing account (e.g., to +add a new admin user) uses the exact same templates that created the account. + +### 9.4 Ordered product invocation via the SC console + +The workspace products (Workspaces 0–10) should be presented in an SC portfolio named +**Account Bootstrap** with a display order that mirrors the dependency sequence. +There is no current mechanism in sc-lambda-ghactions to enforce ordering between +products; the human operator is responsible for launching them in sequence. 
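
Because ordering is left to the operator, a small pre-launch check could help avoid out-of-sequence launches. The following is a minimal sketch, not part of sc-lambda-ghactions: the workspace numbers and template repo names come from the Template Repo Summary table, while `can_launch`, `next_workspace`, the `completed` set, and the rule that optional workspaces never block later ones are all assumptions made for illustration. How `completed` is populated (e.g., from executor commit statuses) is deliberately left open.

```python
# Hypothetical operator-side ordering check; names and rules are illustrative.
# Workspace numbers and template repos mirror the Template Repo Summary table.
BOOTSTRAP_SEQUENCE = [
    (0, "template-bootstrap-account-repo"),
    (1, "template-provider-configs"),
    (2, "template-credentials"),
    (3, "template-variables"),
    (4, "template-infrastructure-tfstate"),
    (5, "template-infrastructure-regional-logs"),
    (6, "template-common"),
    (7, "template-infrastructure-finalize"),
    (8, "template-vpc"),
    (9, "template-applications-structure"),
    (10, "template-apps-repo"),
]

# Assumption: the workspaces marked "(optional)" never block a later launch.
OPTIONAL = {9, 10}


def can_launch(number, completed):
    """True if every required lower-numbered workspace is in `completed`."""
    return all(
        n in completed
        for n, _ in BOOTSTRAP_SEQUENCE
        if n < number and n not in OPTIONAL
    )


def next_workspace(completed):
    """Return (number, template_repo) of the first workspace not yet completed."""
    for number, template in BOOTSTRAP_SEQUENCE:
        if number not in completed:
            return number, template
    return None


if __name__ == "__main__":
    done = {0, 1, 2}
    print(next_workspace(done))   # first workspace still outstanding
    print(can_launch(4, done))    # Workspace 3 has not completed yet
```

A check like this only reports whether a launch is in sequence; it does not block anything in Service Catalog.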
+ +A future enhancement could add a dependency state machine (Step Functions or +DynamoDB tracking) to block Workspace N from launching until Workspace N-1 +has a successful executor commit status. This is out of scope for the initial +implementation. + +--- + +## 10. Template Repo Summary + +| Workspace | Template Repo | TF Layer | Primary SC Input Fields | +|---|---|---|---| +| 0 | `template-bootstrap-account-repo` | `init/git-setup` | account_id, alias, aws_environment, github_teams, admin_users | +| 1 | `template-provider-configs` | `provider_configs.d` | account_id, alias, aws_environment | +| 2 | `template-credentials` | `credentials.d` | account_id, alias, aws_environment | +| 3 | `template-variables` | `variables.d` | account_id, alias, aws_environment, regions, environment, program | +| 4 | `template-infrastructure-tfstate` | `infrastructure` | account_id, alias, aws_environment, primary_region, tfstate_bucket | +| 5 | `template-infrastructure-regional-logs` | `infrastructure/{region}` | account_id, alias, region | +| 6 | `template-common` | `common` | account_id, alias, aws_environment, admin_users, saml_metadata, ldap config | +| 7 | `template-infrastructure-finalize` | `infrastructure/{region}` | account_id, alias, region | +| 8 | `template-vpc` | `vpc/{region}` | account_id, alias, region, vpc_cidr, subnet_cidrs | +| 9 | `template-applications-structure` | `applications/structure` | account_id, alias, aws_environment, regions, app_stacks | +| 10 | `template-apps-repo` | `init/git-setup` (new repo) | account_id, alias, stack_name | + +--- + +## 11. Phased Implementation Recommendation + +Given the complexity of the full sequence, this implementation plan stages the +work into three phases aligned with the manual blockers: + +### Phase 1 — Structural scaffolding (no executing Terraform) +Workspaces 0, 1, 2, 3 — these produce only committed files and repo configuration. +They do not require the bootstrap IAM user or any running AWS infrastructure. 
+High confidence of automation; implement first. + +### Phase 2 — Infrastructure foundation (requires bootstrap credential handoff) +Workspaces 4 and 5 — these apply Terraform against the new account. +Blocked on establishing the bootstrap credential convention (M1–M3). +Implement after the credential handoff pattern is agreed. + +### Phase 3 — IAM, VPC, app structure +Workspaces 6, 7, 8, 9, 10 — these are the most account-specific and depend on +secrets (LDAP, SAML metadata) and IPAM allocation. +The git-secret dependency (M6) is the largest blocker; see section 6 for a full +analysis. Retiring git-secret in favor of Vault KV (extending ADR-002) is a +prerequisite for full unattended automation of `common/`. With Vault KV in place, +the only remaining manual inputs for `common/` are SAML metadata and VPC CIDRs. + +--- + +## 12. Comparison to Current State + +| Aspect | Today (manual `INF.SETUP.md`) | With proposed sc-lambda-ghactions products | +|---|---|---| +| Repo creation | Manual `git init` + GitHub API | Automated — Workspace 0 PR + executor | +| Provider file generation | Hand-edit per account | Automated — Workspace 1 | +| Credentials file generation | Hand-edit per region | Automated — Workspace 2 | +| TF state bootstrap | Manual CLI commands | Automated — Workspace 4 | +| IAM roles/groups/users | Ordered manual `tf apply` per module | Automated — Workspace 6 TAG sequence | +| VPC | Manual `tf apply` | Automated — Workspace 8 (requires CIDR input) | +| Secrets management | git-secret (manual encrypt cycle) | Manual until ADR-002 | +| Time to first usable account | Days | Hours (Phase 1+2 only); minutes if Phase 3 secrets are available | +| Auditability | `git log` in account repo | PR per workspace in GitHub; CodeBuild logs; GHE commit status | +| Repeatability | Operator knowledge / `INF.SETUP.md` | SC product form fields; idempotent Proposer |