From 9f885150aa22a159877c079244fd8eb25b1eae6e Mon Sep 17 00:00:00 2001 From: Dave Arnold Date: Wed, 3 Jun 2026 13:57:25 -0400 Subject: [PATCH 1/4] docs: reject ADR-002 (Vault), withdraw ADR-003, unblock ADR-004 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ADR-002 (HashiCorp Vault AWS Secrets Engine) rejected after review with Matt Morgan. Key reasons: - CodeBuild already has an IAM role; direct sts:AssumeRole into a StackSet-provisioned target-account role is the correct pattern - StackSets auto-propagate trust to new accounts at vending time and remove it at decommission — no extra per-account onboarding step - Role assumption (no credential issuance) is strictly better security - Vault adds cluster infrastructure cost with no proportionate benefit - Note: OpenBao preferred over HashiCorp Vault if Vault is ever needed ADR-003 (vault cluster topology) withdrawn — depends on ADR-002. ADR-004 (sc-automation-codebuild-role via StackSet) confirmed as the final design; Vault dependency caveat removed. Jira: CSC-1345 → Done, CSC-1346 → Done, CSC-1344 → In Progress (unblocked) --- .../decisions/002-vault-aws-secrets-engine.md | 60 ++++++++++++++++++- docs/decisions/003-vault-cluster-topology.md | 19 +++++- .../004-account-baseline-iam-role.md | 23 ++----- 3 files changed, 81 insertions(+), 21 deletions(-) diff --git a/docs/decisions/002-vault-aws-secrets-engine.md b/docs/decisions/002-vault-aws-secrets-engine.md index 1d1bd4c..5546085 100644 --- a/docs/decisions/002-vault-aws-secrets-engine.md +++ b/docs/decisions/002-vault-aws-secrets-engine.md @@ -21,8 +21,64 @@ workspace, the exact IAM permissions granted to any automation run are visible as a reviewable diff in the same PR that makes the infrastructure change. Review the code, review the access policy — one approval covers both. 
-**Status:** Proposed -**Date:** 2026-05-19 +**Status:** Rejected +**Date:** 2026-05-19 +**Rejected:** 2026-06-03 + +--- + +## Rejection Decision (2026-06-03) + +After review with Matt Morgan this ADR was **rejected**. The direct IAM role assumption +approach described in ADR-004 is the chosen mechanism. The rationale: + +### Why Vault was rejected + +1. **CodeBuild already has an IAM role.** The CodeBuild service role is already an + AWS IAM principal. The correct AWS-native pattern is `sts:AssumeRole` directly from + that role into a pre-provisioned role in the target account — no credential issuance + step at all. Vault adds a retrieve-credentials hop that the platform does not need. + +2. **The StackSet propagation argument is decisive.** A `SERVICE_MANAGED` StackSet + targeting the org OU already propagates IAM roles to every account at vending time + and removes them at decommission — this is already how Census manages `r-inf-terraform` + and other cross-account baseline roles. Vault requires an *additional* per-account + onboarding step (granting the Vault IAM principal `sts:AssumeRole` rights), which + the StackSet approach does not. + +3. **Not handling credentials at all is better security than leasing them.** Role + assumption leaves no credential artifact to exfiltrate. Vault-issued STS keys + (even short-lived) are actual key/secret pairs that travel through the build + environment. + +4. **Extra infrastructure, extra cost, no proportionate gain.** Vault requires a + cluster (HA, patching, unseal key management, backup) in a GovCloud environment. + For cross-account access it provides no capability that direct `sts:AssumeRole` + does not already provide. + +5. **"I've already done it this way" is not a sufficient reason.** The Vault proposal + was partly motivated by existing tooling. That tooling cost should not drive a + cross-cutting architecture decision. 
+ +### Note on OpenBao + +The suggestion to use **OpenBao** (the open-source Vault fork) over HashiCorp Vault +is acknowledged, but is moot given this rejection. If Vault/OpenBao is adopted for +another Census purpose (e.g., provider secrets, PKI), OpenBao is the preferred +variant — it is fully open-source and does not carry the BSL licensing constraint +introduced by HashiCorp in 2023. + +### What happens to CSC-1345 and CSC-1346 + +Both tickets are cancelled. ADR-003 (vault cluster topology, CSC-1346) is also +withdrawn. CSC-1344 (provision baseline IAM role via StackSet) is **unblocked**. + +--- + +## Original Proposal (archived for reference) + +> The sections below are the original "Proposed" content and are kept for historical +> context. They no longer represent the intended design. --- diff --git a/docs/decisions/003-vault-cluster-topology.md b/docs/decisions/003-vault-cluster-topology.md index 78fbd0c..5861cb0 100644 --- a/docs/decisions/003-vault-cluster-topology.md +++ b/docs/decisions/003-vault-cluster-topology.md @@ -9,13 +9,28 @@ and how CodeBuild builds will authenticate to it. This document records the topology decision: existing shared cluster vs. dedicated cluster, namespace layout, and the auth method CodeBuild will use to prove its identity to Vault. -**Status:** Proposed +**Status:** Withdrawn **Date:** 2026-05-28 -**Depends on:** ADR-002 (`002-vault-aws-secrets-engine.md`) +**Withdrawn:** 2026-06-03 +**Depends on:** ADR-002 (`002-vault-aws-secrets-engine.md`) — **which was rejected** **Jira:** [CSC-1346](https://jira.it.census.gov/browse/CSC-1346) --- +## Withdrawal Decision (2026-06-03) + +ADR-002 (Vault AWS Secrets Engine) was **rejected** on 2026-06-03 in favour of +direct IAM role assumption via CloudFormation StackSet (ADR-004). Because this ADR +has no purpose without ADR-002, it is **withdrawn**. + +CSC-1346 is cancelled. No Vault cluster decision is needed. 
+ +--- + +## Original Proposal (archived for reference) + +--- + ## Context ADR-002 specifies that the CodeBuild executor will authenticate to Vault and request diff --git a/docs/decisions/004-account-baseline-iam-role.md b/docs/decisions/004-account-baseline-iam-role.md index bdb1fed..38758f7 100644 --- a/docs/decisions/004-account-baseline-iam-role.md +++ b/docs/decisions/004-account-baseline-iam-role.md @@ -12,11 +12,7 @@ across the org, and the lifecycle rules around updates and removal. **Status:** Accepted **Date:** 2026-05-28 -**Jira:** [CSC-1344](https://jira.it.census.gov/browse/CSC-1344) -**Note:** If ADR-002/ADR-003 are fully implemented (Vault AWS Secrets Engine), the -`sts:AssumeRole` trust in this role is eventually replaced by a Vault-issued -credential. This role definition remains the correct minimum-privilege baseline -regardless of which credential mechanism is used. +**Jira:** [CSC-1344](https://jira.it.census.gov/browse/CSC-1344) --- @@ -26,7 +22,7 @@ The executor CodeBuild build runs in csvd-dev. To apply Terraform changes in a target account (e.g. `123456789012-some-team-workload-dev-gov`), it must assume a role in that account. -### Current mechanism (static AssumeRole) +### Credential mechanism ``` csvd-dev CodeBuild role (229685449397) @@ -35,15 +31,9 @@ csvd-dev CodeBuild role (229685449397) └─ trusts 229685449397 CodeBuild role ``` -### Future mechanism (Vault dynamic credentials — ADR-002) - -``` -csvd-dev CodeBuild → vault login (IAM auth) → Vault AWS Secrets Engine - └─ Vault generates short-lived creds for sc-automation-codebuild-role -``` - -In both cases the **target-account role definition is the same** — only the -mechanism for obtaining credentials to it changes. +Direct `sts:AssumeRole` is the **final credential model**. ADR-002 (Vault AWS Secrets +Engine) was proposed as an alternative but was rejected on 2026-06-03 in favour of +this pattern. See [ADR-002](./002-vault-aws-secrets-engine.md) for the full rationale. 
--- @@ -158,7 +148,6 @@ mechanism if a Terraform account-vending pipeline is in place. ## Related -- [ADR-002: Vault AWS Secrets Engine](./002-vault-aws-secrets-engine.md) -- [ADR-003: Vault Cluster Topology](./003-vault-cluster-topology.md) +- [ADR-002: Vault AWS Secrets Engine (Rejected)](./002-vault-aws-secrets-engine.md) - [CSC-1344](https://jira.it.census.gov/browse/CSC-1344) — provisioning ticket - [CSC-1348](https://jira.it.census.gov/browse/CSC-1348) — OU sharing / StackSet ticket From 4e84069aa6aefedebb3ee7de6a64dfe9c5d8fef8 Mon Sep 17 00:00:00 2001 From: Dave Arnold Date: Thu, 4 Jun 2026 13:17:49 -0400 Subject: [PATCH 2/4] chore: end-of-session commit (3 files changed) --- deploy/stacksets.tf | 61 +++++++++++++++++++ deploy/variables.tf | 12 ++++ ...stackset-sc-automation-codebuild-role.yaml | 61 +++++++++++++++++++ 3 files changed, 134 insertions(+) create mode 100644 deploy/stacksets.tf create mode 100644 service-catalog/stackset-sc-automation-codebuild-role.yaml diff --git a/deploy/stacksets.tf b/deploy/stacksets.tf new file mode 100644 index 0000000..6ecd35a --- /dev/null +++ b/deploy/stacksets.tf @@ -0,0 +1,61 @@ +# --------------------------------------------------------------------------- +# CloudFormation StackSet: sc-automation-codebuild-role +# +# Deploys an IAM cross-account role to every account in the target OU so the +# tf-run-executor CodeBuild project in csvd-dev can assume it to run Terraform +# in each target account. +# +# PREREQUISITE: SERVICE_MANAGED StackSets can only be managed from the AWS +# Organizations management account OR a delegated CloudFormation StackSets +# administrator account. If csvd-dev (229685449397) is not one of those, apply +# this module with credentials for the management/delegated-admin account by +# either: +# - setting AWS_PROFILE to a management-account profile, or +# - adding `assume_role { role_arn = "..." }` to the provider in provider.tf. 
+# +# Auto-deployment is enabled: accounts joining the OU automatically receive the +# role. Accounts removed from the OU lose the role (retain = false). +# --------------------------------------------------------------------------- + +resource "aws_cloudformation_stack_set" "sc_automation_role" { + name = "sc-automation-codebuild-role" + description = "Deploys sc-automation-codebuild-role IAM role to all accounts in the target OU" + + permission_model = "SERVICE_MANAGED" + + auto_deployment { + enabled = true + retain_stacks_on_account_removal = false + } + + capabilities = ["CAPABILITY_NAMED_IAM"] + + # Template is rendered inline from the file so no S3 dependency is required. + template_body = file("${path.module}/../service-catalog/stackset-sc-automation-codebuild-role.yaml") + + parameters = { + CodeBuildAccountId = var.codebuild_account_id + } + + tags = { + Project = "sc-automation" + ManagedBy = "terraform" + } +} + +resource "aws_cloudformation_stack_set_instance" "sc_automation_role" { + stack_set_name = aws_cloudformation_stack_set.sc_automation_role.name + region = data.aws_region.current.name + + # Target the entire OU; CloudFormation resolves account membership dynamically. + deployment_targets { + organizational_unit_ids = [var.stackset_target_ou_id] + } + + # Allow up to 5 concurrent stack instance deployments; tolerate no failures + # so a bad account never silently goes un-deployed. + operation_preferences { + failure_tolerance_count = 0 + max_concurrent_count = 5 + } +} diff --git a/deploy/variables.tf b/deploy/variables.tf index 4558239..e93c9ce 100644 --- a/deploy/variables.tf +++ b/deploy/variables.tf @@ -80,3 +80,15 @@ variable "principal_arns" { type = list(string) default = [] } + +variable "stackset_target_ou_id" { + description = "AWS Organizations OU ID to target with the sc-automation-codebuild-role StackSet (e.g. \"ou-xxxx-xxxxxxxx\")" + type = string + # e.g. 
"ou-xxxx-xxxxxxxx" +} + +variable "codebuild_account_id" { + description = "AWS account ID of the csvd-dev account where tf-run-executor-codebuild runs; used as the trust principal in the cross-account role" + type = string + default = "229685449397" +} diff --git a/service-catalog/stackset-sc-automation-codebuild-role.yaml b/service-catalog/stackset-sc-automation-codebuild-role.yaml new file mode 100644 index 0000000..ab7d3b7 --- /dev/null +++ b/service-catalog/stackset-sc-automation-codebuild-role.yaml @@ -0,0 +1,61 @@ +AWSTemplateFormatVersion: "2010-09-09" +Description: > + Deploys sc-automation-codebuild-role in each target account via a SERVICE_MANAGED + CloudFormation StackSet. Every account in the target OU receives the role + automatically. New accounts joining the OU get the role on vending (auto_deployment + enabled). See ADR-004 in sc-lambda-ghactions for the full design rationale. + +Parameters: + CodeBuildAccountId: + Type: String + Default: "229685449397" + Description: > + AWS account ID of the csvd-dev account where the tf-run-executor CodeBuild + project runs. The IAM role tf-run-executor-codebuild in this account is + granted sts:AssumeRole on this target-account role. + AllowedPattern: "[0-9]{12}" + ConstraintDescription: Must be a 12-digit AWS account ID. + +Resources: + SCAutomationCodeBuildRole: + Type: AWS::IAM::Role + Properties: + RoleName: sc-automation-codebuild-role + Description: > + Cross-account role assumed by the tf-run-executor CodeBuild service role in + csvd-dev to run Terraform in this account. Deployed and managed via + CloudFormation StackSet (sc-lambda-ghactions). 
+ AssumeRolePolicyDocument: + Version: "2012-10-17" + Statement: + - Sid: AllowSCAutomationCodeBuild + Effect: Allow + Principal: + AWS: !Sub "arn:${AWS::Partition}:iam::${CodeBuildAccountId}:role/tf-run-executor-codebuild" + Action: sts:AssumeRole + Condition: + StringEquals: + # ExternalId = the target account's own ID, preventing confused-deputy + # attacks across accounts in the same org. The executor build passes + # the target account ID as ExternalId at assume-role time. + sts:ExternalId: !Ref AWS::AccountId + ManagedPolicyArns: + # Initial baseline: AdministratorAccess. + # Future hardening: replace with a least-privilege customer-managed policy + # once the full set of IAM actions required by each product workspace is + # known. Track in a follow-up ADR. + - !Sub "arn:${AWS::Partition}:iam::aws:policy/AdministratorAccess" + Tags: + - Key: Project + Value: sc-automation + - Key: ManagedBy + Value: cloudformation-stackset + - Key: CreatedBy + Value: sc-lambda-ghactions + +Outputs: + RoleArn: + Description: ARN of the sc-automation-codebuild-role IAM role + Value: !GetAtt SCAutomationCodeBuildRole.Arn + Export: + Name: sc-automation-codebuild-role-arn From 040f5f9b5efe2243dac95468ca7109d8f53a8b3c Mon Sep 17 00:00:00 2001 From: Dave Arnold Date: Thu, 4 Jun 2026 14:04:59 -0400 Subject: [PATCH 3/4] chore: end-of-session commit (3 files changed) --- deploy/stacksets.tf | 61 ------------------- deploy/variables.tf | 12 ---- ...stackset-sc-automation-codebuild-role.yaml | 61 ------------------- 3 files changed, 134 deletions(-) delete mode 100644 deploy/stacksets.tf delete mode 100644 service-catalog/stackset-sc-automation-codebuild-role.yaml diff --git a/deploy/stacksets.tf b/deploy/stacksets.tf deleted file mode 100644 index 6ecd35a..0000000 --- a/deploy/stacksets.tf +++ /dev/null @@ -1,61 +0,0 @@ -# --------------------------------------------------------------------------- -# CloudFormation StackSet: sc-automation-codebuild-role -# -# Deploys an IAM 
cross-account role to every account in the target OU so the -# tf-run-executor CodeBuild project in csvd-dev can assume it to run Terraform -# in each target account. -# -# PREREQUISITE: SERVICE_MANAGED StackSets can only be managed from the AWS -# Organizations management account OR a delegated CloudFormation StackSets -# administrator account. If csvd-dev (229685449397) is not one of those, apply -# this module with credentials for the management/delegated-admin account by -# either: -# - setting AWS_PROFILE to a management-account profile, or -# - adding `assume_role { role_arn = "..." }` to the provider in provider.tf. -# -# Auto-deployment is enabled: accounts joining the OU automatically receive the -# role. Accounts removed from the OU lose the role (retain = false). -# --------------------------------------------------------------------------- - -resource "aws_cloudformation_stack_set" "sc_automation_role" { - name = "sc-automation-codebuild-role" - description = "Deploys sc-automation-codebuild-role IAM role to all accounts in the target OU" - - permission_model = "SERVICE_MANAGED" - - auto_deployment { - enabled = true - retain_stacks_on_account_removal = false - } - - capabilities = ["CAPABILITY_NAMED_IAM"] - - # Template is rendered inline from the file so no S3 dependency is required. - template_body = file("${path.module}/../service-catalog/stackset-sc-automation-codebuild-role.yaml") - - parameters = { - CodeBuildAccountId = var.codebuild_account_id - } - - tags = { - Project = "sc-automation" - ManagedBy = "terraform" - } -} - -resource "aws_cloudformation_stack_set_instance" "sc_automation_role" { - stack_set_name = aws_cloudformation_stack_set.sc_automation_role.name - region = data.aws_region.current.name - - # Target the entire OU; CloudFormation resolves account membership dynamically. 
- deployment_targets { - organizational_unit_ids = [var.stackset_target_ou_id] - } - - # Allow up to 5 concurrent stack instance deployments; tolerate no failures - # so a bad account never silently goes un-deployed. - operation_preferences { - failure_tolerance_count = 0 - max_concurrent_count = 5 - } -} diff --git a/deploy/variables.tf b/deploy/variables.tf index e93c9ce..4558239 100644 --- a/deploy/variables.tf +++ b/deploy/variables.tf @@ -80,15 +80,3 @@ variable "principal_arns" { type = list(string) default = [] } - -variable "stackset_target_ou_id" { - description = "AWS Organizations OU ID to target with the sc-automation-codebuild-role StackSet (e.g. \"ou-xxxx-xxxxxxxx\")" - type = string - # e.g. "ou-xxxx-xxxxxxxx" -} - -variable "codebuild_account_id" { - description = "AWS account ID of the csvd-dev account where tf-run-executor-codebuild runs; used as the trust principal in the cross-account role" - type = string - default = "229685449397" -} diff --git a/service-catalog/stackset-sc-automation-codebuild-role.yaml b/service-catalog/stackset-sc-automation-codebuild-role.yaml deleted file mode 100644 index ab7d3b7..0000000 --- a/service-catalog/stackset-sc-automation-codebuild-role.yaml +++ /dev/null @@ -1,61 +0,0 @@ -AWSTemplateFormatVersion: "2010-09-09" -Description: > - Deploys sc-automation-codebuild-role in each target account via a SERVICE_MANAGED - CloudFormation StackSet. Every account in the target OU receives the role - automatically. New accounts joining the OU get the role on vending (auto_deployment - enabled). See ADR-004 in sc-lambda-ghactions for the full design rationale. - -Parameters: - CodeBuildAccountId: - Type: String - Default: "229685449397" - Description: > - AWS account ID of the csvd-dev account where the tf-run-executor CodeBuild - project runs. The IAM role tf-run-executor-codebuild in this account is - granted sts:AssumeRole on this target-account role. 
- AllowedPattern: "[0-9]{12}" - ConstraintDescription: Must be a 12-digit AWS account ID. - -Resources: - SCAutomationCodeBuildRole: - Type: AWS::IAM::Role - Properties: - RoleName: sc-automation-codebuild-role - Description: > - Cross-account role assumed by the tf-run-executor CodeBuild service role in - csvd-dev to run Terraform in this account. Deployed and managed via - CloudFormation StackSet (sc-lambda-ghactions). - AssumeRolePolicyDocument: - Version: "2012-10-17" - Statement: - - Sid: AllowSCAutomationCodeBuild - Effect: Allow - Principal: - AWS: !Sub "arn:${AWS::Partition}:iam::${CodeBuildAccountId}:role/tf-run-executor-codebuild" - Action: sts:AssumeRole - Condition: - StringEquals: - # ExternalId = the target account's own ID, preventing confused-deputy - # attacks across accounts in the same org. The executor build passes - # the target account ID as ExternalId at assume-role time. - sts:ExternalId: !Ref AWS::AccountId - ManagedPolicyArns: - # Initial baseline: AdministratorAccess. - # Future hardening: replace with a least-privilege customer-managed policy - # once the full set of IAM actions required by each product workspace is - # known. Track in a follow-up ADR. 
- - !Sub "arn:${AWS::Partition}:iam::aws:policy/AdministratorAccess" - Tags: - - Key: Project - Value: sc-automation - - Key: ManagedBy - Value: cloudformation-stackset - - Key: CreatedBy - Value: sc-lambda-ghactions - -Outputs: - RoleArn: - Description: ARN of the sc-automation-codebuild-role IAM role - Value: !GetAtt SCAutomationCodeBuildRole.Arn - Export: - Name: sc-automation-codebuild-role-arn From 5e14547e55f243d89cfd3ee1d034cf8fb2a2c4e8 Mon Sep 17 00:00:00 2001 From: Dave Arnold Date: Mon, 8 Jun 2026 15:41:04 -0400 Subject: [PATCH 4/4] feat(CSVDIES-9980): pass ExternalId at assume-role; default to sc-automation-codebuild-role MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two related changes to wire the executor to the new cross-account role deployed by the terraform-service-catalog-census StackSet (PR #13): 1. CROSS_ACCOUNT_ROLE default changed from r-inf-terraform to sc-automation-codebuild-role — the new purpose-built role for this automation system, deployed org-wide via CFN StackSet. 2. --external-id "${TARGET_ACCOUNT_ID}" added to the aws sts assume-role call — required by the ExternalId condition on sc-automation-codebuild-role (sts:ExternalId = AWS::AccountId) per ADR-004 confused-deputy protection. The r-inf-terraform role can still be used by passing CROSS_ACCOUNT_ROLE=r-inf-terraform as an env var override; it is not removed from the CodeBuild IAM policy. 
See ADR-004: docs/decisions/004-account-baseline-iam-role.md Jira: CSVDIES-9980 --- buildspec-executor.yml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/buildspec-executor.yml b/buildspec-executor.yml index 8973fa3..a6fb2e5 100644 --- a/buildspec-executor.yml +++ b/buildspec-executor.yml @@ -34,7 +34,7 @@ env: NO_PROXY: "github.e.it.census.gov,169.254.169.254,169.254.170.2" # Per-build defaults (overridden via environmentVariablesOverride in Lambda) TARGET_ACCOUNT_ID: "" - CROSS_ACCOUNT_ROLE: "r-inf-terraform" + CROSS_ACCOUNT_ROLE: "sc-automation-codebuild-role" TF_RUN_START_TAG: "" DRY_RUN: "false" @@ -113,6 +113,7 @@ phases: CREDS=$(aws sts assume-role \ --role-arn "${ROLE_ARN}" \ --role-session-name "sc-automation-${ACCOUNT_REPO}" \ + --external-id "${TARGET_ACCOUNT_ID}" \ --query Credentials \ --output json) export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | python3 -c "import json,sys; print(json.load(sys.stdin)['AccessKeyId'])")
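
Reviewer note: the hunk above pairs the new `--external-id` argument with the `sts:ExternalId = AWS::AccountId` condition on `sc-automation-codebuild-role`. Below is a minimal Python sketch of that pairing, not the buildspec's actual code. The helper name `build_assume_role_argv`, the `aws-us-gov` partition default, and the way the role ARN is composed are assumptions for illustration; the patch does not show how `ROLE_ARN` is built. The sketch also rejects an empty `TARGET_ACCOUNT_ID` (the buildspec default is `""`) before any call is made.

```python
import re

# Same 12-digit pattern the StackSet template uses for CodeBuildAccountId.
ACCOUNT_ID_RE = re.compile(r"[0-9]{12}")


def build_assume_role_argv(target_account_id, account_repo,
                           role_name="sc-automation-codebuild-role",
                           partition="aws-us-gov"):
    """Build the argv for `aws sts assume-role` (hypothetical helper).

    The partition default and the ARN layout are assumptions; the flags
    mirror the ones visible in the buildspec hunk.
    """
    if not ACCOUNT_ID_RE.fullmatch(target_account_id):
        raise ValueError("TARGET_ACCOUNT_ID must be a 12-digit AWS account ID")
    role_arn = f"arn:{partition}:iam::{target_account_id}:role/{role_name}"
    return [
        "aws", "sts", "assume-role",
        "--role-arn", role_arn,
        "--role-session-name", f"sc-automation-{account_repo}",
        # Must equal the target account's own ID to satisfy the trust policy.
        "--external-id", target_account_id,
        "--query", "Credentials",
        "--output", "json",
    ]
```

The returned list could be handed to `subprocess.run(argv, check=True, capture_output=True)`. Overriding `role_name` with `r-inf-terraform` mirrors the `CROSS_ACCOUNT_ROLE` override described in the commit message, although the patch does not say whether that role's trust policy accepts an ExternalId.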