docs: update CHECKPOINT with Jira sub-tasks and add decision documents index; create ADR-003 for Vault cluster topology; create ADR-004 for account baseline IAM role; create ADR-005 for Service Catalog portfolio sharing strategy
Dave Arnold committed May 28, 2026
1 parent b14a084, commit 66838cf
Showing 5 changed files with 516 additions and 20 deletions.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| # ADR-003: Vault Cluster Topology for SC Automation | ||
|
|
||
| ## In Plain Language | ||
|
|
||
| Before we can implement ADR-002 (dynamic AWS credentials from Vault), we need to decide | ||
| *which* Vault cluster the SC automation system will talk to, how that cluster is organized, | ||
| and how CodeBuild builds will authenticate to it. | ||
|
|
||
| This document records the topology decision: existing shared cluster vs. dedicated cluster, | ||
| namespace layout, and the auth method CodeBuild will use to prove its identity to Vault. | ||
|
|
||
| **Status:** Proposed | ||
| **Date:** 2026-05-28 | ||
| **Depends on:** ADR-002 (`002-vault-aws-secrets-engine.md`) | ||
| **Jira:** [CSC-1346](https://jira.it.census.gov/browse/CSC-1346) | ||
|
|
||
| --- | ||
|
|
||
| ## Context | ||
|
|
||
| ADR-002 specifies that the CodeBuild executor will authenticate to Vault and request | ||
| short-lived AWS credentials from the Vault AWS Secrets Engine, but it deliberately | ||
| defers the question of *which* Vault cluster to use. Three candidate topologies exist: | ||
|
|
||
| ### Option A — Shared Census Vault cluster, dedicated namespace | ||
|
|
||
| Use an existing Census-managed Vault cluster (e.g. the platform Vault in csvd-prod | ||
| or a shared non-prod instance). Create a dedicated namespace (`sc-automation/`) so | ||
| that all SC automation policies, roles, and secrets engine mounts are isolated from | ||
| other tenants. | ||
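|
|
||
| A minimal sketch of what the dedicated namespace could look like in Terraform, assuming | ||
| the shared cluster runs Vault Enterprise (namespaces are not available in the community | ||
| edition) and that the platform team allows us to create a child namespace. The resource | ||
| label `sc_automation` is illustrative and not taken from this repository: | ||
|
|
||
| ```hcl | ||
| # Hypothetical: dedicated tenant namespace on the shared cluster | ||
| resource "vault_namespace" "sc_automation" { | ||
|   path = "sc-automation" | ||
| } | ||
| ``` | ||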
|
|
||
| **Pros:** | ||
| - No new cluster to operate or HA-tune | ||
| - Shared cluster is already monitored, patched, and backed up | ||
| - Cost is shared across all tenants | ||
|
|
||
| **Cons:** | ||
| - Dependency on another team's change-management cadence | ||
| - Namespace isolation separates policies, auth methods, and mounts, but tenants still | ||
| share the cluster's storage, seal, and upgrade schedule | ||
| - Shared cluster outage affects all tenants simultaneously | ||
|
|
||
| ### Option B — Dedicated Vault cluster in csvd-dev | ||
|
|
||
| Deploy a standalone Vault cluster (Integrated Storage / Raft, 3-node) in csvd-dev | ||
| `us-gov-west-1` specifically for SC automation. | ||
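|
|
||
| For a sense of scale, a rough sketch of the Vault server configuration for one node of | ||
| such a 3-node Raft cluster. All hostnames, file paths, and node IDs below are | ||
| placeholders chosen for illustration; the other two nodes would carry the same | ||
| configuration with their own `node_id` and addresses: | ||
|
|
||
| ```hcl | ||
| # Hypothetical server config for node 1 of 3 (Integrated Storage / Raft) | ||
| storage "raft" { | ||
|   path    = "/opt/vault/data" | ||
|   node_id = "vault-node-1" | ||
|   # Peers this node tries to join on startup | ||
|   retry_join { | ||
|     leader_api_addr = "https://vault-node-2.example.internal:8200" | ||
|   } | ||
|   retry_join { | ||
|     leader_api_addr = "https://vault-node-3.example.internal:8200" | ||
|   } | ||
| } | ||
| # API listener (8200) and cluster-to-cluster listener (8201) | ||
| listener "tcp" { | ||
|   address         = "0.0.0.0:8200" | ||
|   cluster_address = "0.0.0.0:8201" | ||
|   tls_cert_file   = "/opt/vault/tls/vault.crt" | ||
|   tls_key_file    = "/opt/vault/tls/vault.key" | ||
| } | ||
| # Addresses this node advertises to clients and to its Raft peers | ||
| api_addr     = "https://vault-node-1.example.internal:8200" | ||
| cluster_addr = "https://vault-node-1.example.internal:8201" | ||
| ``` | ||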
|
|
||
| **Pros:** | ||
| - Full operational control; can tune lease TTLs, auth policies, and HA config | ||
| without coordinating with other teams | ||
| - Complete isolation — a misconfiguration in SC automation cannot affect other workloads | ||
| - Can be versioned and upgraded on our own schedule | ||
|
|
||
| **Cons:** | ||
| - New operational burden: cluster patching, unseal key rotation, backup scheduling | ||
| - Requires 3 EC2 instances (or ECS tasks) and associated IAM/networking | ||
| - Higher cost for a single-tenant cluster | ||
|
|
||
| ### Option C — Vault on Kubernetes (ECS/EKS sidecar pattern) | ||
|
|
||
| Run Vault as a sidecar container alongside CodeBuild tasks (dev/agent pattern), using | ||
| `vault agent` injector to deliver credentials to the build environment. | ||
|
|
||
| **Pros:** | ||
| - No persistent cluster to manage | ||
|
|
||
| **Cons:** | ||
| - CodeBuild does not support sidecars natively, so a workaround is required | ||
| - Substantially more complex than Options A or B | ||
|
|
||
| **Not recommended.** | ||
|
|
||
| --- | ||
|
|
||
| ## Auth Method Decision | ||
|
|
||
| Regardless of cluster topology, CodeBuild will authenticate to Vault using the | ||
| **AWS IAM auth method** (`auth/aws`). The CodeBuild service role ARN | ||
| (`arn:aws-us-gov:iam::229685449397:role/sc-automation-codebuild-role`) is | ||
| bound to a Vault role. When the executor build starts, `vault login -method=aws` sends a | ||
| signed `sts:GetCallerIdentity` request built from the build's IAM credentials; Vault | ||
| forwards it to STS to verify the caller's identity, so no static tokens or secrets are | ||
| needed inside the build environment. | ||
|
|
||
| ```hcl | ||
| # Vault IAM auth role (managed in sc-lambda-ghactions deploy/) | ||
| resource "vault_aws_auth_backend_role" "codebuild_executor" { | ||
|   backend                  = "aws" | ||
|   role                     = "sc-automation-codebuild" | ||
|   auth_type                = "iam" | ||
|   bound_iam_principal_arns = ["arn:aws-us-gov:iam::229685449397:role/sc-automation-codebuild-role"] | ||
|   # 15 min; sized to the executor's build window (CodeBuild itself allows longer timeouts) | ||
|   token_ttl                = 900 | ||
|   token_policies           = ["sc-automation-executor"] | ||
| } | ||
| ``` | ||
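|
|
||
| The `sc-automation-executor` policy named in `token_policies` is not shown in this ADR. | ||
| One plausible shape, assuming the AWS Secrets Engine from ADR-002 is mounted at `aws/` | ||
| and exposes a role called `sc-automation-deploy` (both names are placeholders, not taken | ||
| from ADR-002): | ||
|
|
||
| ```hcl | ||
| # Hypothetical: least-privilege policy for the executor token | ||
| data "vault_policy_document" "executor" { | ||
|   rule { | ||
|     path         = "aws/creds/sc-automation-deploy" # assumed mount and role name | ||
|     capabilities = ["read"] | ||
|     description  = "Request short-lived AWS credentials" | ||
|   } | ||
| } | ||
| resource "vault_policy" "executor" { | ||
|   name   = "sc-automation-executor" | ||
|   policy = data.vault_policy_document.executor.hcl | ||
| } | ||
| ``` | ||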
|
|
||
| --- | ||
|
|
||
| ## Decision | ||
|
|
||
| > **TO BE DECIDED.** This ADR is in Proposed state pending discussion with the | ||
| > platform / Vault operations team. | ||
|
|
||
| Questions to answer before closing this ADR: | ||
|
|
||
| 1. Is there an existing Census Vault cluster available for non-prod workloads that | ||
| the SC automation team can use? What is its SLA? | ||
| 2. Does the Census Vault team support dedicated namespaces for product teams? | ||
| 3. What is the approval process for cluster-level changes on a shared cluster, and | ||
| what is the blast radius when such a change affects us? | ||
| 4. Are there cost / account placement constraints that favour one topology? | ||
|
|
||
| **Recommended default (pending discussion): Option A**, a shared Census cluster with | ||
| a dedicated `sc-automation/` namespace. This avoids new operational burden while | ||
| still providing tenant isolation. Revisit if changes on the shared cluster prove too | ||
| slow to obtain or if an outage directly impacts the SC automation SLA. | ||
|
|
||
| --- | ||
|
|
||
| ## Consequences | ||
|
|
||
| ### If Option A (shared cluster, dedicated namespace) | ||
|
|
||
| - Platform team must grant namespace admin rights to the SC automation team | ||
| - SC automation `deploy/` Terraform must include Vault provider config pointing at | ||
| the shared cluster | ||
| - Vault cluster URL and namespace become required Terraform variables | ||
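|
|
||
| A minimal sketch of how the provider config and the two required variables might look | ||
| in `deploy/`. The variable names `vault_address` and `vault_namespace` are assumptions, | ||
| and authentication is left out on purpose (the provider can read a token from the | ||
| `VAULT_TOKEN` environment variable): | ||
|
|
||
| ```hcl | ||
| # Hypothetical variable names; neither has a default, so both must be supplied | ||
| variable "vault_address" { | ||
|   description = "URL of the shared Vault cluster" | ||
|   type        = string | ||
| } | ||
| variable "vault_namespace" { | ||
|   description = "Tenant namespace on the shared cluster, e.g. sc-automation" | ||
|   type        = string | ||
| } | ||
| # All Vault resources in this configuration are created inside the namespace | ||
| provider "vault" { | ||
|   address   = var.vault_address | ||
|   namespace = var.vault_namespace | ||
| } | ||
| ``` | ||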
|
|
||
| ### If Option B (dedicated cluster in csvd-dev) | ||
|
|
||
| - New Terraform module required to stand up 3-node Raft cluster in csvd-dev | ||
| - Unseal key escrow procedure must be documented and tested | ||
| - Adds ~$X/month to csvd-dev bill (to be estimated) | ||
|
|
||
| --- | ||
|
|
||
| ## Related | ||
|
|
||
| - [ADR-002: Vault AWS Secrets Engine](./002-vault-aws-secrets-engine.md) — upstream decision | ||
| - [CSC-1345](https://jira.it.census.gov/browse/CSC-1345) — ADR-002 implementation ticket | ||
| - [CSC-1346](https://jira.it.census.gov/browse/CSC-1346) — this topology decision ticket |