Skip to content

Commit

Permalink
docs: reject ADR-002 (Vault), withdraw ADR-003, unblock ADR-004
Browse files Browse the repository at this point in the history
ADR-002 (HashiCorp Vault AWS Secrets Engine) rejected after review with
Matt Morgan. Key reasons:
- CodeBuild already has an IAM role; direct sts:AssumeRole into a
  StackSet-provisioned target-account role is the correct pattern
- StackSets auto-propagate trust to new accounts at vending time and
  remove it at decommission — no extra per-account onboarding step
- Role assumption (no credential issuance) is strictly better security
- Vault adds cluster infrastructure cost with no proportionate benefit
- Note: OpenBao preferred over HashiCorp Vault if Vault is ever needed

ADR-003 (vault cluster topology) withdrawn — depends on ADR-002.
ADR-004 (sc-automation-codebuild-role via StackSet) confirmed as the
final design; Vault dependency caveat removed.

Jira: CSC-1345 → Done, CSC-1346 → Done, CSC-1344 → In Progress (unblocked)
  • Loading branch information
Dave Arnold committed Jun 3, 2026
1 parent 4b32072 commit 9f88515
Show file tree
Hide file tree
Showing 3 changed files with 81 additions and 21 deletions.
60 changes: 58 additions & 2 deletions docs/decisions/002-vault-aws-secrets-engine.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,64 @@ workspace, the exact IAM permissions granted to any automation run are visible
as a reviewable diff in the same PR that makes the infrastructure change. Review
the code, review the access policy — one approval covers both.

**Status:** Proposed
**Date:** 2026-05-19
**Status:** Rejected
**Date:** 2026-05-19
**Rejected:** 2026-06-03

---

## Rejection Decision (2026-06-03)

After review with Matt Morgan this ADR was **rejected**. The direct IAM role assumption
approach described in ADR-004 is the chosen mechanism. The rationale:

### Why Vault was rejected

1. **CodeBuild already has an IAM role.** The CodeBuild service role is already an
AWS IAM principal. The correct AWS-native pattern is `sts:AssumeRole` directly from
that role into a pre-provisioned role in the target account — no credential issuance
step at all. Vault adds a retrieve-credentials hop that the platform does not need.

2. **The StackSet propagation argument is decisive.** A `SERVICE_MANAGED` StackSet
targeting the org OU already propagates IAM roles to every account at vending time
and removes them at decommission — this is already how Census manages `r-inf-terraform`
and other cross-account baseline roles. Vault requires an *additional* per-account
onboarding step (granting the Vault IAM principal `sts:AssumeRole` rights), which
the StackSet approach does not.

3. **Not handling credentials at all is better security than leasing them.** Role
assumption leaves no credential artifact to exfiltrate. Vault-issued STS keys
(even short-lived) are actual key/secret pairs that travel through the build
environment.

4. **Extra infrastructure, extra cost, no proportionate gain.** Vault requires a
cluster (HA, patching, unseal key management, backup) in a GovCloud environment.
For cross-account access it provides no capability that direct `sts:AssumeRole`
does not already provide.

5. **"I've already done it this way" is not a sufficient reason.** The Vault proposal
was partly motivated by existing tooling. That tooling cost should not drive a
cross-cutting architecture decision.

### Note on OpenBao

The suggestion to use **OpenBao** (the open-source Vault fork) over HashiCorp Vault
is acknowledged, but is moot given this rejection. If Vault/OpenBao is adopted for
another Census purpose (e.g., provider secrets, PKI), OpenBao is the preferred
variant — it is fully open-source and does not carry the BSL licensing constraint
introduced by HashiCorp in 2023.

### What happens to CSC-1345 and CSC-1346

Both tickets are cancelled. ADR-003 (vault cluster topology, CSC-1346) is also
withdrawn. CSC-1344 (provision baseline IAM role via StackSet) is **unblocked**.

---

## Original Proposal (archived for reference)

> The sections below are the original "Proposed" content and are kept for historical
> context. They no longer represent the intended design.
---

Expand Down
19 changes: 17 additions & 2 deletions docs/decisions/003-vault-cluster-topology.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,28 @@ and how CodeBuild builds will authenticate to it.
This document records the topology decision: existing shared cluster vs. dedicated cluster,
namespace layout, and the auth method CodeBuild will use to prove its identity to Vault.

**Status:** Proposed
**Status:** Withdrawn
**Date:** 2026-05-28
**Depends on:** ADR-002 (`002-vault-aws-secrets-engine.md`)
**Withdrawn:** 2026-06-03
**Depends on:** ADR-002 (`002-vault-aws-secrets-engine.md`) — **which was rejected**
**Jira:** [CSC-1346](https://jira.it.census.gov/browse/CSC-1346)

---

## Withdrawal Decision (2026-06-03)

ADR-002 (Vault AWS Secrets Engine) was **rejected** on 2026-06-03 in favour of
direct IAM role assumption via CloudFormation StackSet (ADR-004). Because this ADR
has no purpose without ADR-002, it is **withdrawn**.

CSC-1346 is cancelled. No Vault cluster decision is needed.

---

## Original Proposal (archived for reference)

---

## Context

ADR-002 specifies that the CodeBuild executor will authenticate to Vault and request
Expand Down
23 changes: 6 additions & 17 deletions docs/decisions/004-account-baseline-iam-role.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,7 @@ across the org, and the lifecycle rules around updates and removal.

**Status:** Accepted
**Date:** 2026-05-28
**Jira:** [CSC-1344](https://jira.it.census.gov/browse/CSC-1344)
**Note:** If ADR-002/ADR-003 are fully implemented (Vault AWS Secrets Engine), the
`sts:AssumeRole` trust in this role is eventually replaced by a Vault-issued
credential. This role definition remains the correct minimum-privilege baseline
regardless of which credential mechanism is used.
**Jira:** [CSC-1344](https://jira.it.census.gov/browse/CSC-1344)

---

Expand All @@ -26,7 +22,7 @@ The executor CodeBuild build runs in csvd-dev. To apply Terraform changes in a
target account (e.g. `123456789012-some-team-workload-dev-gov`), it must assume
a role in that account.

### Current mechanism (static AssumeRole)
### Credential mechanism

```
csvd-dev CodeBuild role (229685449397)
Expand All @@ -35,15 +31,9 @@ csvd-dev CodeBuild role (229685449397)
└─ trusts 229685449397 CodeBuild role
```

### Future mechanism (Vault dynamic credentials — ADR-002)

```
csvd-dev CodeBuild → vault login (IAM auth) → Vault AWS Secrets Engine
└─ Vault generates short-lived creds for sc-automation-codebuild-role
```

In both cases the **target-account role definition is the same** — only the
mechanism for obtaining credentials to it changes.
Direct `sts:AssumeRole` is the **final credential model**. ADR-002 (Vault AWS Secrets
Engine) was proposed as an alternative but was rejected on 2026-06-03 in favour of
this pattern. See [ADR-002](./002-vault-aws-secrets-engine.md) for the full rationale.

---

Expand Down Expand Up @@ -158,7 +148,6 @@ mechanism if a Terraform account-vending pipeline is in place.

## Related

- [ADR-002: Vault AWS Secrets Engine](./002-vault-aws-secrets-engine.md)
- [ADR-003: Vault Cluster Topology](./003-vault-cluster-topology.md)
- [ADR-002: Vault AWS Secrets Engine (Rejected)](./002-vault-aws-secrets-engine.md)
- [CSC-1344](https://jira.it.census.gov/browse/CSC-1344) — provisioning ticket
- [CSC-1348](https://jira.it.census.gov/browse/CSC-1348) — OU sharing / StackSet ticket

0 comments on commit 9f88515

Please sign in to comment.