From 4b3207274b976ca0ae863c9ceaf1e33c3cf60462 Mon Sep 17 00:00:00 2001
From: Dave Arnold
Date: Tue, 2 Jun 2026 16:21:57 -0400
Subject: [PATCH] docs: add Vault AWS Secrets Engine sales presentation

Internal deck covering problem statement, architecture, security
benefits (NIST 800-53), government/compliance considerations (BSL 1.1,
OpenBao, FIPS 140-2), phased roadmap, and call to action.

Jira: CSC-1345 CSC-1346
---
 docs/vault-aws-secrets-engine.md | 464 +++++++++++++++++++++++++++++++
 1 file changed, 464 insertions(+)
 create mode 100644 docs/vault-aws-secrets-engine.md

diff --git a/docs/vault-aws-secrets-engine.md b/docs/vault-aws-secrets-engine.md
new file mode 100644
index 0000000..af67caf
--- /dev/null
+++ b/docs/vault-aws-secrets-engine.md
@@ -0,0 +1,464 @@
# HashiCorp Vault for Cross-Account Automation at Census
**Audience:** CSVD Engineering / sc-lambda-ghactions Stakeholders
**Date:** June 2026
**Author:** David Arnold (`arnol377`)
**Related Jira:** [CSC-1345](https://jira.it.census.gov/browse/CSC-1345) · [CSC-1346](https://jira.it.census.gov/browse/CSC-1346)

---

## 1. The Problem

The `sc-lambda-ghactions` system automates AWS Service Catalog provisioning by running
`tf-run apply` via CodeBuild. To do that across multiple AWS accounts, CodeBuild needs
temporary credentials for each target account.

**Current state:** no cross-account credential mechanism exists.

**The naive fix (and why we rejected it):**
Add a trust policy to `r-inf-terraform` in every account that allows the CodeBuild IAM
role from `csvd-dev` to assume it. This requires:

- A change to the management account StackSet `allow_assume_role_tf` parameter
- Trust policy propagation to **every org account** (~450+ and growing)
- The trust must already be in place in every newly onboarded account
- STS sessions of up to 1 hour, with the audit trail scattered across each target account's CloudTrail

**The right fix:** Vault AWS Secrets Engine.

---

## 2. What Is HashiCorp Vault?

Vault is a secrets management platform that controls access to tokens, passwords,
certificates, and cloud credentials. Its core value propositions are:

| Capability | What It Means |
|---|---|
| **Dynamic Secrets** | Credentials generated on demand that expire automatically |
| **Centralized Policy** | One policy engine controls access across all secret types |
| **Audit Log** | Every read, write, and auth event logged with identity + metadata |
| **Identity-Based Access** | "Who are you?" not "What password do you know?" |
| **Encryption as a Service** | Encrypt/decrypt data without exposing keys |

For our immediate use case, we care about two specific features:
- **AWS Secrets Engine** — generates dynamic IAM credentials per target account
- **IAM Auth Method** — lets CodeBuild authenticate using its own AWS IAM identity (no static creds)

---

## 3. How It Solves the Cross-Account Problem

```
CodeBuild (csvd-dev, tf-run-executor-codebuild role)
    │
    │ 1. "I am tf-run-executor-codebuild in 229685449397"
    │    (SigV4-signed sts:GetCallerIdentity request; no password, no static token)
    ▼
Vault Server (IAM Auth Method)
    │
    │ 2. Validates identity by forwarding the signed request to AWS STS
    │ 3. Checks policy: "executor role may request adsd-dev creds"
    │ 4. Calls sts:AssumeRole on the target account's role, returns short-lived creds
    ▼
CodeBuild receives:
    AWS_ACCESS_KEY_ID
    AWS_SECRET_ACCESS_KEY
    AWS_SESSION_TOKEN      (TTL: 15 min)
    │
    ▼
tf-run apply runs in target account
Credentials expire automatically: nothing long-lived to rotate or leak
```

### What changes vs. the StackSet approach

| Concern | StackSet Trust | Vault |
|---|---|---|
| Per-account setup | Trust policy in every account | Vault AWS backend role per account |
| New account onboarding | StackSet propagation (slow, blast radius) | Add one Vault role (seconds) |
| Credential lifetime | STS session: up to 1 hour | Configurable: 15 min recommended |
| Audit trail | CloudTrail (account-level) | Vault audit log (every access, centralized) |
| Revocation | Active sessions run to expiry unless a per-role deny policy is added | `iam_user` leases revoked instantly; STS-based leases expire at their (short) TTL |
| Policy changes | StackSet → CloudFormation → IAM (slow) | `vault policy write` (instant) |

---

## 4. Security Benefits

### 4.1 Dynamic Credentials — Nothing to Rotate

Static IAM access keys are a top attack vector (OWASP Top 10 A07:2021, Identification
and Authentication Failures). With Vault:

- No long-lived keys stored anywhere — not in Parameter Store, not in environment variables
- Every invocation gets a **unique, time-limited key pair**
- Expiry is enforced by Vault and AWS, not by developer discipline
- Compromise of one set of credentials is contained to a 15-minute window and one build job

### 4.2 Every Access Is Audited

Vault's audit devices log every request and response, including logins and secret
reads, together with the caller's identity and metadata. An abridged example:

```json
{
  "time": "2026-06-02T14:32:01Z",
  "type": "response",
  "auth": {
    "client_token": "...",
    "accessor": "...",
    "display_name": "aws-tf-run-executor-codebuild",
    "policies": ["default", "sc-automation-executor"],
    "metadata": {
      "account_id": "229685449397",
      "iam_principal_arn": "arn:aws-us-gov:iam::229685449397:role/tf-run-executor-codebuild"
    }
  },
  "request": {
    "path": "aws/creds/adsd-dev",
    "operation": "read"
  }
}
```

This gives a **complete record** of which automation job requested credentials for
which account, and when. Shipped to write-protected storage, it supports the
NIST 800-53 AU-2, AU-3, and AU-9 audit requirements.
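The file audit device writes one JSON object per line, so a short script can report which principal requested credentials for which account. The following is a minimal sketch that assumes entries shaped like the example entry above. It reads the requester from the `iam_principal_arn` metadata key shown there, and a real deployment may use a different key name, so the key is a parameter. The function names are hypothetical. Vault logs both a `request` and a `response` entry for each call, so the sketch keeps only responses to count each credential issuance once.

```python
import json
from collections import Counter


def credential_requests(lines, principal_key="iam_principal_arn"):
    """Yield (time, principal ARN, path) for each read under aws/creds/.

    Assumes one JSON audit entry per line. Only "response" entries are kept,
    because Vault writes a request entry and a response entry for every call.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        request = entry.get("request") or {}
        path = request.get("path", "")
        if entry.get("type") != "response" or request.get("operation") != "read":
            continue
        if not path.startswith("aws/creds/"):
            continue  # not a dynamic AWS credential request
        metadata = (entry.get("auth") or {}).get("metadata") or {}
        yield entry.get("time"), metadata.get(principal_key), path


def requests_per_target(lines):
    """Count issuances per Vault AWS role, e.g. {"adsd-dev": 2}."""
    return Counter(path.split("/", 2)[2] for _, _, path in credential_requests(lines))


# Usage (log path is hypothetical):
# with open("/var/log/vault/audit.log") as f:
#     print(requests_per_target(f))
```

Filtering on `type` matters because counting both entries would double every total.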

### 4.3 Principle of Least Privilege — Enforced Centrally

Vault policies are written in HCL and can be kept in version control. Access to any
given account's credentials requires an explicit policy grant:

```hcl
# Only allow executor to request creds for accounts it's authorized for
path "aws/creds/adsd-*" {
  capabilities = ["read"]
}

path "aws/creds/csvd-*" {
  capabilities = ["read"]
}

# Deny everything else under aws/ explicitly. Vault applies the most specific
# matching path, so the grants above still win for adsd-* and csvd-* paths.
path "aws/*" {
  capabilities = ["deny"]
}
```

No IAM policy sprawl, no StackSet blast radius. One file, version-controlled,
reviewed like any other code change.

### 4.4 Break-Glass Revocation

If a CodeBuild build is compromised mid-run, a Vault admin can:

```bash
vault lease revoke -prefix aws/creds/adsd-dev
```

This revokes every outstanding lease issued under that role. For `iam_user`
credentials, Vault deletes the generated access keys immediately, which is faster
than rotating an IAM key pair by hand. For STS-based credentials (`assumed_role`,
as in Appendix A), revocation only ends the lease in Vault: AWS keeps the session
valid until its TTL expires, which is why the short TTL matters. Cutting such a
session off early requires an IAM deny policy on the target role (the IAM console's
"Revoke active sessions"). Revoking the build's Vault token also stops it from
requesting new credentials.

### 4.5 Alignment with NIST 800-53 Controls

| NIST Control | Requirement | Vault Feature |
|---|---|---|
| **IA-5** | Authenticator Management — no long-lived passwords | Dynamic secrets, auto-expiry |
| **AC-3** | Access Enforcement | Policy engine per path |
| **AC-17** | Remote Access | IAM Auth — cryptographic identity |
| **AU-2/3/9** | Audit Events, Content, Protection | Audit devices; logs shipped to protected storage |
| **SC-12** | Cryptographic Key Establishment | Transit Secrets Engine (if needed later) |
| **CM-6** | Configuration Settings | Policies in version control |

---

## 5. Automation Benefits

### 5.1 Zero-Touch Account Onboarding

When a new AWS account is bootstrapped, the only Vault-side steps are:

```bash
# Map a Vault role to the new account's Terraform role (credentials via sts:AssumeRole)
vault write aws/roles/new-account-name \
    credential_type=assumed_role \
    role_arns="arn:aws-us-gov:iam::ACCOUNT_ID:role/r-inf-terraform" \
    default_sts_ttl=15m

# Grant read access to that role's credentials. Attach this policy to the
# executor's auth role, or cover the path with a wildcard grant as in 4.3.
vault policy write sc-executor-new-account - <<EOF
path "aws/creds/new-account-name" { capabilities = ["read"] }
EOF
```

Two Vault commands, with no StackSet and no CFN stack. This assumes the target
account's `r-inf-terraform` role already trusts the IAM identity the Vault server
runs as; where it does not, that trust has to be established once per account.
After that, the executor can provision into the account immediately.

### 5.2 CodeBuild Integration Is Simple

In `buildspec-executor.yml`, the existing `sts:AssumeRole` block becomes:

```yaml
pre_build:
  commands:
    - |
      if [ -n "$TARGET_ACCOUNT_ID" ]; then
        # Authenticate to Vault with this CodeBuild job's IAM identity. The CLI
        # signs an sts:GetCallerIdentity request with the job's own AWS
        # credentials. Assumes VAULT_ADDR is set in the project environment.
        export VAULT_TOKEN=$(vault login -method=aws -token-only \
          role="sc-automation-executor" \
          region="us-gov-west-1")

        # Request short-lived credentials for the target account
        CREDS=$(vault read -format=json "aws/creds/${CROSS_ACCOUNT_ROLE}")
        export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.data.access_key')
        export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.data.secret_key')
        export AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.data.security_token')
      fi
```

No long-lived secrets live in the buildspec, in CodeBuild environment variables,
or in SSM Parameter Store. The exported values are short-lived and exist only for
the duration of the build. On the CodeBuild side, the only trust relationship
needed is between CodeBuild's IAM role and Vault's IAM auth endpoint. That is a
single Vault config entry, not a per-account change.

### 5.3 The `cross_account_role` Field Is Already Wired

The `sc-lambda-ghactions` system already passes `CROSS_ACCOUNT_ROLE` from the
`.sc-automation.yml` file through to CodeBuild. Vault just becomes the consumer
of that field name — no CFN template changes, no Lambda changes needed.

---

## 6. Government / Compliance Considerations

> **Important:** This section is critical for Census. Read before approving deployment.

### 6.1 Vault OSS License: Business Source License (BSL 1.1)

In August 2023, HashiCorp changed Vault (and Terraform) from MPL 2.0 to
**Business Source License (BSL) 1.1**.

**Key BSL terms:**
- Free to use for **any internal purpose**, including government automation
- Restriction applies only to building a **competing product** (a commercial secrets
  management service sold to others)
- Each release converts to MPL 2.0 automatically **4 years** after it is published
- No per-seat or per-server fees for self-hosted OSS usage

**For Census Bureau use:** ✅ BSL is acceptable. Census is not building a competing
commercial secrets management product. Using Vault to automate internal AWS
infrastructure is squarely within permitted BSL use.

**However:** Legal should formally approve this before production deployment. BSL
is **not** OSI-approved, and some agencies have blanket policies against
non-OSI-approved licenses.

### 6.2 OpenBao — The True OSS Alternative

If BSL creates a legal or policy blocker, **OpenBao** is a near drop-in replacement:

| | Vault OSS (BSL) | OpenBao |
|---|---|---|
| License | BSL 1.1 (not OSI-approved) | **MPL 2.0** (OSI-approved) |
| Fork basis | — | Vault 1.14.x |
| API compatibility | — | Same API paths as the Vault 1.14 fork base |
| Governance | IBM/HashiCorp | Linux Foundation |
| FIPS build | Enterprise only | Community FIPS-mode build, not yet validated (see 6.3) |
| Support | HashiCorp Enterprise contract | Community + vendors |

OpenBao is the recommended path if legal flags BSL. The implementation stays the
same: same API paths and the same CLI subcommands, though OpenBao's CLI binary is
named `bao` rather than `vault`.

### 6.3 FIPS 140-2 / 140-3 Requirement

Federal systems processing sensitive data must use **FIPS 140-2 or 140-3 validated
cryptographic modules** (NIST SP 800-131A, OMB M-19-17).

| Build | FIPS Status |
|---|---|
| Vault OSS | ❌ No FIPS-validated modules |
| Vault Enterprise + FIPS build | ✅ FIPS 140-2 validated module (BoringCrypto) |
| OpenBao (FIPS build) | ⚠️ FIPS mode via Go FIPS fork (non-validated) |
| OpenBao + BoringCrypto | 🔄 Community working on validated build |

**Recommendation:**
- **Non-production / dev:** Vault OSS or OpenBao standard build is fine
- **Production ATO:** Vault Enterprise with FIPS build, OR work with your ISSO to
  determine whether an OpenBao BoringCrypto build (once validated) satisfies the ATO boundary

### 6.4 FedRAMP

**FedRAMP is for cloud service providers (CSPs), not agencies.**
Census Bureau does not need Vault itself to be FedRAMP authorized. You need:

1. Vault to run on **FedRAMP-authorized infrastructure** → ✅ AWS GovCloud (us-gov-west-1)
   is FedRAMP High authorized
2. Vault to be included in the **system boundary** of an agency ATO
3. The ISSO and AO to authorize Vault as a software component under FISMA

**HCP Vault Dedicated** (HashiCorp's cloud-hosted Vault) does hold FedRAMP Moderate
authorization — but that is a separate product and would require routing traffic to
HashiCorp's infrastructure, which may not be acceptable for GovCloud workloads.

**Recommended path:** Self-hosted Vault OSS/Enterprise on AWS GovCloud, included in
the existing Census ATO boundary. This is the same pattern used by other FISMA-High
federal agencies running Vault on GovCloud.

### 6.5 Compliance Summary

| Requirement | Vault OSS | Vault Enterprise | OpenBao | Notes |
|---|---|---|---|---|
| License | ✅ BSL, internal use OK | ✅ | ✅ MPL 2.0 | Legal sign-off needed for BSL |
| FIPS 140-2 | ❌ | ✅ FIPS build | 🔄 in progress | Required for production ATO |
| FedRAMP (self-hosted) | ✅ via agency ATO | ✅ via agency ATO | ✅ | Not a Vault property — agency ATO |
| AWS GovCloud compatible | ✅ | ✅ | ✅ | Runs on any compute |
| NIST 800-53 audit controls | ✅ audit log | ✅ | ✅ | All builds have audit devices |
| No long-lived credentials | ✅ | ✅ | ✅ | Core Vault capability |

---

## 7. Deployment Architecture

```
AWS GovCloud (us-gov-west-1)
└── VPC: csvd-dev-gov

    ┌─────────────────────────────────────┐
    │  ECS Fargate / EC2 (TBD: CSC-1346)  │
    │                                     │
    │  Vault Server (HA cluster)          │
    │  ├── Backend: S3 (encrypted)        │
    │  ├── HA: DynamoDB lock              │
    │  ├── Unseal: AWS KMS auto-unseal    │
    │  ├── Audit: file → CloudWatch Logs  │
    │  └── TLS: ACM Private CA            │
    └──────────────────┬──────────────────┘
                       │
            ┌──────────▼──────────┐
            │  Vault Backends     │
            ├─────────────────────┤
            │  aws/  ← dynamic    │
            │    roles/adsd-dev   │ → IAM credentials for 015325649777
            │    roles/csvd-dev   │ → IAM credentials for 229685449397
            │    roles/ditd-prod  │ → IAM credentials for ...
            │    ...              │
            ├─────────────────────┤
            │  auth/aws/          │
            │    roles/sc-executor│ → trusts tf-run-executor-codebuild
            └──────────┬──────────┘
                       │
            ┌──────────▼──────────────────────────────┐
            │  CodeBuild: tf-run-executor             │
            │  (229685449397, us-gov-west-1)          │
            │                                         │
            │  1. vault login (IAM auth)              │
            │  2. vault read aws/creds/${ROLE}        │
            │  3. tf-run apply with dynamic creds     │
            └─────────────────────────────────────────┘
```

**Cluster topology is an open question (CSC-1346).** Options:
- **Single-node ECS Fargate** — simplest, lowest cost, acceptable for dev/non-prod
- **3-node ECS Fargate HA** — recommended for production
- **EC2 Auto Scaling Group** — most resilient, more ops overhead
- **HCP Vault Dedicated (GovCloud)** — managed, FedRAMP Moderate, but cost and
  network routing to HashiCorp infra need evaluation

---

## 8. What We Are NOT Proposing

To keep scope realistic:

- ❌ Replacing AWS Secrets Manager for application secrets
- ❌ PKI / certificate management (yet)
- ❌ Database credentials (yet)
- ❌ Transit encryption as a service (yet)
- ❌ Org-wide Vault rollout — CSVD first, expand after buy-in

The initial scope is **one thing:** dynamic AWS credentials for the `sc-lambda-ghactions`
executor. Everything else is future potential.

---

## 9. Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Vault cluster goes down, blocking deployments | Medium | High | HA cluster + runbook; circuit-breaker in buildspec |
| ISSO does not authorize Vault for ATO boundary | Medium | High | Engage ISSO early; start with dev account only |
| BSL license rejected by legal | Low | Medium | Switch to OpenBao (same API, MPL 2.0) |
| Team unfamiliar with Vault ops | Medium | Medium | Start small; document runbooks; CSC-1346 topology decision |
| KMS auto-unseal key deletion | Very Low | Critical | Deny `kms:ScheduleKeyDeletion` in the key policy; alarm during the 7 to 30 day deletion waiting period; back up Vault storage (recovery keys cannot unseal Vault without the KMS key) |

---

## 10. Recommended Path Forward

### Phase 1 — Proof of Concept (2 weeks)
- [ ] **CSC-1346** — Decide cluster topology (recommendation: single-node Fargate for PoC)
- [ ] Deploy Vault OSS in `csvd-dev` dev environment
- [ ] Configure AWS IAM auth + one AWS backend role for `csvd-dev` account
- [ ] Wire `buildspec-executor.yml` to use `vault read` instead of `sts:AssumeRole`
- [ ] Demo to CSVD stakeholders

### Phase 2 — ISSO Engagement + ATO Review (parallel)
- [ ] Work with Census ISSO to add Vault as a component in the ATO boundary
- [ ] Assess FIPS 140-2 requirement — Vault Enterprise vs OpenBao FIPS build
- [ ] Legal review of BSL 1.1 for internal government use

### Phase 3 — Production Hardening (post-ATO)
- [ ] HA cluster (3-node Fargate)
- [ ] KMS auto-unseal in production
- [ ] CloudWatch audit log forwarding
- [ ] Vault policies for all target accounts
- [ ] **CSC-1344** unblocked → E2E test (**CSC-1343**)

### Phase 4 — Expansion (post buy-in from Manny)
- [ ] Onboard other teams (adsd, ditd, ent) — one Vault role per account
- [ ] Standardize in account bootstrapping runbook

---

## 11. Call to Action

| Who | Ask |
|---|---|
| **Manny** | Executive buy-in to invest in Vault as org-wide credential platform |
| **CSVD team** (`badra001`, `dwara001`, `pubba001`, `kalep001`, `alade001`) | Review this proposal; join CSC-1345/CSC-1346 discussion |
| **Census ISSO** | Early engagement on ATO boundary inclusion |
| **Census Legal** | BSL 1.1 license review (or OpenBao as fallback) |
| **`arnol377`** | CSC-1346 topology decision → PoC deployment |

---

## Appendix A — Quick Reference

```bash
# How CodeBuild authenticates (IAM auth); the CLI signs the STS request itself
vault login -method=aws -token-only role="sc-automation-executor" region="us-gov-west-1"

# The same login as a raw API write (iam_request_* are base64-encoded parts
# of a SigV4-signed sts:GetCallerIdentity request)
vault write auth/aws/login \
    role="sc-automation-executor" \
    iam_http_request_method=POST \
    iam_request_url=... \
    iam_request_body=... \
    iam_request_headers=...

# How executor gets creds for a target account
vault read aws/creds/adsd-dev
# Returns: access_key, secret_key, security_token (TTL: 15m)

# Admin: add a new account
vault write aws/roles/new-account \
    credential_type=assumed_role \
    role_arns="arn:aws-us-gov:iam::ACCOUNT_ID:role/r-inf-terraform" \
    default_sts_ttl=15m

# Admin: revoke all active leases for an account
# (ends the leases in Vault; STS credentials stay valid until their TTL expires)
vault lease revoke -prefix aws/creds/adsd-dev
```

## Appendix B — Links

- HashiCorp Vault docs: https://developer.hashicorp.com/vault/docs
- Vault AWS Secrets Engine: https://developer.hashicorp.com/vault/docs/secrets/aws
- Vault AWS Auth Method: https://developer.hashicorp.com/vault/docs/auth/aws
- OpenBao project: https://openbao.org
- BSL 1.1 full text: https://www.hashicorp.com/bsl
- Jira CSC-1345: https://jira.it.census.gov/browse/CSC-1345
- Jira CSC-1346: https://jira.it.census.gov/browse/CSC-1346
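The `iam_request_*` placeholders in Appendix A are base64-encoded parts of a SigV4-signed `sts:GetCallerIdentity` request. Normally the Vault CLI or SDK builds them for you. For a client without the CLI, the following is a minimal standard-library Python sketch of that payload. It assumes the caller already holds AWS credentials, for example those of the CodeBuild container. It omits the optional `X-Vault-AWS-IAM-Server-ID` header and sends headers as base64-encoded JSON mapping each name to a list of values. The function name `build_iam_login_payload` is hypothetical. Treat this as an illustration of the signing steps, not a replacement for the official client.

```python
import base64
import datetime
import hashlib
import hmac
import json


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 signing-key derivation."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def _b64(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_iam_login_payload(access_key, secret_key, session_token=None,
                            region="us-gov-west-1",
                            role="sc-automation-executor", now=None):
    """Return a JSON body for POST /v1/auth/aws/login (IAM method)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"sts.{region}.amazonaws.com"
    body = "Action=GetCallerIdentity&Version=2011-06-15"

    headers = {
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8",
        "Host": host,
        "X-Amz-Date": amz_date,
    }
    if session_token:  # present when the caller uses temporary (role) creds
        headers["X-Amz-Security-Token"] = session_token

    # Canonical request: lowercase, sorted header names; empty query string.
    canon = sorted((k.lower(), " ".join(v.split())) for k, v in headers.items())
    canonical_headers = "".join(f"{k}:{v}\n" for k, v in canon)
    signed_headers = ";".join(k for k, _ in canon)
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    canonical_request = "\n".join(
        ["POST", "/", "", canonical_headers, signed_headers, payload_hash])

    # String to sign, then the derived key: date -> region -> service -> request.
    scope = f"{date_stamp}/{region}/sts/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])
    key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "sts", "aws4_request"):
        key = _hmac_sha256(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    headers["Authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}")

    return {
        "role": role,
        "iam_http_request_method": "POST",
        "iam_request_url": _b64(f"https://{host}/"),
        "iam_request_body": _b64(body),
        # Headers as base64-encoded JSON: header name -> list of values.
        "iam_request_headers": _b64(json.dumps({k: [v] for k, v in headers.items()})),
    }
```

The returned dict would be sent as the JSON body of `POST /v1/auth/aws/login`, and `role` must match an AWS auth role configured in Vault. Vault forwards the signed request to STS and never sees the secret key itself, which is the property the IAM auth method relies on.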