diff --git a/aws/documentation/account-decommission/decommission.md b/aws/documentation/account-decommission/decommission.md
new file mode 100644
index 00000000..b1e72f24
--- /dev/null
+++ b/aws/documentation/account-decommission/decommission.md
@@ -0,0 +1,1136 @@
# AWS Account Decommissioning Process

This document uses the account `ma24` (both EW `ma24-ew` and GovCloud `ma24-gov`) as an example.

This assumes that all VPC-provisioned resources have been removed.

# Checklist

* Pre-check

- [ ] Validate approval to remove the account. Follow the process defined [here](https://github.e.it.census.gov/terraform/support/tree/master/docs/how-to/account-decommissioning).
- [ ] Update [ACCOUNTS.md](https://github.e.it.census.gov/terraform/cloud-information/blob/master/aws/info/ACCOUNTS.md) to indicate the intention to decommission the account.
- [ ] Destroy VPC-provisioned resources
- [ ] Destroy non-VPC-provisioned resources

* Activities

1. [Remove SSO Access](#step-1-remove-sso-access)
1. [Check and Remove VPCs and related](#step-2-check-and-remove-vpcs)
1. [Integration: Remove DataDog](#step-3-integration-remove-datadog)
1. [Integration: Remove Apptio](#step-4-integration-remove-apptio)
1. [Move account out of Organization OU to Decomission OU](#step-5-move-account-out-of-organization-ou-to-decomission-ou)
1. [Remove infrastructure/{region}](#step-6-remove-infrastructureregion)
1. [Remove Users](#step-7-remove-users)
1. [Remove common service accounts](#step-8-remove-common-service-accounts)
1. [Other common/ directories](#step-9-other-common-directories)
1. [Remaining things in common/ which will not be removed](#step-10-remaining-things-in-common-which-will-not-be-removed)
1. [Empty S3 Buckets](#step-11-s3-buckets)
1. [Final checks before requesting removal](#step-12-final-checks-before-requesting-removal)
1. [Record the accounts as decommissioned](#step-13-record-the-accounts-as-decomissioned)
1. [Request Decommission of the reseller](#step-14-request-decommission-of-the-reseller)

# Pre-Check

## Complete account decommission validation document

## Record the accounts to be decommissioned in ACCOUNTS.md

In the repository `cloud-information` and directory `/aws/info`, update the file `ACCOUNTS.md` and move the account details into the section labeled
`Decommissioned AWS Accounts`. Add the date of the decommission at the end of each row, as shown in this example:

```script
| Account Number | Account Name | Use | Tennant | Registered Email Address | Console URL | Date |
|---|---|---|---|---|---|---|
| 576208090170 | ma24-ew | Enterprise EW EDL Internal Compute | AWS East/West | csvd.aws+ma24-ew@census.gov | https://us-east-2.console.aws.amazon.com/console | 2024-09-20 |
| 198886018595 | ma24-gov| Enterprise GovCloud EDL Internal Compute | AWS GovCloud | | https://ma24-gov-edl.signin.amazonaws-us-gov.com/ | 2024-09-20 |
```

Also add an entry to the end of the Changelog:

```script
* 2024-09-20
  * move ma24-{ew,gov} to decommissioned
```

## Destroy VPC created resources

## Destroy non-VPC created resources

# Step 1: Remove SSO Access

Check that the account has no user-based SSO configuration. In the management account of each relevant organization, ent-ew (109223337795-censusaws) and either ent-gov (252903981224-ma5-gov) or lab-gov (243219719746-lab-gov-management-nonprod), look for instances of the account alias and the account ID.

```script
# cd ORG-MASTER-ACCOUNT-REPO
cd infrastructure/global/sso
git grep ALIAS
git grep ACCOUNTID
```

If you get any results, you'll need to remove these and apply.
Example:

```console
% cd $HOME/terraform/252903981224-ma5-gov/infrastructure/global/sso
% git grep ma24
users_groups.yml: - ma24-gov
users_groups.yml: - ma24-gov
% git grep 198886018595
users_groups.yml: - "198886018595"
users_groups.yml: - "198886018595"
```

Update every location found to remove this account, and then apply (following the normal gitflow process). Example:

```console
# edit file(s)
# this is in infrastructure/global/sso. Most of these will be in permissionsets/*
% tf-plan
.
.
% tf-plan summary
* tf-plan summary from log logs/plan.20240903.1725367233.log
> to-be created (0)

> to-be updated (0)

> to-be replaced (0)

> to-be destroyed (10)
  # aws_ssoadmin_account_assignment.admin_edl-admin["198886018595"] will be destroyed
  # aws_ssoadmin_account_assignment.security_edl-security-audit["198886018595"] will be destroyed

> has changed (0)

> has moved (0)

Plan: 0 to add, 0 to change, 2 to destroy.
```

# Step 2: Check and Remove VPCs

We assume all resources have been destroyed before we get to this point. That includes EC2 instances, RDS databases,
and anything that uses a network interface in the VPC. You will not be able to destroy the VPC and its associated configuration
if this has not been done.

There are two possible VPC configurations: shared VPCs and account-dedicated VPCs.

First, check whether any VPCs exist for the account. In either configuration, any VPCs present
in the account must be removed.

```script
# cd ACCOUNT-REPO
cd vpc/east
tf-aws ec2 describe-vpcs
cd ../west
tf-aws ec2 describe-vpcs
```

If we find any VPCs, we will have some cleanup to do. Our example `ma24-gov` has no VPCs:

```console
% tf-aws ec2 describe-vpcs
{
    "Vpcs": []
}
```

We will update this document with details of actual VPC values when we decommission an account that has VPCs set up.
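The per-region check above can also be scripted. This is a minimal sketch, not part of the existing tooling: it assumes the plain AWS CLI stands in for the `tf-aws` wrapper, that a named profile such as `198886018595-ma24-gov` exists, and that the regions to check are `us-gov-east-1` and `us-gov-west-1`. The function name `count_vpcs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the existing tooling): print "REGION COUNT"
# for each region given, counting the VPCs visible to the named AWS CLI profile.
# Assumes the standard AWS CLI is installed and the profile allows ec2:DescribeVpcs.
count_vpcs() {
  local profile="$1"
  shift
  local region ids count
  for region in "$@"; do
    # --query returns the VPC IDs as whitespace-separated text (empty if none).
    ids=$(aws ec2 describe-vpcs --profile "$profile" --region "$region" \
      --query 'Vpcs[].VpcId' --output text) || return 1
    # Count the IDs; tr strips the padding some wc implementations add.
    count=$(printf '%s\n' "$ids" | wc -w | tr -d ' ')
    echo "$region $count"
  done
}
```

For example, `count_vpcs 198886018595-ma24-gov us-gov-east-1 us-gov-west-1` prints one `REGION COUNT` line per region. A non-zero count means there is VPC cleanup to do before continuing.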

## Shared VPCs

For shared VPCs, destroy all the resources first. Then, for each VPC, destroy the resources in the `vpc/{region}/vpc{n}`
directory. All subdirectories there (`apps/*`) must already have been destroyed.

```script
# cd ACCOUNT-REPO
cd vpc/{region}/vpc{n}
tf-destroy
```

Shared VPCs tend to be allocated based on OU membership. Check in the network account whether a VPC is explicitly shared to the account or to its OU.

# Step 3: Integration: Remove Datadog

Go to the management account for the organization, into the DataDog stackset `account-deployment` directory. Do this
before moving the account to the Decomission OU, because it removes a service account.

```script
# cd MANAGEMENT-ACCOUNT
cd infrastructure/global/stacksets/app-csvd-datadog/account-deployment
```

Check for a directory matching the account ID and alias of the account you are removing. For example:

```console
% ls -ald *ma24*gov
drwx------. 3 badra001 tco 124 Jan 16 2024 198886018595-ma24-gov
```

Go into this directory and destroy the configuration.

```script
cd 198886018595-ma24-gov
tg.sh destroy
```

If there are several accounts to remove, use the `run-all` approach. For example, to remove ma20 through ma24:

```script
# cd MANAGEMENT-ACCOUNT
cd infrastructure/global/stacksets/app-csvd-datadog/account-deployment
tg.sh run-all destroy --terragrunt-include-dir "*ma2{0,1,2,3,4}-gov"
```

Once the configuration is destroyed, remove the account's directory from the `account-deployment` directory:

```script
rm -rf 198886018595-ma24-gov
```

1. Remove the named entry (e.g., `ma24-gov`) from `ent-gov.profiles.txt` (or _{org}_.profiles.txt)
1. Add an entry to `ent-gov.decommissioned.txt` (or _{org}_.decommissioned.txt)
1. Commit and push

# Step 4: Integration: Remove Apptio

# Step 5: Move account out of Organization OU to Decomission OU

Find the appropriate organization account entry (in `organizations.account.yml` or `accounts/{label}.yml`).
Change the `ou` to the decommission OU, using the OU name exactly as it exists in the organization (the example below uses `Decomission`). Example:

```yaml
- "label": "edl-prod-3"
  "description": "Enterprise GovCloud EDL Internal Compute"
  "account_id": "198886018595"
  "account_name": "ma24-ew"
  "account_type": "gov"
  "email": "csvd.aws+ma24-ew@census.gov"
  "alias": "ma24-gov"
# "ou": "Enterprise-GOV:Available"
  "ou": "Decomission"
  "org_prefix": "ent"
  "enabled": true
  "create_gov": false
```

Then, apply the change.

```console
% tf-run apply tag:accounts only
```

You may check AWS Organizations to confirm that the account has moved. This should remove any resource shares
(via RAM) that are OU-based rather than organization-based.

## Dedicated VPCs

# Step 6: Remove infrastructure/{region}

This assumes all files from the various buckets have been handled: either moved elsewhere, if necessary,
or deemed ready for removal. The process for moving files will be documented separately (when we run into that case).

We need the most current modules, so start with `tf-init -upgrade`. Then we modify some settings in the
downloaded modules to allow deletion, apply to change those settings, and then destroy.

Start with the module refresh, and see whether anything is supposed to change.

```script
tf-init -upgrade
tf-plan
```

We are not going to apply these changes at this point. We only needed the current module to proceed.

In each region directory (`east`, `west`), we need to destroy all the resources there. They are:

* module.cloudtrail
* module.cloudtrail_key
* module.config
* module.flowlogs
* module.logs
* module.object-logging
* module.splunk_description

Running `tf-destroy` directly fails, because several buckets have `prevent_destroy` set:

```console
% tf-destroy
Error: Instance cannot be destroyed

  on .terraform/modules/config/config/s3.tf line 4:
   4: resource "aws_s3_bucket" "config" {

Resource module.config.aws_s3_bucket.config[0] has lifecycle.prevent_destroy
set, but the plan calls for this resource to be destroyed. To avoid this
error and continue with the plan, either disable lifecycle.prevent_destroy or
reduce the scope of the plan using the -target flag.

Error: Instance cannot be destroyed

  on .terraform/modules/flowlogs/s3-flow-logs/main.tf line 61:
  61: resource "aws_s3_bucket" "flowlogs" {

Resource module.flowlogs.aws_s3_bucket.flowlogs has lifecycle.prevent_destroy
set, but the plan calls for this resource to be destroyed. To avoid this
error and continue with the plan, either disable lifecycle.prevent_destroy or
reduce the scope of the plan using the -target flag.

Error: Instance cannot be destroyed

  on .terraform/modules/logs/s3-access-logs/main.tf line 66:
  66: resource "aws_s3_bucket" "logs" {

Resource module.logs.aws_s3_bucket.logs has lifecycle.prevent_destroy set,
but the plan calls for this resource to be destroyed. To avoid this error and
continue with the plan, either disable lifecycle.prevent_destroy or reduce
the scope of the plan using the -target flag.
```

To fix this, add `force_destroy = true` and comment out the `prevent_destroy` lifecycle setting.

Edit these files:

* .terraform/modules/config/config/s3.tf
* .terraform/modules/flowlogs/s3-flow-logs/main.tf
* .terraform/modules/logs/s3-access-logs/main.tf

Here is an example of the changes to make.

```hcl
resource "aws_s3_bucket" "config" {
  count = var.create_s3_bucket ? 1 : 0
  bucket = local.bucket_name
  # acl = "private"
  force_destroy = true

  lifecycle {
#    prevent_destroy = true
    ignore_changes = [tags["boc:tf_module_version"]]
  }
.
.
```

There are two other buckets here, cloudtrail and object-logging. They need a similar change, but
only `force_destroy` has to change (these resources do not have the lifecycle setting).

These are the two files:

* .terraform/modules/cloudtrail/cloudtrail/s3.tf
* .terraform/modules/object-logging/main.tf

Change `force_destroy` from false to true.
Example:

```hcl
resource "aws_s3_bucket" "this" {
  bucket = local.bucket_name
  # acl = "private"
  force_destroy = true

  tags = merge(
    local.base_tags,
    var.tags,
    { "Name" = local.name },
  )
}
```

Once these changes have been made, plan and apply just the S3 bucket changes:

```script
tf-plan ; tf-plan summary
tf-apply $(tf-plan summary|grep -E 'aws_s3_bucket\.'|awk '{print "-target=" $2}')
```

You may find that additional targets are needed. This happens when the module has changed since it was
last applied here. If you had applied after the earlier `tf-init -upgrade`, you would likely not see this.

Example:

```console
% tf-apply $(tf-plan summary|grep -E 'aws_s3_bucket\.'|awk '{print "-target=" $2}')
.
.
Warning: Resource targeting is in effect

You are creating a plan with the -target option, which means that the result
of this plan may not represent all of the changes requested by the current
configuration.

The -target option is not for routine use, and is provided only for
exceptional situations such as recovering from errors or mistakes, or when
Terraform specifically suggests to use it as part of an error message.

Error: Moved resource instances excluded by targeting

Resource instances in your current state have moved to new addresses in the
latest configuration. Terraform must include those resource instances while
planning in order to ensure a correct result, but your -target=... options do
not fully cover all of those resource instances.

To create a valid plan, either remove your -target=... options altogether or
add the following additional target options:
  -target="module.config.aws_s3_bucket.config"
  -target="module.config.aws_s3_bucket_ownership_controls.config"
  -target="module.config.aws_s3_bucket_public_access_block.config"
  -target="module.config.aws_s3_bucket_server_side_encryption_configuration.config"
  -target="module.config.aws_s3_bucket_versioning.config"

Note that adding these options may include further additional resource
instances in your plan, in order to respect object dependencies.
```

To solve this, either apply all the changes from the `tf-init -upgrade`, or add these target options. This example uses the
full apply: since everything is destroyed right afterwards, the only changes that matter are the S3 bucket updates.

```console
% tf-plan summary
* tf-plan summary from log logs/plan.20240906.1725630328.log
> to-be created (1)
  # module.object-logging.aws_s3_bucket_lifecycle_configuration.this will be created

> to-be updated (24)
  # module.cloudtrail.aws_cloudtrail.this will be updated in-place
  # module.cloudtrail.aws_cloudwatch_log_group.this will be updated in-place
  # module.cloudtrail.aws_iam_role.cloudtrail will be updated in-place
  # module.cloudtrail.aws_s3_bucket.this will be updated in-place
  # module.cloudtrail.aws_s3_bucket_policy.policy will be updated in-place
  # module.cloudtrail.aws_sns_topic.cloudtrail[0] will be updated in-place
  # module.cloudtrail.aws_sns_topic_policy.cloudtrail[0] will be updated in-place
  # module.cloudtrail.aws_sqs_queue.cloudtrail[0] will be updated in-place
  # module.cloudtrail.aws_sqs_queue.cloudtrail_deadletter[0] will be updated in-place
  # module.cloudtrail.aws_sqs_queue_policy.cloudtrail_deadletter[0] will be updated in-place
  # module.cloudtrail.aws_sqs_queue_policy.cloudtrail_sqs[0] will be updated in-place
  # module.cloudtrail_key.aws_kms_key.key will be updated in-place
  # module.config.aws_s3_bucket.config[0] will be updated in-place
  # module.flowlogs.aws_s3_bucket.flowlogs will be updated in-place
  # module.flowlogs.aws_s3_bucket_policy.flowlogs will be updated in-place
  # module.flowlogs.aws_s3_bucket_server_side_encryption_configuration.flowlogs will be updated in-place
  # module.logs.aws_s3_bucket.logs will be updated in-place
  # module.logs.aws_s3_bucket_policy.logs will be updated in-place
  # module.logs.aws_s3_bucket_server_side_encryption_configuration.logs will be updated in-place
  # module.object-logging.aws_cloudtrail.this will be updated in-place
  # module.object-logging.aws_kms_key.key will be updated in-place
  # module.object-logging.aws_s3_bucket.this will be updated in-place
  # module.object-logging.aws_s3_bucket_policy.policy will be updated in-place
  # module.object-logging.aws_s3_bucket_server_side_encryption_configuration.this will be updated in-place

> to-be replaced (2)
  # module.cloudtrail.local_file.splunk_cloudtrail[0] must be replaced
  # module.config.aws_iam_role_policy_attachment.config["aws-config-role"] must be replaced

> to-be destroyed (0)

> has changed (0)

> has moved (4)
  # module.config.aws_s3_bucket_ownership_controls.config has moved to module.config.aws_s3_bucket_ownership_controls.config[0]
  # module.config.aws_s3_bucket_public_access_block.config has moved to module.config.aws_s3_bucket_public_access_block.config[0]
  # module.config.aws_s3_bucket_server_side_encryption_configuration.config has moved to module.config.aws_s3_bucket_server_side_encryption_configuration.config[0]
  # module.config.aws_s3_bucket_versioning.config has moved to module.config.aws_s3_bucket_versioning.config[0]

Plan: 3 to add, 24 to change, 2 to destroy.

% tf-apply
.
.
Apply complete! Resources: 3 added, 19 changed, 2 destroyed.
```

Now, for the destroy. First, save a copy of the remote state:

```console
# save state
% manage-remote-state.sh get
* using settings datestamp=20240906.1725631039
  bucket = inf-tfstate-198886018595
  bucket_key = ma24-gov/infrastructure/east/terraform.tfstate
  bucket_region = us-gov-east-1
  dynamodb_table = tf_remote_state
  dynamodb_item = inf-tfstate-198886018595/ma24-gov/infrastructure/east/terraform.tfstate-md5

* making directory logs/20240906.1725631039
* getting bucket s3://inf-tfstate-198886018595/ma24-gov/infrastructure/east/terraform.tfstate to logs/20240906.1725631039/terraform.tfstate
download: s3://inf-tfstate-198886018595/ma24-gov/infrastructure/east/terraform.tfstate to logs/20240906.1725631039/terraform.tfstate
* getting ddb table entry inf-tfstate-198886018595/ma24-gov/infrastructure/east/terraform.tfstate-md5 to logs/20240906.1725631039/lock-entry.json
```

Here is a `tf-destroy summary`:

```console
% tf-destroy summary
* tf-destroy summary from log logs/destroy.20240906.1725631071.log
> to-be created (0)

> to-be updated (0)

> to-be replaced (0)

> to-be destroyed (83)
  # module.cloudtrail.aws_cloudtrail.this will be destroyed
  # module.cloudtrail.aws_cloudwatch_log_group.this will be destroyed
  # module.cloudtrail.aws_iam_policy.cloudtrail_policy will be destroyed
  # module.cloudtrail.aws_iam_role.cloudtrail will be destroyed
  # module.cloudtrail.aws_s3_bucket.this will be destroyed
  # module.cloudtrail.aws_s3_bucket_logging.this will be destroyed
  # module.cloudtrail.aws_s3_bucket_ownership_controls.this will be destroyed
  # module.cloudtrail.aws_s3_bucket_policy.policy will be destroyed
  # module.cloudtrail.aws_s3_bucket_public_access_block.this will be destroyed
  # module.cloudtrail.aws_s3_bucket_server_side_encryption_configuration.this will be destroyed
  # module.cloudtrail.aws_sns_topic.cloudtrail[0] will be destroyed
  # module.cloudtrail.aws_sns_topic_policy.cloudtrail[0] will be destroyed
  # module.cloudtrail.aws_sns_topic_subscription.cloudtrail_sqs[0] will be destroyed
  # module.cloudtrail.aws_sqs_queue.cloudtrail[0] will be destroyed
  # module.cloudtrail.aws_sqs_queue.cloudtrail_deadletter[0] will be destroyed
  # module.cloudtrail.aws_sqs_queue_policy.cloudtrail_deadletter[0] will be destroyed
  # module.cloudtrail.aws_sqs_queue_policy.cloudtrail_sqs[0] will be destroyed
  # module.cloudtrail.local_file.splunk_cloudtrail[0] will be destroyed
  # module.cloudtrail.null_resource.policy_delay will be destroyed
  # module.cloudtrail.null_resource.splunk_cloudtrail[0] will be destroyed
  # module.cloudtrail_key.aws_kms_alias.key will be destroyed
  # module.cloudtrail_key.aws_kms_key.key will be destroyed
  # module.config.aws_config_config_rule.config_rules["ENCRYPTED_VOLUMES"] will be destroyed
  # module.config.aws_config_config_rule.config_rules["MFA_ENABLED_FOR_IAM_CONSOLE_ACCESS"] will be destroyed
  # module.config.aws_config_config_rule.config_rules["RDS_STORAGE_ENCRYPTED"] will be destroyed
  # module.config.aws_config_config_rule.config_rules["S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED"] will be destroyed
  # module.config.aws_config_config_rule.config_rules["VPC_FLOW_LOGS_ENABLED"] will be destroyed
  # module.config.aws_config_configuration_recorder.config will be destroyed
  # module.config.aws_config_configuration_recorder_status.config will be destroyed
  # module.config.aws_config_delivery_channel.config will be destroyed
  # module.config.aws_iam_policy.config will be destroyed
  # module.config.aws_iam_role.config will be destroyed
  # module.config.aws_iam_role_policy_attachment.config["aws-config-role"] will be destroyed
  # module.config.aws_iam_role_policy_attachment.config["aws-configrules-role"] will be destroyed
  # module.config.aws_iam_role_policy_attachment.config["p-inf-config"] will be destroyed
  # module.config.aws_s3_bucket.config[0] will be destroyed
  # module.config.aws_s3_bucket_ownership_controls.config[0] will be destroyed
  # module.config.aws_s3_bucket_public_access_block.config[0] will be destroyed
  # module.config.aws_s3_bucket_server_side_encryption_configuration.config[0] will be destroyed
  # module.config.aws_s3_bucket_versioning.config[0] will be destroyed
  # module.config.aws_sns_topic.config will be destroyed
  # module.config.aws_sns_topic_policy.config will be destroyed
  # module.config.aws_sns_topic_subscription.config will be destroyed
  # module.config.aws_sqs_queue.config will be destroyed
  # module.config.aws_sqs_queue.config_deadletter will be destroyed
  # module.config.aws_sqs_queue_policy.config will be destroyed
  # module.config.aws_sqs_queue_policy.config_deadletter will be destroyed
  # module.config.local_file.splunk_config will be destroyed
  # module.config.local_file.splunk_configrules will be destroyed
  # module.config.null_resource.splunk_config will be destroyed
  # module.config.null_resource.splunk_configrules will be destroyed
  # module.flowlogs.aws_s3_bucket.flowlogs will be destroyed
  # module.flowlogs.aws_s3_bucket_acl.flowlogs will be destroyed
  # module.flowlogs.aws_s3_bucket_ownership_controls.flowlogs will be destroyed
  # module.flowlogs.aws_s3_bucket_policy.flowlogs will be destroyed
  # module.flowlogs.aws_s3_bucket_public_access_block.flowlogs will be destroyed
  # module.flowlogs.aws_s3_bucket_server_side_encryption_configuration.flowlogs will be destroyed
  # module.flowlogs.aws_s3_bucket_versioning.flowlogs will be destroyed
  # module.flowlogs.null_resource.policy_delay will be destroyed
  # module.logs.aws_s3_bucket.logs will be destroyed
  # module.logs.aws_s3_bucket_acl.logs will be destroyed
  # module.logs.aws_s3_bucket_ownership_controls.this will be destroyed
  # module.logs.aws_s3_bucket_policy.logs will be destroyed
  # module.logs.aws_s3_bucket_public_access_block.logs will be destroyed
  # module.logs.aws_s3_bucket_server_side_encryption_configuration.logs will be destroyed
  # module.logs.aws_s3_bucket_versioning.logs will be destroyed
  # module.logs.aws_s3_object.logs["alb-logs"] will be destroyed
  # module.logs.aws_s3_object.logs["elasticmapreduce"] will be destroyed
  # module.logs.aws_s3_object.logs["inventory"] will be destroyed
  # module.logs.aws_s3_object.logs["nlb-logs"] will be destroyed
  # module.logs.aws_s3_object.logs["s3"] will be destroyed
  # module.logs.null_resource.policy_delay will be destroyed
  # module.object-logging.aws_cloudtrail.this will be destroyed
  # module.object-logging.aws_kms_alias.key will be destroyed
  # module.object-logging.aws_kms_key.key will be destroyed
  # module.object-logging.aws_s3_bucket.this will be destroyed
  # module.object-logging.aws_s3_bucket_logging.this will be destroyed
  # module.object-logging.aws_s3_bucket_policy.policy will be destroyed
  # module.object-logging.aws_s3_bucket_public_access_block.this will be destroyed
  # module.object-logging.aws_s3_bucket_server_side_encryption_configuration.this will be destroyed
  # module.object-logging.null_resource.policy_delay will be destroyed
  # module.splunk_description.local_file.splunk_description will be destroyed
  # module.splunk_description.null_resource.splunk_description will be destroyed

> has changed (0)

> has moved (0)

Plan: 0 to add, 0 to change, 83 to destroy.
```

Now, we apply the destroy:

**NOTE**: **Make absolutely certain you are in the right account and right directory!!**

```console
% tf-destroy
.
.
Plan: 0 to add, 0 to change, 84 to destroy.

Changes to Outputs:
  - account_caller_arn           = "arn:aws-us-gov:sts::198886018595:assumed-role/r-inf-terraform/badra001" -> null
  - account_caller_arn_partition = "aws-us-gov" -> null
  - caller_account_id            = "198886018595" -> null
  - config_sns_topic_arn         = "arn:aws-us-gov:sns:us-gov-east-1:198886018595:inf-config-us-gov-east-1" -> null
  - config_sqs_arn               = "arn:aws-us-gov:sqs:us-gov-east-1:198886018595:inf-config-us-gov-east-1" -> null
  - config_sqs_id                = "https://sqs.us-gov-east-1.amazonaws.com/198886018595/inf-config-us-gov-east-1" -> null
  - flowlogs_arn                 = "arn:aws-us-gov:s3:::inf-flowlogs-198886018595-us-gov-east-1" -> null
  - flowlogs_id                  = "inf-flowlogs-198886018595-us-gov-east-1" -> null
  - logs_arn                     = "arn:aws-us-gov:s3:::inf-logs-198886018595-us-gov-east-1" -> null
  - logs_id                      = "inf-logs-198886018595-us-gov-east-1" -> null
  - profile                      = "198886018595-ma24-gov" -> null
  - region                       = "us-gov-east-1" -> null
  - vpc_full_name                = "" -> null

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
.
.
Error: emptying S3 Bucket (inf-logs-198886018595-us-gov-east-1): deleting S3 bucket (inf-logs-198886018595-us-gov-east-1) object versions: deleting: S3 object (s3/inf-cloudtrail-198886018595-us-gov-east-1/2023-03-11-18-17-50-85AFEE07460AA1E8) version (null): InternalError: We encountered an internal error. Please try again.

# ending v1.11.0 action destroy file logs/destroy.20240906.1725631422.log stamp 20240906.1725631422 start 1725631422 end 1725631946 elapsed 524
# results in file logs/destroy.20240906.1725631422.log stamp 20240906.1725631422 status=0
```

This error may be caused by a timeout while emptying the bucket. Run the destroy again; if it fails a second time, open a GitHub issue. In this example, the second attempt succeeded.

```console
% tf-destroy
..
Plan: 0 to add, 0 to change, 1 to destroy.
..
Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

module.logs.aws_s3_bucket.logs: Destroying... [id=inf-logs-198886018595-us-gov-east-1]
module.logs.aws_s3_bucket.logs: Still destroying... [id=inf-logs-198886018595-us-gov-east-1, 10s elapsed]
module.logs.aws_s3_bucket.logs: Still destroying... [id=inf-logs-198886018595-us-gov-east-1, 20s elapsed]
.
.
module.logs.aws_s3_bucket.logs: Still destroying... [id=inf-logs-198886018595-us-gov-east-1, 9m0s elapsed]
module.logs.aws_s3_bucket.logs: Still destroying... [id=inf-logs-198886018595-us-gov-east-1, 9m10s elapsed]
module.logs.aws_s3_bucket.logs: Destruction complete after 9m15s

Destroy complete! Resources: 1 destroyed.
# ending v1.11.0 action destroy file logs/destroy.20240906.1725633371.log stamp 20240906.1725633371 start 1725633371 end 1725634241 elapsed 870
```

Confirm with `tf-state list` that everything has been destroyed. If so, clean up and move on. You should be working in a branch called `decomission`.

```script
tf-state list
manage-remote-state.sh delete
tf-run clean
rm -rf .terraform*
git commit -m'decomission infrastructure/{region}' .
git push
```

# Step 7: Remove Users

This removes some of the users with `u-`, `a-`, and `s-` prefixes (user, admin, and service accounts).
These will be primarily +in subdirectories, not directly in `common/` + +Get a list of users in `common/admin-users` directory: + +```script +cd common/admin-users +tf-init -upgrade +tf-destroy +# answer **no** +tf-destroy summary +``` + +Here is a sample `tf-destroy summary`: + +```console +% tf-destroy summary +* tf-destroy summary from log logs/destroy.20240916.1726494426.log +> to-be created (0) + +> to-be updated (0) + +> to-be replaced (0) + +> to-be destroyed (20) + # module.admin_ali00332.aws_iam_user.user will be destroyed + # module.admin_ali00332.aws_iam_user_group_membership.user_groups[0] will be destroyed + # module.admin_aravi001.aws_iam_user.user will be destroyed + # module.admin_aravi001.aws_iam_user_group_membership.user_groups[0] will be destroyed + # module.admin_dang0317.aws_iam_user.user will be destroyed + # module.admin_dang0317.aws_iam_user_group_membership.user_groups[0] will be destroyed + # module.admin_garla019.aws_iam_user.user will be destroyed + # module.admin_garla019.aws_iam_user_group_membership.user_groups[0] will be destroyed + # module.admin_jain0009.aws_iam_user.user will be destroyed + # module.admin_jain0009.aws_iam_user_group_membership.user_groups[0] will be destroyed + # module.admin_muker001.aws_iam_user.user will be destroyed + # module.admin_muker001.aws_iam_user_group_membership.user_groups[0] will be destroyed + # module.admin_nform001.aws_iam_user.user will be destroyed + # module.admin_nform001.aws_iam_user_group_membership.user_groups[0] will be destroyed + # module.admin_pilla010.aws_iam_user.user will be destroyed + # module.admin_pilla010.aws_iam_user_group_membership.user_groups[0] will be destroyed + # module.admin_runes001.aws_iam_user.user will be destroyed + # module.admin_runes001.aws_iam_user_group_membership.user_groups[0] will be destroyed + # module.admin_zulfi001.aws_iam_user.user will be destroyed + # module.admin_zulfi001.aws_iam_user_group_membership.user_groups[0] will be destroyed + +> has 
changed (0)
+
+> has moved (0)
+
+Plan: 0 to add, 0 to change, 20 to destroy.
+```
+
+Be sure there is nothing unexpected here.
+
+Next, we execute the destroy:
+
+**NOTE**: **Make absolutely certain you are in the right account and right directory!!**
+
+```console
+% tf-destroy
+.
+.
+Plan: 0 to add, 0 to change, 20 to destroy.
+
+Changes to Outputs:
+  - account_caller_arn = "arn:aws-us-gov:sts::198886018595:assumed-role/r-inf-terraform/dwara001" -> null
+  - account_caller_arn_partition = "aws-us-gov" -> null
+  - admin_ali00332 = {
+      - aws_access_key_id = ""
+      - aws_secret_access_key = ""
+      - user_arn = "arn:aws-us-gov:iam::198886018595:user/a-ali00332"
+      - user_name = "a-ali00332"
+      - user_password = ""
+    } -> null
+.
+.
+
+Do you really want to destroy all resources?
+  Terraform will destroy all your managed infrastructure, as shown above.
+  There is no undo. Only 'yes' will be accepted to confirm.
+
+  Enter a value: yes
+
+module.admin_runes001.aws_iam_user_group_membership.user_groups[0]: Destroying... [id=terraform-20230418204552262800000001]
+module.admin_nform001.aws_iam_user_group_membership.user_groups[0]: Destroying... [id=terraform-20221114133159105400000004]
+module.admin_dang0317.aws_iam_user_group_membership.user_groups[0]: Destroying... [id=terraform-20230726173648713500000001]
+module.admin_pilla010.aws_iam_user_group_membership.user_groups[0]: Destroying... [id=terraform-20221114133159090700000001]
+module.admin_garla019.aws_iam_user_group_membership.user_groups[0]: Destroying... [id=terraform-20221114133159104000000003]
+module.admin_aravi001.aws_iam_user_group_membership.user_groups[0]: Destroying... [id=terraform-20221114133159119000000005]
+module.admin_ali00332.aws_iam_user_group_membership.user_groups[0]: Destroying... [id=terraform-20230428162006293500000001]
+module.admin_jain0009.aws_iam_user_group_membership.user_groups[0]: Destroying... [id=terraform-20230418204658180500000001]
+module.admin_muker001.aws_iam_user_group_membership.user_groups[0]: Destroying... [id=terraform-20230418204636733000000001]
+module.admin_zulfi001.aws_iam_user_group_membership.user_groups[0]: Destroying... [id=terraform-20221114133159095900000002]
+module.admin_zulfi001.aws_iam_user_group_membership.user_groups[0]: Destruction complete after 0s
+module.admin_nform001.aws_iam_user_group_membership.user_groups[0]: Destruction complete after 0s
+module.admin_nform001.aws_iam_user.user: Destroying... [id=a-nform001]
+module.admin_zulfi001.aws_iam_user.user: Destroying... [id=a-zulfi001]
+module.admin_jain0009.aws_iam_user_group_membership.user_groups[0]: Destruction complete after 0s
+module.admin_jain0009.aws_iam_user.user: Destroying... [id=a-jain0009]
+module.admin_pilla010.aws_iam_user_group_membership.user_groups[0]: Destruction complete after 0s
+module.admin_pilla010.aws_iam_user.user: Destroying... [id=a-pilla010]
+module.admin_aravi001.aws_iam_user_group_membership.user_groups[0]: Destruction complete after 0s
+module.admin_aravi001.aws_iam_user.user: Destroying... [id=a-aravi001]
+module.admin_muker001.aws_iam_user_group_membership.user_groups[0]: Destruction complete after 0s
+module.admin_muker001.aws_iam_user.user: Destroying... [id=a-muker001]
+module.admin_garla019.aws_iam_user_group_membership.user_groups[0]: Destruction complete after 0s
+module.admin_garla019.aws_iam_user.user: Destroying... [id=a-garla019]
+module.admin_dang0317.aws_iam_user_group_membership.user_groups[0]: Destruction complete after 0s
+module.admin_ali00332.aws_iam_user_group_membership.user_groups[0]: Destruction complete after 0s
+module.admin_dang0317.aws_iam_user.user: Destroying... [id=a-dang0317]
+module.admin_ali00332.aws_iam_user.user: Destroying... [id=a-ali00332]
+module.admin_runes001.aws_iam_user_group_membership.user_groups[0]: Destruction complete after 0s
+module.admin_runes001.aws_iam_user.user: Destroying... [id=a-runes001]
+module.admin_garla019.aws_iam_user.user: Destruction complete after 1s
+module.admin_ali00332.aws_iam_user.user: Destruction complete after 1s
+module.admin_jain0009.aws_iam_user.user: Destruction complete after 1s
+module.admin_nform001.aws_iam_user.user: Destruction complete after 1s
+module.admin_zulfi001.aws_iam_user.user: Destruction complete after 1s
+module.admin_muker001.aws_iam_user.user: Destruction complete after 1s
+module.admin_aravi001.aws_iam_user.user: Destruction complete after 2s
+module.admin_dang0317.aws_iam_user.user: Destruction complete after 2s
+module.admin_runes001.aws_iam_user.user: Destruction complete after 2s
+module.admin_pilla010.aws_iam_user.user: Destruction complete after 2s
+
+Destroy complete! Resources: 20 destroyed.
+# ending v1.11.0 action destroy file logs/destroy.20240916.1726494746.log stamp 20240916.1726494746 start 1726494746 end 1726494783 elapsed 37
+```
+
+Validate that everything is destroyed with `tf-state list`. If it is, clean up and move on. You should be working in a branch called `decomission`.
+
+```script
+tf-state list
+manage-remote-state.sh delete
+tf-run clean
+rm -rf .terraform*
+git commit -a -m 'decommission admin-users'
+git push
+```
+
+# Step 8: Remove common service accounts
+
+```script
+tf-init -upgrade
+tf-plan
+```
+
+We are not going to apply the changes.
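Since the plan is not applied, the service-account resources are removed with targeted destroys instead, and the addresses for those targets can be pulled from the state rather than typed by hand. The sketch below is a hypothetical helper (`build_targets` is not part of the existing `tf-*` tooling). It assumes `tf-state list` prints one resource address per line, as in the examples in this document, and it skips data sources because they are read-only.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the existing tf-* tooling): read Terraform
# resource addresses on stdin, one per line (e.g. from `tf-state list`), and
# print a -target= flag for each managed resource matching an extended regex.
build_targets() {
  local pattern="$1"
  local data_re='(^|\.)data\.'   # data sources: data.* or module.x.data.*
  local addr
  while IFS= read -r addr || [[ -n "$addr" ]]; do
    [[ -z "$addr" ]] && continue            # skip blank lines
    [[ "$addr" =~ $data_re ]] && continue   # data sources are read-only; nothing to destroy
    if [[ "$addr" =~ $pattern ]]; then
      # "--" keeps printf from reading the leading "-" as an option
      printf -- '-target=%s\n' "$addr"
    fi
  done
}
```

For example, `tf-state list | build_targets 'splunk_user'` should print one `-target=module.splunk_user...` line per resource for review. Terraform accepts `-target` more than once; whether `tf-destroy` forwards several of them (for example via `mapfile -t targets < <(tf-state list | build_targets 'splunk_user')` and `tf-destroy "${targets[@]}"`) is an assumption about the wrapper worth checking first.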
+
+Now for the destroy. These resources have to be destroyed individually, by resource address, using `-target`.
+
+**NOTE**: **Make absolutely certain you are in the right account and right directory!!**
+
+For example:
+
+```console
+% tf-destroy -target=aws_iam_policy.cloudforms_ami
+```
+
+Here is a `tf-destroy summary`:
+
+```console
+% tf-destroy summary
+
+* tf-destroy summary from log logs/destroy.20240916.1726500722.log
+> to-be created (0)
+
+> to-be updated (0)
+
+> to-be replaced (0)
+
+> to-be destroyed (3)
+  # aws_iam_policy.cloudforms_ami will be destroyed
+  # module.service_cloudforms.aws_iam_user_policy_attachment.user_policy["p-inf-cloudforms-main"] will be destroyed
+  # module.service_cloudforms.aws_iam_user_policy_attachment.user_policy["p-inf-cloudforms-shared-ami"] will be destroyed
+
+> has changed (0)
+
+> has moved (0)
+
+Plan: 0 to add, 0 to change, 3 to destroy.
+```
+
+Once the summary looks right, apply that destroy (answer `yes` at the prompt). Then repeat the same plan, review, and apply cycle for each remaining target, for example the Splunk user:
+
+**NOTE**: **Make absolutely certain you are in the right account and right directory!!**
+
+```script
+% tf-destroy -target=module.splunk_user
+```
+
+Here is a `tf-destroy summary`:
+
+```console
+% tf-destroy summary
+
+* tf-destroy summary from log logs/destroy.20240917.1726581446.log
+> to-be created (0)
+
+> to-be updated (0)
+
+> to-be replaced (0)
+
+> to-be destroyed (3)
+  # module.splunk_user.aws_iam_policy.policy will be destroyed
+  # module.splunk_user.aws_iam_policy_attachment.user_policy will be destroyed
+  # module.splunk_user.module.user.aws_iam_user.user will be destroyed
+
+> has changed (0)
+
+> has moved (0)
+
+Plan: 0 to add, 0 to change, 3 to destroy.
+```
+
+Now, we apply the destroy:
+
+```console
+% tf-destroy -target=module.splunk_user
+.
+.
+Plan: 0 to add, 0 to change, 3 to destroy.
+
+Warning: Resource targeting is in effect
+
+You are creating a plan with the -target option, which means that the result
+of this plan may not represent all of the changes requested by the current
+configuration.
+
+The -target option is not for routine use, and is provided only for
+exceptional situations such as recovering from errors or mistakes, or when
+Terraform specifically suggests to use it as part of an error message.
+
+Do you really want to destroy all resources?
+  Terraform will destroy all your managed infrastructure, as shown above.
+  There is no undo. Only 'yes' will be accepted to confirm.
+
+  Enter a value: yes
+.
+.
+Destroy complete! Resources: 3 destroyed.
+```
+
+Validate that everything is destroyed with `tf-state list`. If it is, clean up and move on (you should be working in a branch called `decomission`).
+
+```script
+tf-state list
+manage-remote-state.sh delete
+tf-run clean
+rm -rf .terraform*
+git commit -a -m 'decommission service accounts'
+git push
+```
+
+# Step 9: Other common/ directories
+
+Clean up `common/apps`, `common/east/..`, and `common/west/..`.
+
+# Step 10: Remaining things in common/ which will not be removed
+
+```console
+% tf-state list | grep aws | grep -v data.aws
+aws_iam_policy.cloudforms_main
+module.account_settings.aws_iam_account_alias.alias
+module.account_settings.aws_iam_account_password_policy.account_settings
+module.admin_ashle001.aws_iam_user.user
+module.admin_ashle001.aws_iam_user_group_membership.user_groups[0]
+module.admin_badra001.aws_iam_user.user
+module.admin_badra001.aws_iam_user_group_membership.user_groups[0]
+module.admin_dwara001.aws_iam_user.user
+module.admin_dwara001.aws_iam_user_group_membership.user_groups[0]
+module.admin_gogel001.aws_iam_user.user
+module.admin_gogel001.aws_iam_user_group_membership.user_groups[0]
+module.general.aws_iam_policy.general["deny_billing"]
+module.general.aws_iam_policy.general["deny_readonly_data"]
+module.general.aws_iam_policy.general["ip_restriction"]
+module.general.aws_iam_policy.general["manage_credentials"]
+module.general.aws_iam_policy.general["manage_keys"]
+module.general.aws_iam_policy.general["network_admin"]
+module.group_cloud-admin.aws_iam_group.this
+module.group_cloud-admin.aws_iam_group_policy_attachment.this["arn:aws-us-gov:iam::aws:policy/AdministratorAccess"]
+module.group_cloud-admin.aws_iam_group_policy_attachment.this["arn:aws-us-gov:iam::aws:policy/IAMUserChangePassword"]
+module.group_ip-restriction.aws_iam_group.this
+module.group_ip-restriction.aws_iam_group_policy_attachment.this["arn:aws-us-gov:iam::198886018595:policy/p-inf-ip-restriction"]
+module.role_cloud-admin.aws_iam_role.role[0]
+module.role_cloud-admin.aws_iam_role_policy_attachment.role["arn:aws-us-gov:iam::aws:policy/AdministratorAccess"]
+module.role_cloud-admin.aws_iam_role_policy_attachment.role["arn:aws-us-gov:iam::aws:policy/IAMUserChangePassword"]
+module.role_flowlogs.aws_iam_policy.flowlogs
+module.role_flowlogs.aws_iam_role.role
+module.role_flowlogs.aws_iam_role_policy_attachment.flowlogs
+module.role_flowlogs.aws_iam_role_policy_attachment.role["arn:aws-us-gov:iam::198886018595:policy/p-inf-deny-billing"]
+module.role_network-admin.aws_iam_role.role[0]
+module.role_network-admin.aws_iam_role_policy_attachment.role["arn:aws-us-gov:iam::198886018595:policy/p-inf-network-admin"]
+module.role_network-admin.aws_iam_role_policy_attachment.role["arn:aws-us-gov:iam::aws:policy/IAMUserChangePassword"]
+module.role_network-admin.aws_iam_role_policy_attachment.role["arn:aws-us-gov:iam::aws:policy/job-function/NetworkAdministrator"]
+module.role_readonly-operations.aws_iam_role.role[0]
+module.role_readonly-operations.aws_iam_role_policy_attachment.role["arn:aws-us-gov:iam::198886018595:policy/p-inf-deny-readonly-data"]
+module.role_readonly-operations.aws_iam_role_policy_attachment.role["arn:aws-us-gov:iam::aws:policy/ReadOnlyAccess"]
+module.saml.aws_iam_saml_provider.saml
+module.service_cloudforms.aws_iam_user.user
+```
+
+We are not going to remove everything. The IAM resources listed above stay; the cleanup targets resources such as S3 buckets, networking, EC2 instances, and RDS.
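To confirm at a glance what kind of resources remain in `common/`, the state list can be summarized by resource type. This is a sketch, not existing tooling: `summarize_types` is a hypothetical helper that reads addresses on stdin, drops data sources, strips a trailing instance key such as `[0]` or `["deny_billing"]`, and counts the resource type, which is the second-to-last dot-separated component of the address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize Terraform state addresses (one per line on
# stdin, e.g. from `tf-state list`) as "<count> <resource_type>" lines.
summarize_types() {
  # 1. drop data sources (data.* or module.x.data.*)
  # 2. strip a trailing instance key such as [0] or ["deny_billing"]
  # 3. count the second-to-last dot-separated component (the resource type)
  grep -Ev '(^|\.)data\.' |
    sed -E 's/\[[^]]*\]$//' |
    awk -F. 'NF >= 2 { count[$(NF-1)]++ } END { for (t in count) print count[t], t }' |
    sort -k2,2
}
```

Running `tf-state list | summarize_types` in `common/` should show only IAM-related types (users, groups, roles, policies, the SAML provider, and the account settings) once the cleanup is done. Instance keys that themselves contain dots are not handled by this sketch.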
+
+Once we have it down to a handful of things in common, we'll call it complete and hand it over to the reseller to remove.
+
+We will need to record that the accounts have been removed.
+
+# Step 11: S3 Buckets
+
+Get a list of the S3 buckets in the account. `list-buckets` returns every bucket the account owns, regardless of region, so this covers both regions.
+
+```script
+cd common
+tf-aws s3api list-buckets --query 'Buckets[].{a1:Name}' --output text
+```
+
+For example:
+
+```console
+% cd common
+% tf-aws s3api list-buckets --query 'Buckets[].{a1:Name}' --output text
+cloudtrail-a7f4b5e8-d132-37a7-93a5-150f2d68437e
+inf-cloudtrail-198886018595-us-gov-east-1
+inf-cloudtrail-198886018595-us-gov-west-1
+inf-config-198886018595-us-gov-east-1
+inf-config-198886018595-us-gov-west-1
+inf-flowlogs-198886018595-us-gov-east-1
+inf-flowlogs-198886018595-us-gov-west-1
+inf-logs-198886018595-us-gov-east-1
+inf-logs-198886018595-us-gov-west-1
+inf-objectlogging-198886018595-us-gov-east-1
+inf-objectlogging-198886018595-us-gov-west-1
+inf-tfstate-198886018595
+```
+
+We will need to stop the services using these and then empty the buckets.
+
+**TBD** What to do with the data in the buckets?
+
+# Step 12: Final checks before requesting removal
+
+Review the remaining resources. Nothing should be consuming compute or EBS. A handful of S3 buckets used for infrastructure may still
+exist; this is fine.
+
+We do not need to restore the account to a pristine state, as all of the resources will be deleted within 30 days of
+the request to remove the account.
+
+# Step 14: Request Decommission of the reseller
+
+1. Set the decommission flag to `true` in the EW YAML file. This removes the specific account from the map and performs the account
+   deletion, putting the account into a `PENDING-DELETE` state.
+   ```script
+   # 109223337795-censusaws/infrastructure/global/accounts/ditd-partnerportal-prod-ew.yml
+   decomission: true
+   ```
+1. Execute `tf-plan`. This should indicate that Terraform would delete the account.
+   ```console
+   % tf-plan; tf-plan summary
+   * tf-plan summary from log logs/plan.20241126.1732653510.log
+   > to-be created (2)
+     # local_sensitive_file.org_hierarchy_json will be created
+     # local_sensitive_file.org_hierarchy_yaml will be created
+
+   > to-be updated (0)
+
+   > to-be replaced (0)
+
+   > to-be destroyed (4)
+     # aws_organizations_account.accounts["ditd-partnerportal-prod-ew"] will be destroyed
+     # local_sensitive_file.accounts_ew_root_yml["ditd-partnerportal-prod-ew"] will be destroyed
+     # null_resource.cross_account_script_ew["ditd-partnerportal-prod-ew"] will be destroyed
+     # null_resource.cross_account_script_gov["ditd-partnerportal-prod-ew"] will be destroyed
+
+   > has changed (0)
+
+   > has moved (0)
+
+   Plan: 2 to add, 0 to change, 4 to destroy.
+   ```
+1. Add any newly generated files to git, commit, and create a PR.
+1. Once merged, apply in the `censusaws` account.
+1. Repeat the YAML change for the GovCloud account in the respective GovCloud management account (ma5-gov or lab-gov-management-nonprod).
+   Again, plan, commit, push, open a PR, and apply the changes once merged. This step removes the account from the GovCloud organization.
+1. Next, update the SSO permission sets. In each of the management accounts (censusaws, ma5-gov/lab-gov-management-nonprod),
+   perform a `tf-apply` in the `infrastructure/global/sso/permissionsets/ALL` directory. This removes the account from the groups to which
+   it was assigned.
+1. Finally, look for any SSO group configurations that may still reference the decommissioned account. If found, remove the account from the {group}.yml file
+   and apply in those respective directories. You can find them with these commands:
+   ```script
+   cd infrastructure/global/sso
+   git grep {account-ew-alias}
+
+   # for example
+   git grep ditd-partnerportal-prod-ew
+   ```
+
+Commit, push, and open a PR if you had to remove the account from YAML files.
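The SSO search above (and the one in Step 1) runs `git grep` once for the alias and once for the account ID. A small wrapper can check both in one pass and list only the files that need editing. This is a hypothetical helper (`find_account_refs` is not existing tooling); it relies only on standard `git grep` options: `-l` to print file names, `-F` for fixed-string matching, and repeated `-e` patterns, which match when any one of them is found.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list tracked files under a directory that mention an
# account alias or an account ID. Run it from the root of the management repo.
#   $1 = account alias (e.g. ditd-partnerportal-prod-ew)
#   $2 = account ID
#   $3 = directory to search (default: infrastructure/global/sso)
find_account_refs() {
  local alias="$1" account_id="$2" dir="${3:-infrastructure/global/sso}"
  # -l: print matching file names only
  # -F: treat the patterns as fixed strings, not regexes
  # each -e adds a pattern; a line matching any of them counts
  git grep -l -F -e "$alias" -e "$account_id" -- "$dir"
}
```

For example, `find_account_refs ditd-partnerportal-prod-ew ACCOUNTID` lists each `{group}.yml` file that still references the account. `git grep` exits non-zero when nothing matches, which is the expected result once the cleanup is complete.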
+
+In the respective organization management accounts, we will move the YAML files into a directory called
+`infrastructure/global/organizations/decomissioned-accounts/`, but not until after the accounts have been officially
+removed. Until then, TF actions in the account directory of the management repo would try to delete them, and that does not
+work, because an account cannot be deleted without alternate payer information.
+
+This is where we will notify the reseller of the accounts to be removed.
+
+# Notes
+
+* customer data should already be removed/backed up/otherwise handled before this process starts
+
+* once the account is moved to the "graveyard" OU
+
+* move account to an OU indicating removal
+  * this may remove some stackset-driven things, so maybe not
+  * cloud-nuke
+* what do we need to preserve (say, some roles and IAM users, like apptio and axonius)
+  * most other resources can go
+  * what stacksets need to stay, what needs to go
+    * datadog goes
+    * apptio stays
+    * axonius stays
+  * what per-account distributions (under stacksets) stay and what goes?
+    * datadog goes
+    * apptio stays
+  * what's the process with the reseller to close the gov and ew accounts? and just an ew account (we have a few)
+  * notification we are doing this? to whom?
+
+* download all tfstate and the ddb table (script)
+  * once state is destroyed, most TF is going to fail
+  * before the removal of state, officially decom from axonius and apptio (removal specific to the app)
+
+* tf-run superclean in every directory (except maybe init/git-setup)
+  * commit all git stuff after removal
+  * not sure what to do with the downloaded tfstate and ddb table
+  * drop into some "archive account" bucket?
+
+* update tf code in git-setup to extract the repo-specific teams to text, commit to git, and then archive the repo
+  * what needs to be removed from IdC (any indication of the account listed in YAMLs)
+  * what gets removed/changed in the AWS org TF
+  * perhaps a new field in the YAML indicating it's removed/removing
+  * removing from support, cloud-information, etc.
+
+(from document)
+IEB,SCT/Cloud Infrastructure Cleanup - validate complete
+1. Terminate users, terminate resources, (GovCloud) rotate credentials, review IAM policies for cross references and resolve
+1. Morpheus cleanup: remove the Cat Item for the accounts
+1. Ansible: remove any Ansible playbooks related to the account/applications
+1. GitHub cleanup and repository
+   * Archive GitHub actions before clean-up
+1. GitHub Account list cleanup and added notes
+
+(remaining doc steps)
+1. run the apptio removal
+1. run scripts for collecting inventory
+1. analyze for anything which needs to be removed which was missed
+1. the ew inventory also collects billing data
+1. sync the tfstate bucket (I missed this)
+1. commercial
+   1. set the account delete enabled flag to true
+   1. apply
+   1. set the account decommission flag to true
+   1. apply (this closes the account!)
+1. govcloud: repeat 4.3, 4.4
+   1. remove from personal .aws/config
+
+# CHANGELOG
+
+* 1.0.0 -- 2024-06-25
+  - initial
+
+* 1.0.1 -- 2024-09-03
+  - add content for removing sso access
+  - add vpc stuff
+  - add change to OU
+  - add datadog removal
+
+* 1.0.2 -- 2024-09-05
+  - add S3 stuff
+
+* 1.0.3 -- 2024-09-06
+  - add infrastructure/{region} remove details
+
+* 1.0.4 -- 2024-09-19
+  - add section in common/
+
+* 1.0.5 -- 2024-09-20
+  - reorganization, add more TOC stuff
+  - add document update steps
+  - add step numbers
+
+* 1.0.6 -- 2024-10-28
+  - add more notes
+
+* 1.0.7 -- 2024-12-02
+  - cleanup step 13, add step numbers to TOC
+
+* 1.0.8 -- 2025-09-11
+  - add placeholder for remove apptio
+
+* 1.0.9 -- 2025-09-12
+  - move info/ACCOUNTS.md higher in the process
diff --git a/aws/documentation/overview.md b/aws/documentation/overview.md
new file mode 100644
index 00000000..254dc460
--- /dev/null
+++ b/aws/documentation/overview.md
@@ -0,0 +1,172 @@
+# AWS Architecture Overview
+
+This document describes our AWS cloud environments, enterprise integration, and basic concepts and practices. It is not
+an in-depth look, but is intended to provide enough information to understand how we have AWS set up, our goals, and some
+future direction. Links to other, more detailed descriptions will be provided.
+
+* [Accounts](#accounts)
+* [Networking](#networking)
+  * [Transit Gateway (TGW)](#transit-gateway)
+  * [VPCs](#vpcs)
+  * [Shared VPCs](#shared-vpcs)
+  * [Shared VPC endpoints](#shared-vpc-endpoints)
+* [Compute](#compute)
+  * [Containers](#containers)
+  * [Instances](#instances)
+  * [Serverless](#serverless)
+* [Management](#management)
+  * [AWS Organizations](#organizations)
+  * [AWS Identity Center (SSO)](#sso)
+  * [Security Tools](#security-tools)
+  * [Logging](#logging)
+* [Shared Services](#shared-services)
+* [AWS Services](#aws-services)
+
+# Introduction
+
+Our environments, which you could call enclaves, are based on our enterprise data segmentation approach and align closely with the SDL. Some definitions are here: https://github.it.census.gov/badra001/public-stuff/blob/master/environments.md. These are all available in the Enterprise ent-gov organization AWS _Internal_ accounts.
+
+1. common, services, shared
+   * this environment is reachable by all other environments
+   * services is where IT puts core infrastructure capabilities, such as backups, AD, LDAP, IDP, etc.
+   * shared is where we put enterprise capabilities, such as nexus, MFT, etc.
+   * common is a per-customer setup where customers put their common/central things, such as a jenkins or gitlab server used for all of their environments
+1. dev
+   * development, unit test, feature test, alpha and maybe beta type work, proof of concept, etc.
+   * connected to common/services/shared and any other dev environment
+   * isolated from test, stage, and prod
+1. test
+   * this has several sub-environments
+     * ite -- integrated test environment
+     * uat -- user acceptance test
+     * qa -- quality assurance
+   * all of these are isolated from dev, stage, and prod
+1. stage
+   * this is used for full-scale performance testing, either with or without actual production data (synthetic data, fake data, substitute data set, etc.)
+   * it may also be usable for stress testing
+   * it is a non-production environment but should be equivalent to the prod environment in size and configuration
+   * isolated from dev, test, and prod
+1. prod
+   * the production environment
+   * isolated from dev, test, and stage
+
+Keep this segmentation in mind. Although S3 bucket names are global, buckets are to be restricted by environment: a dev bucket cannot be used by any other environment (except common/shared/services).
+
+For the AWS _DMZ_ accounts, still part of the ent-gov organization, the `dev` SDLC capabilities do not exist.
+
+In the Lab (lab-gov organization), which is isolated from the Census production network, only these environments are available:
+
+1. common, services, shared
+1. dev
+1. test
+
+There are no stage or prod environments, as the Lab has significant restrictions on what can be done. The same environment connection segmentation described above applies.
+
+* The Lab, aka the IT Lab, the CAT Lab, or the VLAB, is isolated from the Census production networks.
+* To use resources in the Lab, one must be provisioned into the Lab, through AWS WorkSpaces Windows systems (VDI capabilities).
+* There is currently no direct outbound internet access in the Lab. One must use the HTTP proxy to get to the internet. There is no inbound internet access into AWS.
+* No protected data are permitted in the Lab. This includes, but is not limited to, PII, BII, Title 26, Title 13, and other CUI sensitive data.
+* No production operations are to be conducted in the Lab.
+* When the Lab is used for a proof of concept (POC), all resources are to be destroyed once the POC is complete; if the work moves forward, it must do so in the production accounts.
+* Full administrative access will not be granted. We still follow the least-privilege approach, even in the Lab, through AWS Identity Center.
+
+We are leveraging a number of technologies and concepts to improve timing, costs, and overall efficiency.
+
+* organizations
+  * a huge part of our configuration
+  * enables RAM (sharing)
+  * org cloudtrail
+  * security (security hub, guard duty, inspector, etc.)
+  * sso (in progress, works against our IDP)
+  * all accounts are in the organization
+  * central logging (future)
+* transit gateway
+  * a fully meshed set of common routing across the entirety of our AWS landscape, with each environment routing according to the segments above, and without the use of complicated VPC peering
+  * shared vpcs
+    * created in a single restricted account, and shared out across all of our AWS accounts
+    * setting up a shared VPC fully networked with the rest of AWS and on-prem (minus any specialized firewall rules) is no longer a 2-3 week process. It now takes probably 15 minutes or less
+* shared vpc endpoints
+  * these are tied into the shared VPCs and are centrally located in the network account. Having one set of VPC endpoints per region reduces costs considerably
+* route53
+  * VPCs have Route 53 configured and use custom domain names within AWS for auto-registration of EC2 instances, and for easy creation of other Route 53 records via Terraform modules
+  * fully integrated with on-prem DNS
+  * caveats for use within GovCloud as it relates to health checking and public DNS
+* containers
+  * the recommended technology is EKS; we have probably 30+ clusters
+  * these are deployed through a standard starting code base in a multi-step configuration
+  * standardized add-ons are planned (ELK, etc.)
+* standardized account structure
+  * a single VPC, with some cases where there may be two (say, a prod account that includes both common and prod, as common is a prod thing)
+  * resource naming standards include the environment abbreviations (from the list of environments above)
+  * an additional type of account is the system acceptance (SA) account, where you can test components used in common environments, such as a jenkins test. The environment VPC here may vary based on the use case
+  * standard tagging, to be enforced (by SCPs from organizations)
+* consistency and simplicity
+  * following the same pattern, from the same source, for doing the same thing multiple times makes it far easier to support and understand
+  * strive for the simplest solution possible. It may not seem that way with some of the things we have, but they are at the right level of necessary complexity based on how the services work
+* written documentation and requirements
+  * all terraform modules include documentation from the beginning
+  * many how-to guides are available at https://github.e.it.census.gov/terraform/support/tree/master/docs/how-to
+  * lots of core documentation at https://github.e.it.census.gov/terraform/cloud-information/tree/master/aws/documentation
+  * when docs are missing, we write them
+  * we expect written requirements, with text and diagrams (not just diagrams), in business, non-solution language, answering questions like
+    * who
+    * what
+    * when
+    * where
+    * why
+    * sometimes how
+
+# Accounts
+
+* One per Program/Project/Organization per environment (common, dev, test, ite, uat, qa, stage, prod)
+* SA account as needed
+
+# Networking
+* regions
+* segmentation by environment
+* dmz
+## Transit Gateway
+* segmentation by environment
+* vpns by environment
+## VPCs
+## Shared VPCs
+## Shared VPC Endpoints
+
+# Compute
+## Containers
+## Instances
+## Serverless
+
+# Management
+## Organizations
+## SSO
+* plan
+* access
+* groups
+* IDMS integration
+## Security Tools
+* Security Hub
+* GuardDuty
+* Inspector
+## Logging
+
+# Shared Services
+## ACM PCA
+## Route53
+
+# AWS Services
+* non-FedRAMP
+* acceptable use by service
+
+# Links
+
+
+# CHANGELOG
+* 1.0.0 -- 2023-07-26
+  - initial document, create outline
+
+* 1.0.1 -- 2023-08-03
+  - add draft text from email message
+
+* 1.1.0 -- 2025-09-16
+  - added lab details