Commit

Morga471 cluster (#20)
* yep

* set back to normal

* missed tempo

* fix branch ref

* change branch ref to test provider-resolution

* fix min vals

* 2 is the lowest

* docs and keycloak

* use default for eks again

* tempo and kiali updates while working on keycloak

* missed a comma

* almost

* no v

* cleanup

* namespaces

* use main

* fmt

* namespace changes

* update internal url ref

* fmt

* versions

* more wip:

* keycloak wip

* update prom internal url input value

* test changes on prom

* deleted old cluster platform-eng-eks-test and created new cluster platform-eng-eks-srn

* testing more autoscaling stuffs

* wip

* wip

* wip

* use my eks

* isolate karpenter again for debug

* 1.3.0 is not ready

* make grafana work again

* increment grafana operator chart version

* otel added

* fix gatekeeper chart version

* ordering

* test branch

* use newer image

* update loki memcached

* vers

* keycloak defaults

* put keycloak in keycloak namespace for debug

* removed a few folders from workspace

* update grafana tg

* remove old module from workspace

* reset branches to default

* missed one

* fmt

* more fmt

* use client id and secret

* fix service name regex violation

* updates

* update from lukes pr

* disable gatekeeper

* updated

---------

Co-authored-by: Srini Nangunuri <srinivasa.nangunuri@census.gov>
morga471 and nangu001 committed Mar 17, 2025
1 parent 819a1ec commit dcf74b0
Showing 70 changed files with 2,701 additions and 770 deletions.
24 changes: 24 additions & 0 deletions .checkov.yml
@@ -0,0 +1,24 @@
branch: master
download-external-modules: true
evaluate-variables: true
external-checks-dir:
- security/custom_checks
framework:
- terraform
- kubernetes
output:
- cli
- json
- junitxml
skip-check:
- CKV_AWS_79 # Instance Metadata Service Version 1
- CKV_AWS_130 # Ensure VPC subnets are not assigned public IP by default
quiet: true
compact: true
directory:
- .
- modules/*
secrets-scan-file-type:
- tf
- yaml
- json
28 changes: 17 additions & 11 deletions .github/platform-tg-infra.code-workspace
@@ -2,20 +2,12 @@
"folders": [
{
"name": "platform-tg-infra",
"path": "../"
"path": ".."
},
{
"name": "tfmod-cert-mgr",
"path": "../../tfmod-cert-mgr"
},
{
"name": "tfmod-config-job",
"path": "../../tfmod-config-job"
},
{
"name": "tfmod-custom-iam-role-for-service-account-eks",
"path": "../../tfmod-custom-iam-role-for-service-account-eks"
},
{
"name": "tfmod-eks",
"path": "../../tfmod-eks"
@@ -28,6 +20,10 @@
"name": "tfmod-eks-dns",
"path": "../../tfmod-eks-dns"
},
{
"name": "tfmod-gogatekeeper",
"path": "../../tfmod-gogatekeeper"
},
{
"name": "tfmod-grafana",
"path": "../../tfmod-grafana"
@@ -48,6 +44,10 @@
"name": "tfmod-karpenter",
"path": "../../tfmod-karpenter"
},
{
"name": "tfmod-keycloak",
"path": "../../tfmod-keycloak"
},
{
"name": "tfmod-kiali",
"path": "../../tfmod-kiali"
@@ -60,6 +60,10 @@
"name": "tfmod-metrics-server",
"path": "../../tfmod-metrics-server"
},
{
"name": "tfmod-open-telemetry",
"path": "../../tfmod-open-telemetry"
},
{
"name": "tfmod-prometheus",
"path": "../../tfmod-prometheus"
@@ -69,13 +73,15 @@
"path": "../../tfmod-tempo"
},
{
"name": "terraform-aws-eks",
"path": "../../terraform-aws-eks"
},
{
"path": "../../karpenter-provider-aws"
"name": "terragrunt",
"path": "../../terragrunt"
},
{
"path": "../../terragrunt"
"path": "../../tfmod-config-job"
}
]
}
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,5 +1,8 @@
# Local .terraform directories
**/.terraform/*
**/apply.log
**/plan.log
**/destroy.log

# terraform lock file.
**/.terraform.lock.hcl
48 changes: 48 additions & 0 deletions configs/node-groups.yaml
@@ -0,0 +1,48 @@
nodeGroups:
  - name: general-purpose
    instanceTypes:
      - m6i.xlarge
      - m6a.xlarge
      - m5.xlarge
    minSize: 2
    maxSize: 10
    desiredSize: 2
    labels:
      node-type: general
    taints: []
    updateConfig:
      maxUnavailable: 1

  - name: compute-optimized
    instanceTypes:
      - c6i.2xlarge
      - c6a.2xlarge
      - c5.2xlarge
    minSize: 1
    maxSize: 20
    desiredSize: 2
    labels:
      node-type: compute
    taints:
      - key: workload
        value: batch
        effect: NoSchedule
    updateConfig:
      maxUnavailable: 2

  - name: memory-optimized
    instanceTypes:
      - r6i.2xlarge
      - r6a.2xlarge
      - r5.2xlarge
    minSize: 1
    maxSize: 10
    desiredSize: 2
    labels:
      node-type: memory
    taints:
      - key: workload
        value: memory-intensive
        effect: NoSchedule
    updateConfig:
      maxUnavailable: 1
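The sizing fields above only make sense when `minSize <= desiredSize <= maxSize` and `updateConfig.maxUnavailable` is at least 1, which is what an EKS managed node group scaling/update config expects. A minimal, stdlib-only sketch of a pre-apply check follows; the data is inlined from this file and `sizing_errors` is a hypothetical helper, not part of any module in this repository.

```python
"""Hypothetical pre-apply sanity check for configs/node-groups.yaml.

The node groups are inlined as dicts so the sketch needs only the standard
library; a real check would load the YAML with a parser such as PyYAML.
"""

NODE_GROUPS = [
    {"name": "general-purpose", "minSize": 2, "desiredSize": 2, "maxSize": 10,
     "updateConfig": {"maxUnavailable": 1}},
    {"name": "compute-optimized", "minSize": 1, "desiredSize": 2, "maxSize": 20,
     "updateConfig": {"maxUnavailable": 2}},
    {"name": "memory-optimized", "minSize": 1, "desiredSize": 2, "maxSize": 10,
     "updateConfig": {"maxUnavailable": 1}},
]


def sizing_errors(group: dict) -> list[str]:
    """Return sizing problems for one node group; an empty list means it passes."""
    name = group["name"]
    lo, want, hi = group["minSize"], group["desiredSize"], group["maxSize"]
    errors = []
    if lo < 0:
        errors.append(f"{name}: minSize must be non-negative, got {lo}")
    # The desired size has to sit inside the [minSize, maxSize] window.
    if not lo <= want <= hi:
        errors.append(f"{name}: expected minSize <= desiredSize <= maxSize, got {lo}/{want}/{hi}")
    # A rolling update must be allowed to replace at least one node at a time.
    if group["updateConfig"]["maxUnavailable"] < 1:
        errors.append(f"{name}: updateConfig.maxUnavailable must be at least 1")
    return errors


if __name__ == "__main__":
    for g in NODE_GROUPS:
        for err in sizing_errors(g):
            print(err)
```

Run against the three groups in this file, the check reports nothing; a group such as `minSize: 3, desiredSize: 2` would be flagged before Terraform ever calls the EKS API.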
36 changes: 36 additions & 0 deletions configs/resource-quotas.yml
@@ -0,0 +1,36 @@
apiVersion: v1
kind: ResourceQuota
metadata:
  name: default-quota
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
    services: "50"
    secrets: "100"
    configmaps: "100"
    persistentvolumeclaims: "50"

---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 256Mi
      max:
        cpu: "4"
        memory: 8Gi
      min:
        cpu: 50m
        memory: 64Mi
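A ResourceQuota that caps `requests.*` and `limits.*` causes pods that omit those values to be rejected, which is why it is usually paired with a LimitRange that fills in defaults. The sketch is a simplified model of what this LimitRange does to one container: an explicit limit without a request gets request = limit, remaining gaps take `default`/`defaultRequest`, then `min`/`max` are enforced. It is an illustration under stated assumptions (only cpu/memory, only the quantity formats used in this file), not the actual admission controller, and it does not model the namespace-wide ResourceQuota totals.

```python
"""Simplified model of the default-limits LimitRange (cpu and memory only).

Assumption: quantities use only the forms in this file: plain cores ("4"),
millicores ("500m"), and binary memory units ("Mi", "Gi").
"""


def cpu_millis(q: str) -> int:
    """'500m' -> 500, '4' -> 4000 (millicores)."""
    return int(q[:-1]) if q.endswith("m") else int(q) * 1000


def mem_bytes(q: str) -> int:
    """'512Mi' / '8Gi' -> bytes; a bare number is already bytes."""
    for suffix, factor in (("Gi", 2**30), ("Mi", 2**20)):
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


PARSE = {"cpu": cpu_millis, "memory": mem_bytes}

LIMIT_RANGE = {
    "default": {"cpu": "500m", "memory": "512Mi"},
    "defaultRequest": {"cpu": "100m", "memory": "256Mi"},
    "max": {"cpu": "4", "memory": "8Gi"},
    "min": {"cpu": "50m", "memory": "64Mi"},
}


def admit(requests: dict, limits: dict) -> tuple[dict, dict]:
    """Return effective (requests, limits) in millicores/bytes, or raise ValueError."""
    req = {r: PARSE[r](v) for r, v in requests.items()}
    lim = {r: PARSE[r](v) for r, v in limits.items()}
    for r in ("cpu", "memory"):
        # Pod defaulting: a limit with no request gets a request equal to the limit.
        if r in lim and r not in req:
            req[r] = lim[r]
        # LimitRange defaults fill whatever is still unset.
        lim.setdefault(r, PARSE[r](LIMIT_RANGE["default"][r]))
        req.setdefault(r, PARSE[r](LIMIT_RANGE["defaultRequest"][r]))
        # Validation: request <= limit, request >= min, limit <= max.
        if req[r] > lim[r]:
            raise ValueError(f"{r}: request exceeds limit")
        if req[r] < PARSE[r](LIMIT_RANGE["min"][r]):
            raise ValueError(f"{r}: request below LimitRange min")
        if lim[r] > PARSE[r](LIMIT_RANGE["max"][r]):
            raise ValueError(f"{r}: limit above LimitRange max")
    return req, lim
```

One consequence worth noticing: a container that sets only `requests.cpu: "1"` is rejected, because it inherits the 500m default limit and its request would then exceed its limit.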
88 changes: 88 additions & 0 deletions docs/ARCHITECTURE.md
@@ -0,0 +1,88 @@
# Platform Infrastructure Architecture

## Complete Platform Architecture

```mermaid
graph TD
%% Core Network Infrastructure
VPC[VPC Module] --> DNS[DNS Module]
VPC --> SUBNETS[Subnet Configuration]
SUBNETS --> PRIVATE[Private Subnets]
SUBNETS --> PUBLIC[Public Subnets]
%% EKS Cluster and Core Components
VPC --> EKS[EKS Cluster]
EKS --> IAM[IAM Roles Module]
EKS --> EKS_CONFIG[EKS Configuration]
EKS --> KARPENTER[Karpenter]
%% Security and Access Management
EKS --> CERT_MGR[Cert Manager]
EKS --> GATEKEEPER[GoGatekeeper]
%% Service Mesh
EKS_CONFIG --> ISTIO[Istio Service Mesh]
ISTIO --> KIALI[Kiali Dashboard]
ISTIO --> INGRESS[Service Ingress]
%% Monitoring and Observability
EKS --> MONITORING[Monitoring Stack]
MONITORING --> PROMETHEUS[Prometheus]
MONITORING --> GRAFANA[Grafana]
MONITORING --> LOKI[Loki Log Aggregation]
MONITORING --> TEMPO[Tempo Tracing]
%% Additional Services
EKS --> DASHBOARD[Kubernetes Dashboard]
EKS --> METRICS[Metrics Server]
EKS --> KEYCLOAK[Keycloak SSO]
%% Infrastructure Management
TERRAGRUNT[Terragrunt] --> VPC
TERRAGRUNT --> EKS
%% Database Layer
VPC --> RDS[RDS Database]
%% Styling
classDef core fill:#f9f,stroke:#333,stroke-width:2px
classDef security fill:#bbf,stroke:#333,stroke-width:2px
classDef monitoring fill:#bfb,stroke:#333,stroke-width:2px
class VPC,EKS,EKS_CONFIG core
class CERT_MGR,GATEKEEPER,IAM security
class PROMETHEUS,GRAFANA,LOKI,TEMPO monitoring
```
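Read as a dependency graph, the diagram above implies a bring-up order. The sketch copies its 24 edges and topologically sorts them with Python's `graphlib`. Treating every arrow as "deploy before" is an assumption: some edges (Terragrunt, IAM) describe orchestration or ownership rather than ordering, and Terragrunt itself derives run order from its `dependency` blocks, not from this diagram.

```python
from graphlib import TopologicalSorter

# Edges copied from the Mermaid diagram as (parent, child) pairs.
# Assumption: parent must exist before child is deployed.
EDGES = [
    ("VPC", "DNS"), ("VPC", "SUBNETS"), ("SUBNETS", "PRIVATE"), ("SUBNETS", "PUBLIC"),
    ("VPC", "EKS"), ("EKS", "IAM"), ("EKS", "EKS_CONFIG"), ("EKS", "KARPENTER"),
    ("EKS", "CERT_MGR"), ("EKS", "GATEKEEPER"),
    ("EKS_CONFIG", "ISTIO"), ("ISTIO", "KIALI"), ("ISTIO", "INGRESS"),
    ("EKS", "MONITORING"), ("MONITORING", "PROMETHEUS"), ("MONITORING", "GRAFANA"),
    ("MONITORING", "LOKI"), ("MONITORING", "TEMPO"),
    ("EKS", "DASHBOARD"), ("EKS", "METRICS"), ("EKS", "KEYCLOAK"),
    ("TERRAGRUNT", "VPC"), ("TERRAGRUNT", "EKS"),
    ("VPC", "RDS"),
]


def deploy_order(edges):
    """Return every node once, with each parent listed before its children."""
    graph = {}  # node -> set of predecessors, the shape TopologicalSorter expects
    for parent, child in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set())
    # static_order() raises graphlib.CycleError if the diagram ever gains a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(deploy_order(EDGES)))
```

Nodes at the same depth (for example Prometheus and Loki) may appear in any relative order; only the parent-before-child constraint is guaranteed.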

## Component Descriptions

### Core Infrastructure
- **VPC Module**: Network foundation with public/private subnets
- **EKS Cluster**: Managed Kubernetes service
- **Karpenter**: Node autoscaling; launches EC2 worker nodes for unschedulable pods and consolidates underused ones
- **DNS Module**: Route 53 DNS management

### Security Layer
- **Cert Manager**: Certificate lifecycle management
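- **GoGatekeeper**: OpenID Connect authentication proxy placed in front of services (authenticates against Keycloak)
- **IAM Roles**: AWS IAM roles that grant the cluster and its workloads AWS permissions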

### Service Mesh
- **Istio**: Service mesh implementation
- **Kiali**: Service mesh visualization
- **Service Ingress**: External traffic management

### Monitoring Stack
- **Prometheus**: Metrics collection
- **Grafana**: Metrics visualization
- **Loki**: Log aggregation
- **Tempo**: Distributed tracing

### Additional Services
- **Kubernetes Dashboard**: Cluster management UI
- **Metrics Server**: Container CPU/memory metrics for `kubectl top` and the Horizontal Pod Autoscaler
- **Keycloak**: Identity and access management (single sign-on)

### Infrastructure Management
- **Terragrunt**: Infrastructure deployment orchestration
- **RDS**: Managed database services
56 changes: 56 additions & 0 deletions docs/DOCUMENTATION_STANDARDS.md
@@ -0,0 +1,56 @@
# Documentation Standards Guide

## README Structure
Each module must include a README.md with the following sections:

1. Overview
   - Purpose
   - Key features
   - Architecture diagram

2. Prerequisites
   - Required tooling
   - Required permissions
   - Dependencies

3. Usage
   - Basic example
   - Advanced examples
   - Input variables table
   - Output variables table

4. Architecture
   - Component diagram
   - Network flow
   - Security considerations

5. Operations
   - Deployment guide
   - Monitoring
   - Troubleshooting
   - Maintenance
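The five required sections lend themselves to an automated check in CI. A minimal sketch follows; `missing_sections` is a hypothetical helper, and it assumes each section appears as a Markdown ATX heading (any level) whose text matches the section name case-insensitively, since the standard does not fix heading levels.

```python
import re

REQUIRED_SECTIONS = ("Overview", "Prerequisites", "Usage", "Architecture", "Operations")

# One ATX heading per line, e.g. "## Usage" or "### Usage ###".
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t#]*$", re.MULTILINE)


def missing_sections(readme_text: str) -> list[str]:
    """Return the required sections that have no matching heading in the README."""
    headings = {m.group(1).strip().lower() for m in HEADING.finditer(readme_text)}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]
```

A README containing only `## Overview`, `## Prerequisites` and `## Usage` would be reported as missing `Architecture` and `Operations`.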

## Changelog Format
Use the Commitizen (Conventional Commits) convention for commit messages:

```
feat: New feature
fix: Bug fix
docs: Documentation changes
style: Formatting changes
refactor: Code restructure without behavior change
test: Test updates
chore: Maintenance tasks
```
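A commit-message hook could enforce these prefixes. The sketch accepts only the seven types listed above, plus the optional `(scope)` and `!` breaking-change marker from the Conventional Commits header shape; Commitizen's own default rules also accept other types such as `perf`, `build` and `ci`, so this is a stricter, hypothetical variant rather than Commitizen's implementation.

```python
import re

# Types taken from the list above.
TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# Header shape: type, optional "(scope)", optional "!", then ": " and a description.
HEADER = re.compile(rf"^(?:{'|'.join(TYPES)})(?:\([\w.-]+\))?!?: \S.*$")


def is_valid_header(message: str) -> bool:
    """Check only the first line of a commit message."""
    first_line = message.split("\n", 1)[0]
    return bool(HEADER.match(first_line))
```

Under this rule a header like `feat(keycloak): use client id and secret` passes, while free-form messages such as `wip` or `fix:missing space` are rejected.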

## Diagrams
- Use PlantUML for architecture diagrams
- Include source files in `docs/diagrams`
- Export PNG/SVG to `docs/images`
- Keep diagrams up to date with code changes

## Usage Examples
- Provide basic and advanced examples
- Include realistic variable values
- Document required permissions
- Include expected outputs