From 6e53a2487f9be3f66e6c4f2564b26c84e90682bd Mon Sep 17 00:00:00 2001
From: badra001 <donald.e.badrak.ii@census.gov>
Date: Fri, 11 Jul 2025 15:31:40 -0400
Subject: [PATCH] add dataset types snippet

---
 aws/projects/edl/dataset-types.md | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 aws/projects/edl/dataset-types.md

diff --git a/aws/projects/edl/dataset-types.md b/aws/projects/edl/dataset-types.md
new file mode 100644
index 00000000..218a9178
--- /dev/null
+++ b/aws/projects/edl/dataset-types.md
@@ -0,0 +1,10 @@
+# Dataset Types
+
+CMS is authoritative for its volumes. A volume is also called a type and is treated as a namespace: geo, decennial, econ, mixed, and so on. CMS is responsible for all of the datasets underneath its volumes. These become IRE datasets. An IRE dataset is a DMS dataset. It may be wholly sourced from CMS (for example, external data). Or it may be a subset of a non-IRE dataset (say, something made internal, like BR): a subset of the columns or records of some other dataset. For example, if BR were wanted in IRE as a dataset but without the SSN column, that would be a different dataset, still called br. CODS maintains these datasets.
+
+DMS is authoritative for its volumes, and those cannot share a name (type) or type_id with any volume in CMS. So these volumes carry an edl- prefix to distinguish them, and they have different type_id values (as these go into the POSIX group). In the example above, the full BR dataset would be edl-econ/br. A single namespace cannot be authoritative in both systems. The data owner maintains these datasets.
+
+The path on a file system, in S3, or in any other store must preserve these types:
+
+/data/{type}/{group}/{instance}
+