add dataset types snippet

terraform · Jul 11, 2025 · 6e53a24
1 parent 43d4c77
diff --git a/aws/projects/edl/dataset-types.md b/aws/projects/edl/dataset-types.md
@@ -0,0 +1,10 @@
+# Dataset Types
+
+CMS is authoritative for its volumes. Each volume, also called a type, is a namespace, such as geo, decennial, econ, or mixed. CMS is responsible for all of the datasets under its types, and these become IRE datasets. An IRE dataset is a DMS dataset. It may be sourced wholly from CMS (for example, external data), or it may be a subset of the columns or records of a non-IRE dataset (for example, something produced internally, such as BR). Suppose BR were wanted in IRE as a dataset, but without the SSN column: that would be a different dataset, though it would still be called br. CODS maintains these datasets.
+
+DMS is authoritative for its own volumes, and these cannot reuse the name (type) or ID (type_id) of any CMS volume. DMS volumes therefore carry an `edl-` prefix to distinguish them, and they have different type_id values, since the type_id goes into the POSIX group. In the BR example above, the full BR dataset would be `edl-econ/br`. A single namespace cannot be authoritative in both systems. The data owner maintains these datasets.
+
+The path on a file system, in S3, or in any other store must preserve these types:
+
+`/data/{type}/{group}/{instance}`
+
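The ownership rules and the path template in the file above can be sketched as a small validator and path builder. This is a minimal sketch under stated assumptions, not the actual EDL tooling. The `VolumeType` record, its `owner` field, the numeric `type_id` values, the instance name `v1`, and the placement of the IRE subset of BR under the CMS type `econ` are all hypothetical. The sketch also assumes that CMS type names never start with `edl-`. The text implies this but does not state it.

```python
from dataclasses import dataclass

DMS_PREFIX = "edl-"  # prefix the doc uses for DMS-owned types


@dataclass(frozen=True)
class VolumeType:
    """Hypothetical record for one volume (type); owner and ids are illustrative."""
    name: str     # the type, e.g. "econ" (CMS) or "edl-econ" (DMS)
    type_id: int  # hypothetical numeric id; the doc says it goes into the POSIX group
    owner: str    # "CMS" or "DMS"


def validate_types(types):
    """Raise ValueError if any namespace rule from the doc is violated."""
    names = {}  # type name -> owner that registered it
    ids = {}    # type_id -> type name that registered it
    for t in types:
        if t.owner not in ("CMS", "DMS"):
            raise ValueError(f"unknown owner {t.owner!r} for {t.name!r}")
        # DMS types must carry the edl- prefix.
        if t.owner == "DMS" and not t.name.startswith(DMS_PREFIX):
            raise ValueError(f"DMS type {t.name!r} must start with {DMS_PREFIX!r}")
        # Assumption: CMS types never use the prefix, so the two sets cannot overlap.
        if t.owner == "CMS" and t.name.startswith(DMS_PREFIX):
            raise ValueError(f"CMS type {t.name!r} must not start with {DMS_PREFIX!r}")
        # A name or type_id may be registered only once across both systems.
        if t.name in names:
            raise ValueError(f"type {t.name!r} already registered by {names[t.name]}")
        if t.type_id in ids:
            raise ValueError(f"type_id {t.type_id} already used by {ids[t.type_id]!r}")
        names[t.name] = t.owner
        ids[t.type_id] = t.name


def dataset_path(root, type_name, group, instance):
    """Build {root}/{type}/{group}/{instance}, keeping the type as its own segment."""
    for part in (type_name, group, instance):
        if not part or "/" in part:
            raise ValueError(f"invalid path segment {part!r}")
    return "/".join([root.rstrip("/"), type_name, group, instance])


if __name__ == "__main__":
    registry = [
        VolumeType("econ", 1001, "CMS"),      # IRE datasets, maintained by CODS
        VolumeType("edl-econ", 2001, "DMS"),  # datasets maintained by the data owner
    ]
    validate_types(registry)
    print(dataset_path("/data", "econ", "br", "v1"))      # /data/econ/br/v1
    print(dataset_path("/data", "edl-econ", "br", "v1"))  # /data/edl-econ/br/v1
    print(dataset_path("data", "edl-econ", "br", "v1"))   # data/edl-econ/br/v1
```

Under these assumptions, the SSN-free IRE copy of BR and the full BR dataset share the group name `br` but never collide, because the type segment (`econ` versus `edl-econ`) keeps them in separate namespaces. The last call shows the same layout used without a leading slash, for example as an S3 key prefix.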