add dataset types snippet
badra001 committed Jul 11, 2025
1 parent 43d4c77 commit 6e53a24
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions aws/projects/edl/dataset-types.md
@@ -0,0 +1,10 @@
# Dataset Types

CMS is authoritative for its volumes. A volume is also called the type and is treated as a namespace; examples include geo, decennial, econ, and mixed. CMS is responsible for all of the datasets underneath its volumes. These become IRE datasets. An IRE dataset is a DMS dataset. It may be wholly sourced from CMS (for example, external data), or it may be a subset of a non-IRE dataset (say, something made internal, like BR), that is, a subset of the columns or records of some other dataset. For example, suppose BR is wanted in IRE as a dataset, but without the SSN column. This would be a different dataset, still called br. CODS maintains these datasets.

DMS is authoritative for its volumes, and those cannot share a name (type or type_id) with a volume in CMS. So, DMS volumes carry an edl- prefix to distinguish them, and they have different type_id values (as these go into the POSIX group). In the example above, the full BR dataset would be edl-econ/br. A single namespace cannot be authoritative in both systems. The data owner maintains these datasets.
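
The naming rule above can be expressed as a small check. This is only a sketch: the function name `check_volumes`, the dictionary layout (type name mapped to type_id), and every type_id value shown are hypothetical and are not taken from CMS or DMS.

```python
EDL_PREFIX = "edl-"


def check_volumes(cms, dms):
    """Return a list of rule violations.

    cms and dms map a type name to its type_id (an assumed layout).
    """
    problems = []
    cms_ids = set(cms.values())
    for name, type_id in dms.items():
        # DMS volumes are expected to carry the edl- prefix.
        if not name.startswith(EDL_PREFIX):
            problems.append(f"{name}: DMS type lacks the {EDL_PREFIX} prefix")
        # A namespace cannot be authoritative in both systems.
        if name in cms:
            problems.append(f"{name}: type name also exists in CMS")
        if type_id in cms_ids:
            problems.append(f"{name}: type_id {type_id} also exists in CMS")
    return problems


# Illustrative values only; the real type_id values are not given here.
cms_volumes = {"geo": 1001, "econ": 1002}
dms_volumes = {"edl-econ": 2002}

for problem in check_volumes(cms_volumes, dms_volumes):
    print(problem)
```

With these sample values nothing is printed, since edl-econ has the prefix and shares neither a name nor a type_id with a CMS volume.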

The path on a file system, in S3, or wherever else the data is stored must preserve these types:

/data/{type}/{group}/{instance}
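
As a rough illustration of the layout above, the following sketch builds and parses such a path. The helper names `dataset_path` and `parse_dataset_path` are hypothetical, the example assumes that br fills the {group} slot, and the instance value `v1` is made up.

```python
from pathlib import PurePosixPath


def dataset_path(type_, group, instance, root="/data"):
    """Build a path of the form /data/{type}/{group}/{instance}."""
    for part in (type_, group, instance):
        # Each component must be a single, non-empty path segment.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return str(PurePosixPath(root) / type_ / group / instance)


def parse_dataset_path(path, root="/data"):
    """Split a path back into (type, group, instance).

    Raises ValueError if the path is not under root or does not have
    exactly three components below it.
    """
    rel = PurePosixPath(path).relative_to(root)
    type_, group, instance = rel.parts
    return type_, group, instance


# Hypothetical values: "br" as the group and "v1" as the instance.
path = dataset_path("edl-econ", "br", "v1")
print(path)
print(parse_dataset_path(path))
```

The same three components could be joined into an S3 key in the same order, so that the type stays the leading element of the path in either store.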
