Showing 1 changed file with 10 additions and 0 deletions.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| # Dataset Types | ||
|
|
||
| CMS is authoritative for its volumes. A volume is also called a type and is treated as a namespace, for example geo, decennial, econ, or mixed. CMS is responsible for all of the datasets underneath its volumes; these become IRE datasets. An IRE dataset is a DMS dataset. It may be wholly sourced from CMS (for example, external data), or it may be a subset of the columns or records of a non-IRE dataset (for example, something made internal like BR). Suppose BR is wanted in IRE as a dataset, but without the SSN column: that would be a different dataset, still called br. CODS maintains these datasets. | ||
|
|
||
| DMS is authoritative for its own volumes, which cannot share a name (type) or type_id with any volume in CMS. To distinguish them, DMS volumes carry an edl- prefix and have different type_id values (these go into the POSIX group). In the example above, the full BR dataset would be edl-econ/br. A single namespace cannot be authoritative in both systems. The data owner maintains these datasets. | ||
|
|
||
| The path on a file system, in S3, or on any other storage must preserve these types: | ||
|
|
||
| /data/{type}/{group}/{instance} | ||
|
|
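
The naming rules above can be sketched in Python. This is only an illustration under stated assumptions, not the project's implementation: the function names `authority` and `dataset_path`, the `CMS_TYPES` set (filled with the example type names from the text), and the `v1` instance value are hypothetical, and `{group}` is assumed to hold the dataset name, as in `edl-econ/br`.

```python
from pathlib import PurePosixPath

EDL_PREFIX = "edl-"  # prefix the text gives for DMS-authoritative volumes

# Assumption: CMS volume names, taken from the examples in the text.
CMS_TYPES = {"geo", "decennial", "econ", "mixed"}


def authority(volume_type: str) -> str:
    """Return which system is authoritative for a volume type.

    A type is either an edl- prefixed DMS volume or a known CMS volume,
    so one namespace can never be authoritative in both systems.
    """
    if volume_type.startswith(EDL_PREFIX):
        return "DMS"
    if volume_type in CMS_TYPES:
        return "CMS"
    raise ValueError(f"unknown volume type: {volume_type!r}")


def dataset_path(volume_type: str, group: str, instance: str,
                 root: str = "/data") -> PurePosixPath:
    """Build {root}/{type}/{group}/{instance}, keeping the type as its own component."""
    authority(volume_type)  # raises ValueError for an unknown type
    for part in (volume_type, group, instance):
        # Reject empty parts and embedded separators so the layout stays three levels deep.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return PurePosixPath(root, volume_type, group, instance)


if __name__ == "__main__":
    print(dataset_path("edl-econ", "br", "v1"))  # full BR dataset, DMS side
    print(dataset_path("econ", "br", "v1"))      # reduced br dataset, CMS side
```

Because the type is kept as the first component under the root, the full and reduced `br` datasets land in different directories even though they share a dataset name. The same three components could be joined into an S3 key prefix instead of a file system path.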