add data concept goals

orcestra-campaign · May 16, 2024 · 0d671a5
1 parent 5c1c397
commit 0d671a5

Showing 2 changed files with 31 additions and 0 deletions.
diff --git a/orcestra_book/_toc.yml b/orcestra_book/_toc.yml
@@ -26,6 +26,7 @@ chapters:
     #- file: data_policy
     #- file: data_creation
     #- file: data_access
+    - file: data_concept
     - file: hera5
     - file: temperature_example
 #- file: results

diff --git a/orcestra_book/data_concept.md b/orcestra_book/data_concept.md
@@ -0,0 +1,30 @@
+# Data Concept
+
+## Motivation
+We like to see ORCESTRA as **our common** field campaign.
+Everyone should be able to use the gathered data,
+together and for mutual benefit.
+The goals below draw on what worked and what didn't work during the EUREC4A field campaign and other projects.
+
+### Goals
+
+The goals are sorted by decreasing priority (i.e. 1 is the most important). We **aim for all of them**, but if we have to cut, we should cut from the end of the list.
+
+1. **a *single* list of existing datasets**<br/>
+   We want a common data collection of our field campaign.
+   Everyone interested in ORCESTRA should be able to find available datasets.
+   For clarity and consistency, there must be exactly one list.
+2. **the datasets in the list are *accessible***<br/>
+   Once someone has found a dataset in the list, that dataset should be usable.
+   That is, the information in the list must be sufficient for everyone to be able to open the dataset with common tools and little effort.
+3. **datasets are *well-formed* and *analysis-ready***<br/>
+   Useful datasets are typically written once and read often.
+   Spending a bit more time on creating a dataset reduces the overall effort if it makes later use easier.
+4. **incremental backups are possible**<br/>
+   We expect the ORCESTRA data collection to be a valuable contribution to our scientific field.
+   We should be able to have a backup of this collection.
+   Realistically, the list will evolve over time, so we will have to update any backups incrementally.
+5. **datasets are on a shared, distributed system**<br/>
+   We want the data system to be used in actual scientific work (not only for "data publication").
+   Traditional systems are often too complicated or slow for day-to-day usage.
+   A distributed system increases availability and performance (e.g. through local caches or redundant servers), which makes actually using our own published data convenient, fast and fun.