
Commit

add data concept goals
d70-t committed May 16, 2024
1 parent 5c1c397 commit 0d671a5
Showing 2 changed files with 31 additions and 0 deletions.
1 change: 1 addition & 0 deletions orcestra_book/_toc.yml
@@ -26,6 +26,7 @@ chapters:
#- file: data_policy
#- file: data_creation
#- file: data_access
- file: data_concept
- file: hera5
- file: temperature_example
#- file: results
30 changes: 30 additions & 0 deletions orcestra_book/data_concept.md
@@ -0,0 +1,30 @@
# Data Concept

## Motivation
We would like ORCESTRA to be **our common** field campaign.
Everyone should be able to use the gathered data, together and for mutual benefit.
These goals are meant to build on what worked and what did not work during the EUREC4A field campaign and other projects.

### Goals

The goals are listed in order of decreasing priority (i.e., 1 is the most important). We **aim for all of them**, but if we have to cut, we should cut from the end of the list.

1. **a *single* list of existing datasets**<br/>
We want a common data collection for our field campaign.
Everyone interested in ORCESTRA should be able to find available datasets.
For clarity and consistency, there must be exactly one list.
2. **the datasets in the list are *accessible***<br/>
Once someone has found a dataset in the list, they should be able to use it.
That is, the information in the list must be sufficient for everyone to be able to open the dataset with common tools and little effort.
3. **datasets are *well-formed* and *analysis-ready***<br/>
Useful datasets are typically written once and read often.
The overall effort can be reduced if we spend a bit more time creating a dataset in a way that makes its later use easier.
4. **incremental backups are possible**<br/>
We expect that the ORCESTRA data collection is a valuable contribution to our scientific field.
We should be able to back up this collection.
Realistically, the list will evolve over time, so any backups will have to be updated incrementally.
5. **datasets are on a shared, distributed system**<br/>
We want the data system to be used in actual scientific work (not only for "data publication").
Traditional systems are often too complicated or too slow for day-to-day use.
A distributed system increases availability and performance (e.g., through local caches and redundant servers), which makes working with our own published data convenient, fast, and fun.
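To illustrate why an evolving list calls for incremental backups (goal 4), here is a minimal Python sketch. It assumes, hypothetically, that each dataset in the list is identified by an ID and a content hash. A backup then only needs to copy datasets whose hash is new or has changed since the last backup. The function name, IDs, and hashes are illustrative assumptions, not part of the ORCESTRA data system.

```python
from typing import Dict, List


def datasets_to_back_up(current: Dict[str, str], backed_up: Dict[str, str]) -> List[str]:
    """Return IDs of datasets that are new or changed since the last backup.

    Both arguments map a (hypothetical) dataset ID to a content hash.
    Datasets removed from the list are not handled in this sketch.
    """
    return sorted(
        ds_id
        for ds_id, digest in current.items()
        # A missing entry (None) or a different hash means the dataset must be copied.
        if backed_up.get(ds_id) != digest
    )


# Hypothetical example: one dataset changed and one was added since the last backup.
previous = {"dataset_a": "hash1", "dataset_b": "hash2"}
current = {"dataset_a": "hash1", "dataset_b": "hash3", "dataset_c": "hash4"}
print(datasets_to_back_up(current, previous))  # ['dataset_b', 'dataset_c']
```

Only the changed and the added dataset are selected. The unchanged one is skipped, so the cost of each backup scales with what changed rather than with the size of the whole collection.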
