Split `Record` concept into "Workflow Record" (`WRecord`) and "Forward-model Record" (`FMRecord`). #128

pinkwah · 2021-06-10T16:00:47Z

I think we should split up the concept of "record" into workflow records and forward-model records.

Currently, both concepts are treated as one. This causes issues when dealing with realization_index. Because we know that parameter records are necessarily forward-model records, we have rules for how to deal with the presence or absence of realization_index. E.g., if one uploads a parameter record without specifying a realization_index, we know that each row represents a separate realisation, and can act accordingly. This is not true for non-parameter matrices or for any blobs, which makes for inconsistent behaviour.

I therefore think we should split these up. One is a "workflow" record, for which the following is true:

Names are unique
Matrix: As long as it's a valid matrix, we're okay
Blob: We accept any data

The following is true for a forward-model record:

Names and realization_index are unique (no change to today's code)
Matrix: Possible to specify realization_index. Not specifying it means we expect a matrix where the first dimension represents the realisation (rows).
Blob: Possible to specify realization_index. Not specifying it means we expect the blobs to be uploaded in a certain archiving format, eg. tar, which is a very simple format that requires no compression or any buffers, and would be simple to implement in Python for the client.
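To illustrate how small the client side of the tar idea could be, here is a minimal sketch using Python's standard `tarfile` module. The function names `pack_blobs` and `unpack_blobs` are hypothetical, and naming each archive member after its realization_index is an assumption made for this example; the proposal above does not fix a layout.

```python
from __future__ import annotations

import io
import tarfile


def pack_blobs(blobs: dict[int, bytes]) -> bytes:
    """Bundle one blob per realisation into an uncompressed tar archive.

    Assumption: each member is named after its realization_index.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for realization_index, data in sorted(blobs.items()):
            info = tarfile.TarInfo(name=str(realization_index))
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unpack_blobs(payload: bytes) -> dict[int, bytes]:
    """Inverse of pack_blobs: recover the blob for each realization_index."""
    blobs: dict[int, bytes] = {}
    # "r:" reads an uncompressed archive only.
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:") as archive:
        for member in archive.getmembers():
            extracted = archive.extractfile(member)
            if extracted is not None:  # skip anything that is not a regular file
                blobs[int(member.name)] = extracted.read()
    return blobs
```

A server could use something like `unpack_blobs` to split an upload that arrives without a realization_index into one blob per realisation.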

Basically, forward-model records are explicit in that they contain many realisations, and thus we have two separate but consistent sets of endpoints.


sondreso · 2021-06-14T07:43:50Z

I don't think we should name these endpoints/records after the entity that we consider the "producer" today. The workflow name is especially bad: it doesn't say anything about the key features of the records described here.

Otherwise I agree with the described approach.

pinkwah · 2021-06-14T10:30:38Z

I agree wrt. the naming, don't like it either.

xjules · 2021-06-14T14:07:16Z

The following is true for a forward-model record:
Names and realization_index are unique (no change to today's code)
Matrix: Possible to specify realization_index. Not specifying it means we expect a matrix where the first dimension represents the realisation (rows).
Blob: Possible to specify realization_index. Not specifying it means we expect the blobs to be uploaded in a certain archiving format, eg. tar, which is a very simple format that requires no compression or any buffers, and would be simple to implement in Python for the client.

Internally, in the database, this data (both matrices and blobs) should be stored separately per realisation with realization_index set, i.e. when we want to retrieve it, we can specify a realization_index and get just the data for that realization_index.
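As a rough sketch of that storage rule, the class below keeps one entry per (name, realization_index) pair and splits a matrix into rows when no index is given. `ForwardModelStore` is a hypothetical in-memory stand-in for the database, not ERT Storage code, and plain lists stand in for matrices.

```python
from __future__ import annotations

from typing import Any, Optional


class ForwardModelStore:
    """Hypothetical in-memory stand-in for the forward-model record table."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, int], Any] = {}

    def put(self, name: str, data: list, realization_index: Optional[int] = None) -> None:
        if realization_index is None:
            # No index given: the first dimension is the realisation,
            # so row i is stored as the data of realisation i.
            for index, row in enumerate(data):
                self._insert(name, index, row)
        else:
            self._insert(name, realization_index, data)

    def _insert(self, name: str, realization_index: int, data: Any) -> None:
        key = (name, realization_index)
        if key in self._entries:
            # Name and realization_index must be unique together.
            raise ValueError(f"record {name!r} already has realisation {realization_index}")
        self._entries[key] = data

    def get(self, name: str, realization_index: int) -> Any:
        """Return only the data stored for one realisation."""
        return self._entries[(name, realization_index)]
```

With this layout, an upload of a whole matrix and an upload of a single realisation end up in the same shape, so retrieval by realization_index works the same way for both.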

pinkwah · 2021-09-03T09:22:34Z

Records and Sub-Records

I had a think, and I believe this is a better idea. Let's separate records into records and sub-records.

A Record is a piece of data uniquely identified by a name. It can be attached to Ensembles, but also to Experiments. The latter is new in ERT Storage, and gets us closer to what ERT 3 needs. In particular, it is possible to attach observation data to experiments as a record, rather than keeping it in a separate database table.

A sub-record captures the "forward-model record" functionality. Sub-records are also records and can be accessed as such, using the semantics described in the OP. A record has sub-records iff. it was created using the sub-record-specific endpoints, or if it was flagged as containing sub-records.

Examples:

GET /experiment/{}/records/{name}: Get an experiment-wide record {name}
GET /experiment/{}/subrecords/{name}/14: Get an experiment-wide sub-record 14 of record {name}.
GET /ensembles/{}/records/{name}: Get an ensemble-wide or experiment-wide record {name} (scoping rules)
POST /ensembles/{}/records/{name}/matrix: Post a numerical record that does not contain sub-records. That is, it's not allowed to fetch a sub-record from it. This record may contain any structure as long as it adheres to ERT's numerical data requirements. Eg, the index may contain timestamps or strings, or slam poetry.
POST /ensembles/{}/records/{name}/matrix?subrecord=true: Post a numerical record as a batch of sub-records, where the first column represents the realization_index of each row. It is permitted to access this record as sub-records.
POST /ensembles/{}/subrecords/{name}/14/matrix: Post sub-record 14 of record {name} as a matrix.

Basically, what I'm saying is (ignoring opaque data for now):

SubRecord: Numerical data
Record: Numerical data or Integer-indexed array of (uniform) SubRecord
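The two definitions above could be sketched as Python types like this. The class layout, the `from_batch` helper, and the use of nested lists for numerical data are all assumptions for illustration; `from_batch` mimics the `?subrecord=true` upload, where the first column holds the realization_index of each row.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Numerical = List[List[float]]  # stand-in for ERT's numerical data


@dataclass
class SubRecord:
    data: Numerical


@dataclass
class Record:
    name: str
    # Either plain numerical data, or sub-records keyed by an integer index.
    content: Union[Numerical, Dict[int, SubRecord]]

    def has_subrecords(self) -> bool:
        return isinstance(self.content, dict)

    def subrecord(self, index: int) -> SubRecord:
        if not isinstance(self.content, dict):
            raise LookupError(f"record {self.name!r} does not contain sub-records")
        return self.content[index]


def from_batch(name: str, matrix: Numerical) -> Record:
    """Build a record of sub-records from a matrix whose first column is the index."""
    subrecords = {int(row[0]): SubRecord(data=[list(row[1:])]) for row in matrix}
    return Record(name=name, content=subrecords)
```

A record posted without the flag would hold `Numerical` content directly, so asking it for a sub-record fails, matching the "not allowed to fetch a sub-record from it" rule in the examples.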

mortalisk · 2022-01-05T12:43:50Z

Why do we need to distinguish at all? Can't we just have a general indexing feature in matrices? And then by context we know if we are using it for storing forward model data, or something else? Do we need some hard coded feature for indexing by realization_index as compared to other indexes?

pinkwah added the "enhancement" and "question" labels on Jun 10, 2021

pinkwah mentioned this issue Sep 7, 2021: Record maturation equinor/ert#1789 (Closed)
