Data Type Definition: Identifier Code #89

Closed
knelson-farmbeltnorth opened this issue Mar 10, 2023 · 8 comments


knelson-farmbeltnorth commented Mar 10, 2023

How should we identify Data Type Definitions (DTDs)? Possible ideas are:

a. A signed 32-bit integer, where positive integers represent a mapping to the published definition system, and negative integers represent a custom definition stored within the ADM. While it seems to me to be the cleanest approach, the disadvantages here are that:
i. This would prescribe how the DTD's reference ID must be represented, whereas we have otherwise removed any prescription of what a Reference Id needs to be in the Standard.
ii. We are combining meaning and mapping into a single integer.

b. A UUID

c. An unsigned 32-bit integer requiring an additional flag wherever used to signal published vs. custom definition

d. Using the same approach as ISO 11783-11, where numbers below a certain threshold are reserved for published definitions
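
To make the trade-offs among options (a), (c), and (d) concrete, here is a minimal sketch of how a consumer might distinguish published from custom definitions under each. Every name here is hypothetical, the threshold value is invented (it is not the ISO 11783-11 boundary), and treating 0 as invalid in option (a) is an assumption of this sketch, not something decided in the Standard.

```python
from dataclasses import dataclass

# Hypothetical cutoff for option (d); NOT the actual ISO 11783-11 boundary.
PUBLISHED_THRESHOLD = 10_000

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
UINT32_MAX = 2**32 - 1


def is_custom_signed(dtd_id: int) -> bool:
    """Option (a): negative ids are custom, positive ids are published."""
    if not INT32_MIN <= dtd_id <= INT32_MAX:
        raise ValueError("dtd_id must fit in a signed 32-bit integer")
    if dtd_id == 0:
        # Assumption of this sketch: 0 belongs to neither range.
        raise ValueError("0 is neither published nor custom in this sketch")
    return dtd_id < 0


@dataclass(frozen=True)
class FlaggedDtdRef:
    """Option (c): unsigned id plus a flag carried wherever the id is used."""
    dtd_id: int
    is_custom: bool

    def __post_init__(self) -> None:
        if not 0 <= self.dtd_id <= UINT32_MAX:
            raise ValueError("dtd_id must fit in an unsigned 32-bit integer")


def is_custom_threshold(dtd_id: int) -> bool:
    """Option (d): ids below the threshold are reserved for published definitions."""
    if not 0 <= dtd_id <= UINT32_MAX:
        raise ValueError("dtd_id must fit in an unsigned 32-bit integer")
    return dtd_id >= PUBLISHED_THRESHOLD
```

Under (a) and (d) the published/custom distinction travels inside the number itself, which is the "combining meaning and mapping" concern; under (c) it travels alongside the number, at the cost of an extra field everywhere the id appears.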

@knelson-farmbeltnorth

Preliminary agreement in 5 April 2023 meeting to code as integers.

Open as to whether custom definitions will be represented as negative integers or signaled by an additional flag where used.

@strhea

strhea commented Apr 12, 2023

After thinking about this, I have some reservations about using integers. Most of them go back to human-readability. The paper describes using a prefix to designate custom DTDs.

If you're using a "DTD value" as a named property (for example, "area" on a field object), human readability is built in. But when you use it in the context of an array of summary values, readability is reduced. Is there a problem with using more descriptive string tokens? We've fixed the issue of constantly repeating the DTD in every value, so I don't think there is a memory/size constraint.

@knelson-farmbeltnorth

The dilemma here is how we identify a value's variable in the spatial data. My thought was that we would use the integer of the DTD. ISO takes the approach of using the DLV's index within that specific dataset's listing, but that has its own issues. @zwing99 made the point in yesterday's serialization meeting that it would be trivial to create a utility to replace column identifiers with something more human-readable for troubleshooting. If our primary goal is data transfer, the question arises of how much we factor human readability into the raw data.
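
For illustration, a relabeling utility of the kind mentioned above could be as small as the following sketch. The function name, the id-to-label table, and the ids and labels in it are all hypothetical; none of them come from the published definition system.

```python
from typing import Hashable, Iterable, Mapping

# Hypothetical id-to-label table, invented for this example only.
EXAMPLE_LABELS = {1: "example-label-one", 2: "example-label-two"}


def relabel_columns(
    columns: Iterable[Hashable], labels: Mapping[Hashable, str]
) -> list[str]:
    """Replace known column identifiers with readable labels.

    Identifiers without a label are kept, converted to their string form,
    so nothing is silently dropped while troubleshooting.
    """
    return [labels.get(column, str(column)) for column in columns]


print(relabel_columns([1, 2, 99], EXAMPLE_LABELS))
```

The raw data stays integer-keyed; readability is added only at inspection time.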

@strhea

strhea commented Apr 12, 2023

Think about this in a non-bulk transfer scenario. What if I'm pulling a single operation record from an API? What if there is a human in the loop, querying the API for specific pieces of data? Wouldn't the human readability be a factor there?

The use of integers is attractive because it's theoretically easier to administer when you're trying to avoid duplication/collisions. Is that enough justification for the reduction in readability?

@knelson-farmbeltnorth

In that scenario, the author of the API could substitute a meaningful term for the integer at their option.

Looking at it from the point of view of going with a textual code, in the case of identifying variables in spatial data, we could create a mapping object per dataset that would map a code to a numeric identifier in the binary (the equivalent of ISO's DLV index). As I see it, however, that adds complexity for troubleshooting: instead of referencing or substituting a well-known integer in the column name, I now need to refer to another listing to identify the data's meaning. I don't think anyone would advocate prefixing each data value in the binary with a textual identifier.
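
A rough sketch of such a per-dataset mapping object, assuming indices are simply assigned in order of first use. The class name, method names, and the codes passed in are hypothetical.

```python
class DatasetCodeIndex:
    """Per-dataset table mapping textual DTD codes to compact numeric indices."""

    def __init__(self) -> None:
        self._index_by_code: dict[str, int] = {}
        self._code_by_index: list[str] = []

    def index_for(self, code: str) -> int:
        """Return the index for a code, assigning the next free one on first use."""
        if code not in self._index_by_code:
            self._index_by_code[code] = len(self._code_by_index)
            self._code_by_index.append(code)
        return self._index_by_code[code]

    def code_for(self, index: int) -> str:
        """Look up the textual code behind an index found in the binary."""
        if not 0 <= index < len(self._code_by_index):
            raise KeyError(f"no code registered for index {index}")
        return self._code_by_index[index]


# Two datasets may assign different indices to the same (made-up) code.
dataset_a = DatasetCodeIndex()
dataset_b = DatasetCodeIndex()
dataset_a.index_for("example-code-x")
dataset_b.index_for("example-code-y")
dataset_b.index_for("example-code-x")
print(dataset_a.index_for("example-code-x"), dataset_b.index_for("example-code-x"))
```

Because the index is only meaningful together with its dataset's table, a reader of the binary cannot interpret a column without first consulting that listing, which is the troubleshooting cost described above.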

Also, textual codes have come with their own challenges during implementation. We can reduce the confusion and misapplication that exists in the current system with "vrYieldTotalMass," "vrYieldMass," "vrYieldWetMass," etc., but the fact remains that it is easy for an implementer to look at a human-readable code and decide that s/he understands its meaning without actually referencing its full definition. If the code is an integer, there is a greater likelihood that data creators will read the definitions before classifying data.

@strhea

strhea commented Apr 12, 2023

Isn't the column name going to be the id of the LoggedVariable (that is scoped to that operation)? If so, wouldn't that result in confusion with a DTD id that is an integer?

@knelson-farmbeltnorth

@strhea You are correct, I confused the situation above. The spatial values will need to be keyed by the reference id of the Logged Variable on the Operation in order to correctly capture the product, latency, and variable disposition (actual/target, etc.).
[Attached screenshot: Screen Shot 2023-04-14 at 8 57 37 AM]
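
A minimal sketch of why the key has to be the Logged Variable's reference id rather than the DTD code: two variables on the same operation can share a DTD while differing in disposition. The class, its field names, the ids, and the code "example-rate" are assumptions for illustration, not the actual model.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LoggedVariable:
    """Hypothetical stand-in for a Logged Variable scoped to one operation."""
    reference_id: str                  # unique within the operation
    dtd_code: str                      # which definition the values follow
    disposition: str                   # e.g. "actual" or "target"
    product_id: Optional[str] = None
    latency_seconds: float = 0.0       # unit is an assumption of this sketch


def index_by_reference_id(
    variables: Iterable[LoggedVariable],
) -> dict[str, LoggedVariable]:
    """Build the lookup used to interpret spatial column keys."""
    by_id: dict[str, LoggedVariable] = {}
    for variable in variables:
        if variable.reference_id in by_id:
            raise ValueError(f"duplicate reference id: {variable.reference_id}")
        by_id[variable.reference_id] = variable
    return by_id


variables = index_by_reference_id([
    LoggedVariable("lv-1", "example-rate", "actual", product_id="p-1"),
    LoggedVariable("lv-2", "example-rate", "target", product_id="p-1"),
])

# One spatial record: columns are keyed by reference id, not by DTD code.
spatial_row = {"lv-1": 41.7, "lv-2": 45.0}
for key, value in spatial_row.items():
    variable = variables[key]
    print(f"{variable.dtd_code} ({variable.disposition}): {value}")
```

Keying the row by "example-rate" alone would collapse the actual and target columns into one, so the DTD identifier's format (integer or text) does not need to double as the column key.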

@knelson-farmbeltnorth

Agreed in the 26 April 2023 meeting that the code will be a human-readable, textual code. An open question is whether any sort of prefix is appropriate. We will define the codes as part of the work to draft the initial DTDs in #95.
