Replies: 4 comments 3 replies
-
The DataHarmonizer development crew is looking forward to implementing the above primary_key and foreign_key annotations via relmodel_transformer. I also want to check to see if relmodel_transformer also handles sufficient annotations for detailing compound keys, where say in above example the same schema accommodated a table "TestResult" with primary key school_id x student_id (unique within school but not across schools) x test_id. |
Beta Was this translation helpful? Give feedback.
-
Well, cutting straight to our use-case, we're trying to manage not just one schema but a number of schemas within one overall meta-schema. Envision say an online library of schemas and their main class, enum, slot attributes. Within DH, mainly trying to offer users the experience of editing one schema, but being able to load others side by side into the same data structure to provide relatively easy copy/paste behaviour (with possibly editing of other schemas within the bundle too, which facilitates publishing them all in a bundle.) Below is just a core slice of the meta schema, focusing just on the unique keys, which reference slots/fields that other tables/classes contain without including them below. This is currently unworkable since in unique_key_slots we can't convey which source class/schema a key slot "comes from", i.e. back references as foreign key. Also it is not yet coded to the relmodel transformer you used above. classes: Prefix: Enum: PermissibleValue: The permissible_value_id key shows a reliance on an enum_id which is itself composed of an Enum class name, as well as a schema_id. I had assumed that a normalized PermissibleValue table needed three separate slots/fields - schema_id, enum_id, and name to manage this in a flat table/SQL way. Enums from different schemas may have the same Enum.name . |
Beta Was this translation helpful? Give feedback.
-
Beta Was this translation helpful? Give feedback.
-
Well, for GRDISample_Environmental_Site etc, ah yes they are all multivalued, so I see now that's what is triggering table generation! It would be great if that became an optional feature. |
Beta Was this translation helpful? Give feedback.
-
Documenting part of a conversation with @ddooley. The context here is the use of DataHarmonizer with nested objects, but this is generally of use to anyone who has need to map from a logical schema that is naturally tree-like (inlined objects) to something more relational/tabular, e.g. for data analysis, data entry (not just RDBMSs).
Assume some data about students and exam results:
Ignore for now how you might render this as a table. The structure here is that one student (identified in a globally unique way by their
id
) can have zero-to-many exam results.They can only have one result per exam, so
name
uniquely identifies the result in the context of any one student.This is more apparent if we inline as a dict (but this is just a serialization change, it doesn't alter the semantics)
We would do this by making the student.id be an
identifier
(globally unique), and exam.name by akey
(unique in the context of its parent object).Again, ignoring table serialization for now. The above is a common pattern, and it can be done simply with
identifier
andkey
. We can forbid structures where a student tries to have MATH twice. We also don't accidentally forbid cases where MATH is repeated for two students. This should hold regardless of serialization.The schema for this is straightforward:
Now let's consider a tabular serialization. In LinkML there is a relmodel transformer that is used for generating SQL DDL but is completely independent of SQL. It just rewrites schemas such that nesting/inlining and multiple values are replaced by backreferences.
It's easiest to see the transform by looking at the generated schema
Note that in the source schema we don't have any need of "student_id" in ExamResult, it's "owned" by the parent. But in SQL or in DataHarmonizer there is no concept of inlining, so the transform auto-introduces a foreign key "back reference". And we can auto-infer that the tuple of
(student_id, name)
is unique because (a) name is unique in the context of the owning object (b) student_id uniquely identifies the owning object in a non-inlined context.Here's the actual linkml output of the relmodel transformer:
Actual records will look like
STUDENT:
EXAM_RESULT:
Any form-based data entry tool like DH that doesn't have a native concept of a "parent" object could use this transform. Use the relmodel transformer to make a table-friendly version of the schema that has the backrefs and unique keys inserted. The person authoring the source schema only worries about logical constraints. These will get translated correctly to the relational schema. The form tool can just use this directly. It can use the existing compound key mechanism over the transformed schema without need of any esoteric extensions. It can save JSON that is directly conformant with the relational schema, or it can use the standard transform to use the source schema with nesting.
TODOs:
Beta Was this translation helpful? Give feedback.
All reactions