Replies: 4 comments 3 replies
-
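This seems like a bug to me.

`slot_usage` is supposed to refine the usage of a slot in the context of a given class - to extend or override the default slot definition (tho @cmungall has said that linkml should be monotonic, #1962).

so i've been wanting to take a crack at rewriting the `induced_slot` method for a while because it's a pretty critical one (among the others that handle inheritance and overrides) for my purposes, and I think I can see where this is coming from. I would expect `induced_slot` to work in an "ancestor-wise" way - for each of the ancestor definitions/redefinitions of a slot, resolve each of those from the tips of the tree down to the slot in question. Instead it works in a "metaslot-wise" way where for each slot we iterate through the metaslots and *then* do an ancestor-wise pass. This also explains a bit why `induced_slot` is one of the methods that by itself takes the most time in the library.

The `any_of` construction makes sense why it is the way it is (like JSON Schema), but I think that's a syntactic thing that should def be smoothed out in the interface. The `any_of` needs to be computed along with the `range` for each ancestor layer, but currently they aren't, so the `SlotDefinition` having `range = 'NamedThing'` *and* the `any_of` from the `slot_usage` is incorrect.

So when resolving/inducing a slot, we should *always* make `range` behave like `slot_range_as_union`.

Here's another illustrative example that imo is buggy:

    id: test_schema
    name: test
    imports:
      - linkml:types
    classes:
      MyClass:
        attributes:
          my_slot:
            any_of:
              - range: string
              - range: int

    >>> from linkml_runtime.utils.schemaview import SchemaView
    >>> sv = SchemaView('/Users/jonny/Desktop/test_schema.yaml')
    >>> slot = sv.induced_slot('my_slot', 'MyClass')
    >>> type(slot.range)
    NoneType

This is getting a bit at the difference between the metamodel python classes, which are just representations of the schema yaml objects, and what is produced by `SchemaView`. The purpose (i think?) of `SchemaView` is to provide an interface to the schema s.t. the slots and classes "behave like they should" based on the metamodel in ways that need to be materialized above the literal representation in the yaml. `SchemaView` not behaving this way makes for a lot of awkward moments in the generators, eg in `PydanticGenerator.range_class_has_identifier_slot` (which should be in `SchemaView`): instead of checking all the items in `range`, we need to do a separate iteration over `any_of`. Note that we are *only* checking `any_of` there rather than also handling `exactly_one_of` or `all_of`, which can also have ranges, so that's incomplete behavior. Lots of other examples across the generators.

Distressingly this causes the model to be generated incorrectly by pydanticgen:

    class Extraction(PlannedProcess):
        """
        A material separation in which a desired component of an input material is separated from the remainder.
        """
        # ... other stuff ...
        has_input: List[str] = Field(default_factory=list, description="""An input to a process.""")

which is none of the three options in the OP. That's because while it correctly gets the `any_of` range, the range classes `Biosample` and `ProcessedSample` have an identifier slot `id` and aren't marked to be inlined.

pythongen also gets it incorrect, but in a different way:

    @dataclass
    class Extraction(PlannedProcess):
        """
        A material separation in which a desired component of an input material is separated from the remainder.
        """
        has_input: Union[Union[str, NamedThingId], List[Union[str, NamedThingId]]] = None

where it misses the `any_of` range in the slot usage.

So bc this isn't resolved well by `SchemaView`, there are divergent errors in the generators. Things downstream from generators should never have to worry about inheritance, ancestry, etc - by the time the schema reaches them it should already be "cooked" (using the parlance used in a few places in the library). The simplest fix here would just be to clear `range` or `any_of` when one or the other is defined in an ancestor class in `induced_slot`, but the longer term fix will be to make `SchemaView` (#1739 (comment)) recursive (#1839), both within a schema and across its imports, where each class and slot in the dependency tree is fully resolved once and each inheriting/extending class/slot/etc. resolves itself relative to its parents. In this case, then, when materializing the `has_input` slot, we would just be looking at the definition of `has_input` in `PlannedProcess`, and applying a single `apply_slot_usage` method that knows how to both overwrite scalar rules as well as resolve `any_of`, `exactly_one_of` etc. between a parent and child class, and the problem is resolved for all downstream use.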
-
This isn't a bug though...
`range` is single-valued, so None/Any is the most specific single valued
range that can be returned.
But the behavior is incomplete. We want slot induction to populate
`range_expression` with the non-redundant entailed expression. There is
code in the main repo
https://github.com/linkml/linkml/blob/main/linkml/transformers/logical_model_transformer.py
that could be used here.
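
To make "non-redundant entailed expression" a bit more concrete, here is a minimal sketch in plain Python. It does not use linkml-runtime or the logical_model_transformer linked above; the toy `IS_A` hierarchy and the helper names `ancestors` and `entailed_ranges` are assumptions made up for illustration. The idea is that an any_of member is kept only if it is compatible with the inherited range, and the inherited range is dropped once something more specific is asserted.

```python
# Toy is_a hierarchy (hypothetical, for illustration only).
IS_A = {
    "NamedThing": None,
    "Biosample": "NamedThing",
    "ProcessedSample": "NamedThing",
    "Study": "NamedThing",
}


def ancestors(cls):
    """Return cls followed by all of its is_a ancestors."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = IS_A[cls]
    return chain


def entailed_ranges(inherited_range, any_of):
    """Combine an inherited range with an any_of refinement.

    Keeps the any_of members that are compatible with the inherited
    range; falls back to the inherited range when there is no any_of.
    """
    if not any_of:
        return [inherited_range]
    return [c for c in any_of if inherited_range in ancestors(c)]


print(entailed_ranges("NamedThing", ["Biosample", "ProcessedSample"]))
print(entailed_ranges("NamedThing", []))
```

With this toy hierarchy the first call keeps only the two any_of classes, which is the answer the original question was after; a real implementation would also have to handle exactly_one_of, all_of, none_of, and types.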
…On Fri, May 10, 2024 at 10:21 PM Jonny Saunders ***@***.***> wrote:
This seems like a bug to me.
slot_usage is supposed to refine the usage of a slot in the context of a
given class - to extend or override the default slot definition (tho
@cmungall <https://github.com/cmungall> has said that linkml should be
monotonic <#1962>).
so i've been wanting to take a crack at rewriting the induced_slot method
for a while because it's a pretty critical one (among the others that handle
inheritance and overrides) for my purposes, and I think I can see where this
is coming from. I would expect induced_slot to work in an "ancestor-wise"
way - for each of the ancestor definitions/redefinitions of a slot, resolve
each of those from the tips of the tree down to the slot in question.
Instead it works in a "metaslot-wise" way
<https://github.com/linkml/linkml-runtime/blob/c7815cb81c539f919e2ec48d2ca38d06da5aeb18/linkml_runtime/utils/schemaview.py#L1352>
where for each slot we iterate through the metaslots and *then*
<https://github.com/linkml/linkml-runtime/blob/c7815cb81c539f919e2ec48d2ca38d06da5aeb18/linkml_runtime/utils/schemaview.py#L1360>
do an ancestor-wise pass. This also explains a bit why induced_slot is
one of the methods that by itself takes the most time in the library.
The any_of construction makes sense why it is the way it is (like JSON
Schema), but I think that's a syntactic thing that should def be smoothed
out in the interface. The any_of needs to be computed along with the range
for each ancestor layer, but currently they aren't, so the SlotDefinition
having range = 'NamedThing' *and* the any_of from the slot_usage is
incorrect.
So when resolving/inducing a slot, we should *always* make range behave
like slot_range_as_union.
Here's another illustrative example that imo is buggy:
    id: test_schema
    name: test
    imports:
      - linkml:types
    classes:
      MyClass:
        attributes:
          my_slot:
            any_of:
              - range: string
              - range: int

    >>> sv = SchemaView('/Users/jonny/Desktop/test_schema.yaml')
    >>> slot = sv.induced_slot('my_slot', 'MyClass')
    >>> type(slot.range)
    NoneType
This is getting a bit at the difference between the metamodel python
classes which are just representations of the schema yaml objects and what
is produced by SchemaView. The purpose (i think?) of SchemaView is to
provide an interface to the schema s.t. the slots and classes "behave like
they should" based on the metamodel in ways that need to be materialized
above the literal representation in the yaml. SchemaView not behaving this
way makes for a lot of awkward moments in the generators, eg in
PydanticGenerator.range_class_has_identifier_slot (which should be in
SchemaView): instead of checking all the items in range, we need to do a
separate iteration over any_of
<https://github.com/linkml/linkml/blob/b5313f98ddb508e414e0ec62a75b2744f64fa59d/linkml/generators/pydanticgen/pydanticgen.py#L350-L356>.
Note that we are *only* checking any_of there rather than also handling
exactly_one_of or all_of which can also have ranges, so that's incomplete
behavior. Lots of other examples across the generators.
Distressingly this causes the model to be generated incorrectly by
pydanticgen:
    class Extraction(PlannedProcess):
        """
        A material separation in which a desired component of an input material is separated from the remainder.
        """
        # ... other stuff ...
        has_input: List[str] = Field(default_factory=list, description="""An input to a process.""")
which is none of the three options in the OP. That's because while it
correctly gets the any_of range, the range classes Biosample and
ProcessedSample have an identifier slot id and aren't marked to be inlined.
pythongen also gets it incorrect, but in a different way:
    @dataclass
    class Extraction(PlannedProcess):
        """
        A material separation in which a desired component of an input material is separated from the remainder.
        """
        has_input: Union[Union[str, NamedThingId], List[Union[str, NamedThingId]]] = None
where it misses the any_of range in the slot usage.
So bc this isn't resolved well by SchemaView, there are divergent errors
in the generators. Things downstream from generators should never have to
worry about inheritance, ancestry, etc - by the time the schema reaches
them it should already be "cooked" (using the parlance used in a few places
in the library). The simplest fix here would just be to clear range or
any_of when one or the other is defined in an ancestor class in
induced_slot, but the longer term fix will be to make SchemaView
<#1739 (comment)>
recursive <#1839>, both within a
schema and across its imports, where each class and slot in the dependency
tree is fully resolved once and each inheriting/extending class/slot/etc.
resolves itself relative to its parents. In this case, then, when
materializing the has_input slot, we would just be looking at the
definition of has_input in PlannedProcess, and applying a single
apply_slot_usage method that knows how to both overwrite scalar rules as
well as resolve any_of, exactly_one_of etc. between a parent and child
class, and the problem is resolved for all downstream use.
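
For what it's worth, the "simplest fix" and the ancestor-wise pass described in the quoted message can be sketched with plain dicts. This is only an illustration under stated assumptions: `apply_slot_usage` and `induce_slot` here are hypothetical helpers, not the linkml-runtime implementation, and the layer data is a toy version of has_input.

```python
BOOLEAN_KEYS = ("any_of", "exactly_one_of", "all_of", "none_of")


def apply_slot_usage(parent, usage):
    """Refine a parent slot definition (dict) with a child's slot_usage (dict)."""
    resolved = dict(parent)
    resolved.update(usage)  # scalar rules: the child overwrites the parent
    usage_has_expr = any(key in usage for key in BOOLEAN_KEYS)
    if usage_has_expr and "range" not in usage:
        # The child refined the range with a boolean expression, so the
        # inherited single-valued range no longer describes the slot.
        resolved.pop("range", None)
    elif "range" in usage and not usage_has_expr:
        # The child asserted a plain range, so inherited expressions go.
        for key in BOOLEAN_KEYS:
            resolved.pop(key, None)
    return resolved


def induce_slot(layers):
    """Resolve a slot ancestor-wise; layers run from root ancestor to leaf."""
    resolved = {}
    for layer in layers:
        resolved = apply_slot_usage(resolved, layer)
    return resolved


# Toy layers (hypothetical data): base slot, then a slot_usage refinement.
has_input_layers = [
    {"range": "NamedThing", "multivalued": True},
    {"any_of": [{"range": "Biosample"}, {"range": "ProcessedSample"}]},
]
print(induce_slot(has_input_layers))
```

Note that clearing the inherited range like this is the override-style behavior from the quoted message; it is not the monotonic, entailment-preserving approach suggested in the reply above it.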
-
noting that we made an issue out of this discussion last week: #2103
-
I think adding an option to get_class that materializes the entailments
is a great idea!
…On Mon, May 13, 2024 at 12:18 PM Jonny Saunders ***@***.***> wrote:
I think I get the idea, I guess another way of saying what I'm saying is
that the entailment getters are not clearly enough delineated from the
asserted getters. I.e. I would expect a simple schema loader to just
return the asserted, literal models, but I would expect SchemaView to be
all entailed. To me it is pretty non-intuitive to need to get_class, loop
through its attrs and slots with induced_slot, and now potentially call a
third method to get the resolved range of that slot to get a fully
materialized class - I would think that get_class should return the class
with all its entailments computed for me, which would make using schemas
way easier imo.
I get that's a decent amount of work, I'm trying to make the case that we
should have that as a goal :)
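
A rough sketch of what an asserted-vs-entailed switch on a getter could look like, using a toy class rather than linkml-runtime. Everything here is an assumption for illustration: `ToySchemaView`, the `entailed` keyword, and the schema data are all made up, and real slot induction involves far more than merging attribute dicts.

```python
class ToySchemaView:
    """Hypothetical stand-in for a schema view; not the linkml-runtime class."""

    def __init__(self, classes):
        # name -> {"is_a": parent name or None, "attributes": {...}}
        self.classes = classes

    def get_class(self, name, entailed=False):
        """Return the asserted class, or the class with inherited attributes merged in."""
        asserted = self.classes[name]
        if not entailed:
            return asserted
        # Collect the is_a chain from `name` up to the root ancestor.
        chain = []
        current = name
        while current is not None:
            chain.append(self.classes[current])
            current = self.classes[current].get("is_a")
        # Merge from the root down so that children refine their parents.
        attributes = {}
        for cls in reversed(chain):
            for slot_name, slot in cls.get("attributes", {}).items():
                merged = dict(attributes.get(slot_name, {}))
                merged.update(slot)
                attributes[slot_name] = merged
        return {"is_a": asserted.get("is_a"), "attributes": attributes}


view = ToySchemaView({
    "PlannedProcess": {
        "is_a": None,
        "attributes": {"has_input": {"range": "NamedThing", "multivalued": True}},
    },
    "Extraction": {
        "is_a": "PlannedProcess",
        "attributes": {"has_input": {"required": True}},
    },
})
print(view.get_class("Extraction")["attributes"])
print(view.get_class("Extraction", entailed=True)["attributes"])
```

The first call returns only what Extraction literally asserts; the second returns the slot as a downstream consumer would want it, with nothing left to resolve.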
-
Hi, this is my first time posting a question here.
Background
I develop an application that interacts with a database that complies with a LinkML schema. That LinkML schema is: nmdc-schema.
I am currently writing a Python script that uses an instance of LinkML's SchemaView Python class to traverse that LinkML schema. I am particularly interested in identifying (and learning about) slots that can contain references to other slots. This is all in pursuit of implementing some referential integrity checks.
That nmdc-schema schema contains the definition of a class named Extraction. Based on a conversation with a teammate and my limited understanding of LinkML, I am under the impression that the has_input slot (in the context of the Extraction class, specifically) must consist of [a reference to] either a Biosample or a ProcessedSample, despite the default range of the has_input slot being NamedThing.
Question
How can I use a SchemaView instance to get that list: Biosample and ProcessedSample?
Here's a screenshot of a Python notebook where I've invoked a few SchemaView methods:
(A) slot_definition.range (where slot_definition is returned by .induced_slot()) gives me the default range of the slot
(B) slot_definition.any_of[0].range and [1].range give me the two class names in the any_of list
(C) schema_view.slot_range_as_union( . . . ) gives me all three class names
Assuming my impression (stated in the "Background" section above) is correct, method (B) is the only one that gives me what I'm looking for. However, I only knew to look at the any_of list after a teammate pointed it out to me. Is there a more general way a SchemaView instance can be used to get a "fully resolved" list of the classes whose instances a slot can contain [references to] (instead of "manually" checking for the presence of a boolean expression; e.g. any_of, none_of)?
Footnote for future readers: I work with some of the LinkML developers and have made some assumptions about their familiarity with my situation.
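
One possible way to avoid checking each boolean expression by hand is a small helper that walks all of them. The sketch below is an assumption-laden illustration, not a SchemaView method: `collect_ranges` is a hypothetical helper, and it is demonstrated on `SimpleNamespace` stand-ins rather than real slot definitions. It relies only on the attribute names range, any_of, exactly_one_of, all_of, and none_of, so it may also work on the object returned by .induced_slot(), but that has not been tested here.

```python
from types import SimpleNamespace

BOOLEAN_SLOTS = ("any_of", "exactly_one_of", "all_of", "none_of")


def collect_ranges(expr):
    """Return every range named by a slot-like object, grouped by where it appears."""
    found = {"range": [], **{key: [] for key in BOOLEAN_SLOTS}}
    if getattr(expr, "range", None):
        found["range"].append(expr.range)
    for key in BOOLEAN_SLOTS:
        # Missing or empty expressions are treated as an empty list.
        for sub in getattr(expr, key, None) or []:
            # Nested expressions are flattened into the outermost key.
            for names in collect_ranges(sub).values():
                found[key].extend(names)
    return found


# Stand-in for the induced has_input slot of Extraction (toy data).
slot = SimpleNamespace(
    range="NamedThing",
    any_of=[
        SimpleNamespace(range="Biosample"),
        SimpleNamespace(range="ProcessedSample"),
    ],
)
print(collect_ranges(slot))
```

Grouping by expression type keeps the default range separate from the any_of members, so a caller can decide which of them count as the "resolved" range.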