BUG: resolve XSD aliasing and parse missed elements #643

FirefoxMetzger · 2021-07-28T13:42:12Z

🦟 Bug fix

Summary

This PR adds a new generator for XSD from the current SDFormat. It is currently set up to convert SDFormat v1.8, but can easily be backported to the other versions by updating the corresponding CMakeLists.

It fixes multiple issues with the old parser:

parse all sdf elements nested deeper than 1 level below root (previous: and tags were ignored, see XSD schema generation doesn't parse all nested elements #633)
resolve name aliasing of elements defined in X.sdf and X_shape.sdf (see naming conflicts in generated schema files (xsd) #632). This is done by introducing a unique namespace for each file and correctly referencing the desired type in the namespace in question.
change types:type in types.xsd from double to string and add the respective restriction (see: naming conflicts in generated schema files (xsd) #632)

It also handles nested models and nested model states. This was apparently a blocking issue in previous attempts to refactor this script. I didn't encounter this problem directly; however, I had to guess the desired meaning of the ref=X attribute used in this case. I left a comment in the appropriate place.

On a high level, the generator is based around named types. It creates two files for each filename.sdf: one filename.xsd and one filenameType.xsd (the actual workhorse), e.g. model.xsd and modelType.xsd. FilenameType.xsd holds the complex type used by the root element of filename.sdf and filename.xsd is a shim to help parsers to make sense of partial sdf.

Checklist

Signed all commits for DCO
Added tests
Updated documentation (as needed)
Updated migration guide (as needed)
codecheck passed (See contributing)
All tests passed (See test coverage)
While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge

…nto xml_namespaces

The original script wrapped every element inside a choice block; however - as this is only a single element - the choice block has essentially no effect.

azeey · 2021-11-29T16:09:14Z

Unfortunately, we didn't have the resources to review this PR in October or November and it looks like we we may not be able to in the coming couple of months. My apologies for the delay, but I wanted to let you know it's still in our radar.

Bi0T1N · 2022-03-12T17:40:56Z

@FirefoxMetzger
Can you please rebase this on the head of the current sdf branch? The changes for #642 have already been merged via #662 (with a proper <xsd:whiteSpace value="collapse"/>) but are also listed as changes here (without <xsd:whiteSpace value="collapse"/>).

Bi0T1N

I was also working on a Python script for the conversion and took the opportunity to review your code, as it already familiarized me with SDF -> XSD conversion, but your approach is more promising than my simple Ruby -> Python conversion. 😃

tools/xmlschema.py

Bi0T1N · 2022-03-12T20:22:11Z

tools/xmlschema.py

+cmd_arg_parser.add_argument(
+    "-i",
+    "--in",
+    dest="source",


Maybe sdf_file would be more suitable as variable name? In the case of source it is not clear whether it is a file, a path or a directory name.

tools/xmlschema.py

Bi0T1N · 2022-03-12T20:43:00Z

tools/xmlschema.py

+        raise RuntimeError(f"Unknown simple type: {in_type}") from None
+
+
+# collect existing namespaces


Better put all declarations above and the instruction code into a

if __name__ == "__main__": cmd_arg_parser = argparse.ArgumentParser( description="Create an XML schema from a SDFormat file." ) ... xsd_schema = setup_schema(used_ns, use_default_ns=False) xsd_schema.append(element) with open(out_dir / (source.stem + ".xsd"), "wb") as out_file: ElementTree.ElementTree(xsd_schema).write(out_file, encoding="UTF-8", xml_declaration=True)

as this will increase the readability (and allows to reuse your classes).

@Bi0T1N Hmm, I think this is mainly a stylistic choice.

The if __name__ == "__main__" guard is useful if your have double-barreled files that - in parallel - function as both a script and as a module. Here, the purpose is to be a script only (there is no intention towards reusing the classes/functions elsewhere), so I chose not to use the guard. In my experience, that leads to cleaner scripts, because you can write (and debug) them sequentially rather than having to jump around the file, because classes live at the top and execution logic at the bottom.

tools/xmlschema.py

Bi0T1N · 2022-03-12T21:10:55Z

tools/xmlschema.py

+            el.set("type", f"{other_file.stem}:{name}Type")
+
+        # TODO: remove this line
+        declared.setdefault("imports", set()).add(other_file.stem)


Remove it? declared variable isn't used. 😄

It used to be the variable in which the imported namespaces were collected, but that changed with a refactor. This comment is mainly a reminder to clean up once there is general approval on the approach this script takes and we are moving towards an actual merge.

Bi0T1N · 2022-03-12T21:15:16Z

tools/xmlschema.py

+            "*": ("0", "unbounded"),
+            "-1": ("0", "0"),
+        }
+        min_occurs, max_occurs = required_codes[self.element.attrib["required"]]


This already shows up above so make it a function to reduce code duplication

Bi0T1N · 2022-03-12T21:32:06Z

tools/xmlschema.py

+            else:
+                # XSD 1.1 allows maxOccurs="unbound" within xs:all, so 
+                # we could drop this entire else block if we could upgrade
+                # to XSD 1.1. (xmllint only supports XSD 1.0)


Couldn't find an issue for it on GitLab, just an older issue in the closed bug tracker. So maybe report it and hope for the best?

This isn't an issue with XSD but rather an issue with sdformat. XSD v1.0 (the schema version used here) is limited in that it doesn't allow us to represent the true capabilities of SDF.

In SDF, child elements may appear 0, 1, or infinitely many times depending on the element and - this is what kills XSD 1.0 - may do so in any order. In XSD you may choose exactly one element from a set of elements (xs:choice), you may specify a block in which elements may occur in any order and with a specified (and finite) min/max count for each element (xs:all), or you may specify a block in which elements must occur in the given order but may do so the given (potentially infinite) number of times. Neither of these choices fits.

The way the ruby generator solved this is by aggregating all valid children into a xs:choice element which may repeat 0 to inf many times. This works for elements with infinite counts but doesn't enforce maxOccurs for elements that may only occur once. If we, thus, validate SDF files with the resulting schema, it may produce false negatives by saying that a SDF is valid even though a unique element occurs twice.

The way the commented out code solves this is by grouping elements into a group with finite count (<=1) and a group with infinite count. It then inserts the xs:choice block discussed above (for infinite-count elements only) between every two fine-count elements. It then does this for every permutation of finite-count elements (because in SDF elements may occur in any order, but we need to validate that unique elements don't reoccur). This will result in XSD that doesn't produce above false negatives, but it comes at the cost of massive schemata (GBs in magnitude).

So for now I just do the same here as the ruby script (accept that the result produces false negatives), because we can't change how SDF works, and we also can't bump the XSD version without prior feedback from one of the maintainers.

In XSD 1.1 this is less of a problem because above mentioned xs:all block allows elements with infinite counts. This means that we can simply use an xs:all block instead of a xs:choice block and be happy.

Thanks for the detailed explanation. 👍
My comment was just about xmllint only supports XSD 1.0 which is probably a reason why we can't simply use XSD 1.1 as there is no tool for linting. An option might be to only use XSD 1.1 if there is no other choice (= you need XSD 1.1 to represent it properly) and otherwise stick with XSD 1.0 - I don't know if mixing the two is even supported.

Bi0T1N · 2022-03-12T21:36:20Z

tools/xmlschema.py

+
+def setup_schema(used_ns: list, use_default_ns: bool = True) -> ElementTree.Element:
+    xsd_schema = ElementTree.Element(_to_qname("xs:schema"))
+    xsd_schema.set("version", "1.1")


Shouldn't this be "1.0"? From above:

# XSD 1.1 allows maxOccurs="unbound" within xs:all, so # we could drop this entire else block if we could upgrade # to XSD 1.1. (xmllint only supports XSD 1.0)

I'm not sure but I guess <?xml version='1.0' encoding='UTF-8'?> is what you meant with your comment by not supporting XSD 1.1 and thus we use 1.0?

However, the file generated from sensor.sdf contains

<?xml version='1.0' encoding='UTF-8'?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.1" ...

but wouldn't it make more sense that the version number is the version of the SDF file (directory)? So that following command would set version to 1.8:

python3 tools/xmlschema.py -i sensor.sdf -o /ws/sdformat/test_output -s /ws/sdformat/sdf/1.8

Shouldn't this be "1.0"?

You are right; it should be 1.0 here.

but wouldn't it make more sense that the version number is the version of the SDF file (directory)?

No, because its references to different things. The first line (the XML header) specifies the version of the XML in which the schema file is written. The second line (the XSD namespace) specified the version of XSD to use when interpreting this schema file.

To specify the SDFormat version to use when validating one would (traditionally) set the version in the namespace (here is an example). I recall that there was an issue in this repo where this feature was requested before, and I am in favor of doing so. This would, however, need prior consent from one of the maintainers, and - because this is a new feature instead of a bugfix - I left it out of this PR.

The issue you mentioned might be #545 - something which should be fixed as all validation is done with the files on the website that might not represent the SDF version of your file.

Bi0T1N · 2022-03-12T22:28:26Z

tools/xmlschema.py

+xsd_schema = setup_schema(used_ns, use_default_ns=False)
+xsd_schema.append(element)
+with open(out_dir / (source.stem + ".xsd"), "wb") as out_file:
+    ElementTree.ElementTree(xsd_schema).write(out_file, encoding="UTF-8", xml_declaration=True)


The biggest problem I see is that the generated files are not readable by a human because the formatting is bad.
However, I also started to work on a Python version but gave it up after I found your PR because your approach is much better than mine.
To fix the problem with the unformatted output I used

from xml.dom import minidom with open(out_dir / (source.stem + "TypeF.xsd"), "w") as out_file: xmlstr = minidom.parseString(ElementTree.tostring(xsd_schema, xml_declaration=True, encoding='utf-8')).toprettyxml(indent=" ", encoding='utf-8').decode("utf-8") # replacing of < and > is needed for <![CDATA[text]]> but you don't use it #xmlstr = xmlstr.replace("<", "<") #xmlstr = xmlstr.replace(">", ">") # fixing text like 'The "ray", "gpu_ray", and "gps" types are...' after formatting xmlstr = xmlstr.replace(""", '"') out_file.write(xmlstr)

to format the XML of Python's xml.etree.ElementTree. As mentioned in the formatting part your code doesn't use <![CDATA[text]]> while the current files created with the Ruby script do.

I'm also not sure if <xs:documentation xml:lang="en"> is required instead of <xs:documentation> since the language will always be English?

The biggest problem I see is that the generated files are not readable by a human because the formatting is bad.

Yes, we can definitely make the files more pretty. Once this gets picked up by one of the maintainers, I will have a closer look into that.

As mentioned in the formatting part your code doesn't use while the current files created with the Ruby script do.

Yes, CDATA is one way of escaping the comments. It has been a while since I've written the initial code, but I remember looking into it and deciding that we can also not have it here ... I have forgotten the reason for this, though, and I could have been wrong.

I'm also not sure if <xs:documentation xml:lang="en"> is required instead of xs:documentation since the language will always be English?

Both options should work fine so I'm happy either way.

FirefoxMetzger · 2022-03-13T21:04:04Z

Thanks for the review @Bi0T1N . I won't get around to looking at this for another couple of days, because I have a deadline coming up and am tied up elsewhere.

That said, I have a more up-to-date version of this script that lives here. I never pushed the changes upstream, because it uses features from a more recent version of python (dataclasses), and schema generation currently doesn't seem to have high priority here (which is completely fine by me). You might also be interested in the generated schemata (currently up to v1.8).

chapulina · 2022-03-25T04:40:01Z

Hi @FirefoxMetzger , unfortunately we haven't had time to go over this PR yet. Edifice (sdf11) will EOL next week. How about retargeting this at Fortress (sdf12)?

Co-authored-by: Bi0T1N <[email protected]>

FirefoxMetzger · 2022-03-31T10:27:07Z

Hi @FirefoxMetzger , unfortunately we haven't had time to go over this PR yet. Edifice (sdf11) will EOL next week. How about retargeting this at Fortress (sdf12)?

@chapulina Sure that can be done; however, I don't feel very motivated to add more work to this PR just for it to keep sitting around waiting for a reviewer. Once someone from the team has the time to review and we can resolve the action items that need a decision for this PR I'm happy to pick it up again.

For reference these are the open points:

Should we use XSD 1.1 instead of XSD 1.0? Biggest pro: better representation of SDF that avoids false negatives (accepts invalid SDF) when validating (see: BUG: resolve XSD aliasing and parse missed elements #643 (comment) for details). Biggest Con: xmllint (which is currently used by the ign command line tool) doesn't support XSD 1.1
Does an elements required value take priority over a default value or vice versa? (see: Priority of required vs default #637 for details)

chapulina · 2022-03-31T14:31:00Z

Sure that can be done; however, I don't feel very motivated to add more work to this PR just for it to keep sitting around waiting for a reviewer.

That's very fair 👍🏽 We'll let you know when we can put some time into this. Thanks for the patience

Ryanf55 · 2024-04-20T16:43:28Z

Contributor

If we found:

a linter that could parse XSD 1.1 to replace XMLLint
It can be integrated with colcon test
It can be run from VSCode
Would that help the situation?

There is a recent-ish list here of options:
https://softwarerecs.stackexchange.com/questions/55262/xml-validator-using-xsd-1-1

azeey · 2024-04-22T15:20:54Z

Maybe. #1377 is also another direction--basically rename the tags that are causing problems.

azeey · 2024-08-19T17:00:14Z

#1455 and #535 are simpler approaches that will hopefully solve this problem and since there hasn't been much activity in this PR, I'll go ahead and close it.

FirefoxMetzger added 8 commits July 23, 2021 20:44

BUG: missing sdf + copy types.xsd

bc0eced

FEAT: python schema generator

22bba46

MAINT: remove (unnecessary) validators

2d4a457

MAINT: use etree for xml generation

4548b82

MAINT: remove refactored files

55396a8

MAINT: integrate recursion into class structure

302d66e

BUG: match time pattern with SDF default values

1334ed1

MAINT: refactor

2a86353

github-actions bot added the 🏢 edifice Ignition Edifice label Jul 28, 2021

FirefoxMetzger added 20 commits July 28, 2021 15:47

BUG: missing sdf + copy types.xsd

e36edce

FEAT: python schema generator

97cb10c

MAINT: remove (unnecessary) validators

688a9f7

MAINT: use etree for xml generation

d1798cb

MAINT: remove refactored files

b29e905

MAINT: integrate recursion into class structure

bbec0be

BUG: match time pattern with SDF default values

d856458

MAINT: refactor

887ef67

Merge branch 'xml_namespaces' of github.com:FirefoxMetzger/sdformat i…

de3a5b3

…nto xml_namespaces

MAINT: migrate v1.5 - v1.7 to new XSD parser

587d652

BUG: backport changes of types.xsd

38d5337

MAINT: move description into type schema

290da7b

MAINT: unwrap elements from choice block

a666c1e

The original script wrapped every element inside a choice block; however - as this is only a single element - the choice block has essentially no effect.

BUG: partially fix ignored min/maxOccurs

e19c366

MAINT: remove dataclasses

fa986c0

BUG: remove dataclass import

d0b8836

MAINT: refactor complex type generation

bdb2ab7

BUG: remove dataclass decorators

e5a7042

MAINT: absolute pathing for namespace

4d6cf32

MAINT: handle non-ASCII inside sdf files

b06c665

chapulina mentioned this pull request Mar 12, 2022

🎈 11.4.0 #879

Merged

7 tasks

Bi0T1N reviewed Mar 12, 2022

View reviewed changes

Bi0T1N mentioned this pull request Mar 17, 2022

Port embedSdf script from Ruby to Python3 and provide unittests #884

Merged

7 tasks

Apply suggestions from code review

cf1dad5

Co-authored-by: Bi0T1N <[email protected]>

FirefoxMetzger requested review from azeey and scpeters as code owners March 31, 2022 10:17

chapulina added the bug Something isn't working label Jul 29, 2022

Bi0T1N mentioned this pull request Feb 10, 2023

Python script to replace ruby xmlschema generator #1232

Merged

azeey removed their request for review April 7, 2023 19:55

azeey added beta Targeting beta release of upcoming collection and removed beta Targeting beta release of upcoming collection labels Jul 31, 2023

azeey mentioned this pull request Nov 6, 2023

Using SDF schema for VSCode SDF validation and intellisense #1326

Open

scpeters mentioned this pull request Mar 1, 2024

sdf 1.12: rename state subelements with _state suffix #1377

Closed

9 tasks

scpeters mentioned this pull request Jul 8, 2024

Add _state suffix to //state subelements #1455

Merged

8 tasks

azeey added the beta Targeting beta release of upcoming collection label Jul 29, 2024

azeey closed this Aug 19, 2024

azeey removed the beta Targeting beta release of upcoming collection label Aug 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: resolve XSD aliasing and parse missed elements #643

BUG: resolve XSD aliasing and parse missed elements #643

FirefoxMetzger commented Jul 28, 2021 •

edited

Loading

azeey commented Nov 29, 2021

Bi0T1N commented Mar 12, 2022

Bi0T1N left a comment

Bi0T1N Mar 12, 2022

Bi0T1N Mar 12, 2022

FirefoxMetzger Mar 31, 2022

Bi0T1N Mar 12, 2022

FirefoxMetzger Mar 31, 2022

Bi0T1N Mar 12, 2022

Bi0T1N Mar 12, 2022

FirefoxMetzger Mar 31, 2022

Bi0T1N Apr 2, 2022

Bi0T1N Mar 12, 2022

Bi0T1N Mar 12, 2022

FirefoxMetzger Mar 31, 2022

Bi0T1N Apr 2, 2022

Bi0T1N Mar 12, 2022

Bi0T1N Mar 12, 2022

FirefoxMetzger Mar 31, 2022

FirefoxMetzger commented Mar 13, 2022

chapulina commented Mar 25, 2022

FirefoxMetzger commented Mar 31, 2022 •

edited

Loading

chapulina commented Mar 31, 2022

Ryanf55 commented Apr 20, 2024

azeey commented Apr 22, 2024

azeey commented Aug 19, 2024

		raise RuntimeError(f"Unknown simple type: {in_type}") from None


		# collect existing namespaces

BUG: resolve XSD aliasing and parse missed elements #643

BUG: resolve XSD aliasing and parse missed elements #643

Conversation

FirefoxMetzger commented Jul 28, 2021 • edited Loading

🦟 Bug fix

Summary

Checklist

azeey commented Nov 29, 2021

Bi0T1N commented Mar 12, 2022

Bi0T1N left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

FirefoxMetzger commented Mar 13, 2022

chapulina commented Mar 25, 2022

FirefoxMetzger commented Mar 31, 2022 • edited Loading

chapulina commented Mar 31, 2022

Ryanf55 commented Apr 20, 2024

azeey commented Apr 22, 2024

azeey commented Aug 19, 2024

FirefoxMetzger commented Jul 28, 2021 •

edited

Loading

FirefoxMetzger commented Mar 31, 2022 •

edited

Loading