Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: resolve XSD aliasing and parse missed elements #643

Closed
wants to merge 30 commits into from

Conversation

FirefoxMetzger
Copy link
Contributor

@FirefoxMetzger FirefoxMetzger commented Jul 28, 2021

🦟 Bug fix

Fixes #642 #637 #633 #632

Summary

This PR adds a new generator for XSD from the current SDFormat. It is currently set up to convert SDFormat v1.8, but can easily be backported to the other versions by updating the corresponding CMakeLists.

It fixes multiple issues with the old parser:

It also handles nested models and nested model states. This was apparently a blocking issue in previous attempts to refactor this script. I didn't encounter this problem directly; however, I had to guess the desired meaning of the ref=X attribute used in this case. I left a comment in the appropriate place.

On a high level, the generator is based around named types. It creates two files for each filename.sdf: one filename.xsd and one filenameType.xsd (the actual workhorse), e.g. model.xsd and modelType.xsd. FilenameType.xsd holds the complex type used by the root element of filename.sdf and filename.xsd is a shim to help parsers to make sense of partial sdf.

Checklist

  • Signed all commits for DCO
  • Added tests
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge

@github-actions github-actions bot added the 🏢 edifice Ignition Edifice label Jul 28, 2021
@azeey
Copy link
Collaborator

azeey commented Nov 29, 2021

Unfortunately, we didn't have the resources to review this PR in October or November and it looks like we we may not be able to in the coming couple of months. My apologies for the delay, but I wanted to let you know it's still in our radar.

@chapulina chapulina mentioned this pull request Mar 12, 2022
7 tasks
@Bi0T1N
Copy link
Contributor

Bi0T1N commented Mar 12, 2022

@FirefoxMetzger
Can you please rebase this on the head of the current sdf branch? The changes for #642 have already been merged via #662 (with a proper <xsd:whiteSpace value="collapse"/>) but are also listed as changes here (without <xsd:whiteSpace value="collapse"/>).

Copy link
Contributor

@Bi0T1N Bi0T1N left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also working on a Python script for the conversion and took the opportunity to review your code, as it already familiarized me with SDF -> XSD conversion, but your approach is more promising than my simple Ruby -> Python conversion. 😃

tools/xmlschema.py Show resolved Hide resolved
cmd_arg_parser.add_argument(
"-i",
"--in",
dest="source",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe sdf_file would be more suitable as variable name? In the case of source it is not clear whether it is a file, a path or a directory name.

tools/xmlschema.py Outdated Show resolved Hide resolved
raise RuntimeError(f"Unknown simple type: {in_type}") from None


# collect existing namespaces
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better put all declarations above and the instruction code into a

if __name__ == "__main__":
  cmd_arg_parser = argparse.ArgumentParser(
      description="Create an XML schema from a SDFormat file."
  )
  ...
  xsd_schema = setup_schema(used_ns, use_default_ns=False)
  xsd_schema.append(element)
  with open(out_dir / (source.stem + ".xsd"), "wb") as out_file:
      ElementTree.ElementTree(xsd_schema).write(out_file, encoding="UTF-8", xml_declaration=True)

as this will increase the readability (and allows to reuse your classes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bi0T1N Hmm, I think this is mainly a stylistic choice.

The if __name__ == "__main__" guard is useful if your have double-barreled files that - in parallel - function as both a script and as a module. Here, the purpose is to be a script only (there is no intention towards reusing the classes/functions elsewhere), so I chose not to use the guard. In my experience, that leads to cleaner scripts, because you can write (and debug) them sequentially rather than having to jump around the file, because classes live at the top and execution logic at the bottom.

tools/xmlschema.py Show resolved Hide resolved
el.set("type", f"{other_file.stem}:{name}Type")

# TODO: remove this line
declared.setdefault("imports", set()).add(other_file.stem)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove it? declared variable isn't used. 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be the variable in which the imported namespaces were collected, but that changed with a refactor. This comment is mainly a reminder to clean up once there is general approval on the approach this script takes and we are moving towards an actual merge.

"*": ("0", "unbounded"),
"-1": ("0", "0"),
}
min_occurs, max_occurs = required_codes[self.element.attrib["required"]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This already shows up above so make it a function to reduce code duplication

else:
# XSD 1.1 allows maxOccurs="unbound" within xs:all, so
# we could drop this entire else block if we could upgrade
# to XSD 1.1. (xmllint only supports XSD 1.0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't find an issue for it on GitLab, just an older issue in the closed bug tracker. So maybe report it and hope for the best?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't an issue with XSD but rather an issue with sdformat. XSD v1.0 (the schema version used here) is limited in that it doesn't allow us to represent the true capabilities of SDF.

In SDF, child elements may appear 0, 1, or infinitely many times depending on the element and - this is what kills XSD 1.0 - may do so in any order. In XSD you may choose exactly one element from a set of elements (xs:choice), you may specify a block in which elements may occur in any order and with a specified (and finite) min/max count for each element (xs:all), or you may specify a block in which elements must occur in the given order but may do so the given (potentially infinite) number of times. Neither of these choices fits.

The way the ruby generator solved this is by aggregating all valid children into a xs:choice element which may repeat 0 to inf many times. This works for elements with infinite counts but doesn't enforce maxOccurs for elements that may only occur once. If we, thus, validate SDF files with the resulting schema, it may produce false negatives by saying that a SDF is valid even though a unique element occurs twice.

The way the commented out code solves this is by grouping elements into a group with finite count (<=1) and a group with infinite count. It then inserts the xs:choice block discussed above (for infinite-count elements only) between every two fine-count elements. It then does this for every permutation of finite-count elements (because in SDF elements may occur in any order, but we need to validate that unique elements don't reoccur). This will result in XSD that doesn't produce above false negatives, but it comes at the cost of massive schemata (GBs in magnitude).

So for now I just do the same here as the ruby script (accept that the result produces false negatives), because we can't change how SDF works, and we also can't bump the XSD version without prior feedback from one of the maintainers.

In XSD 1.1 this is less of a problem because above mentioned xs:all block allows elements with infinite counts. This means that we can simply use an xs:all block instead of a xs:choice block and be happy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation. 👍
My comment was just about xmllint only supports XSD 1.0 which is probably a reason why we can't simply use XSD 1.1 as there is no tool for linting. An option might be to only use XSD 1.1 if there is no other choice (= you need XSD 1.1 to represent it properly) and otherwise stick with XSD 1.0 - I don't know if mixing the two is even supported.


def setup_schema(used_ns: list, use_default_ns: bool = True) -> ElementTree.Element:
xsd_schema = ElementTree.Element(_to_qname("xs:schema"))
xsd_schema.set("version", "1.1")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be "1.0"? From above:

# XSD 1.1 allows maxOccurs="unbound" within xs:all, so 
# we could drop this entire else block if we could upgrade
# to XSD 1.1. (xmllint only supports XSD 1.0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure but I guess <?xml version='1.0' encoding='UTF-8'?> is what you meant with your comment by not supporting XSD 1.1 and thus we use 1.0?

However, the file generated from sensor.sdf contains

<?xml version='1.0' encoding='UTF-8'?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.1" ...

but wouldn't it make more sense that the version number is the version of the SDF file (directory)? So that following command would set version to 1.8:

python3 tools/xmlschema.py -i sensor.sdf -o /ws/sdformat/test_output -s /ws/sdformat/sdf/1.8

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be "1.0"?

You are right; it should be 1.0 here.

but wouldn't it make more sense that the version number is the version of the SDF file (directory)?

No, because its references to different things. The first line (the XML header) specifies the version of the XML in which the schema file is written. The second line (the XSD namespace) specified the version of XSD to use when interpreting this schema file.

To specify the SDFormat version to use when validating one would (traditionally) set the version in the namespace (here is an example). I recall that there was an issue in this repo where this feature was requested before, and I am in favor of doing so. This would, however, need prior consent from one of the maintainers, and - because this is a new feature instead of a bugfix - I left it out of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue you mentioned might be #545 - something which should be fixed as all validation is done with the files on the website that might not represent the SDF version of your file.

xsd_schema = setup_schema(used_ns, use_default_ns=False)
xsd_schema.append(element)
with open(out_dir / (source.stem + ".xsd"), "wb") as out_file:
ElementTree.ElementTree(xsd_schema).write(out_file, encoding="UTF-8", xml_declaration=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest problem I see is that the generated files are not readable by a human because the formatting is bad.
However, I also started to work on a Python version but gave it up after I found your PR because your approach is much better than mine.
To fix the problem with the unformatted output I used

from xml.dom import minidom

with open(out_dir / (source.stem + "TypeF.xsd"), "w") as out_file:
    xmlstr = minidom.parseString(ElementTree.tostring(xsd_schema, xml_declaration=True, encoding='utf-8')).toprettyxml(indent="  ", encoding='utf-8').decode("utf-8")
    # replacing of < and > is needed for <![CDATA[text]]> but you don't use it
    #xmlstr = xmlstr.replace("&lt;", "<")
    #xmlstr = xmlstr.replace("&gt;", ">")
    # fixing text like 'The "ray", "gpu_ray", and "gps" types are...' after formatting
    xmlstr = xmlstr.replace("&quot;", '"')
    out_file.write(xmlstr)

to format the XML of Python's xml.etree.ElementTree. As mentioned in the formatting part your code doesn't use <![CDATA[text]]> while the current files created with the Ruby script do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure if <xs:documentation xml:lang="en"> is required instead of <xs:documentation> since the language will always be English?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest problem I see is that the generated files are not readable by a human because the formatting is bad.

Yes, we can definitely make the files more pretty. Once this gets picked up by one of the maintainers, I will have a closer look into that.

As mentioned in the formatting part your code doesn't use while the current files created with the Ruby script do.

Yes, CDATA is one way of escaping the comments. It has been a while since I've written the initial code, but I remember looking into it and deciding that we can also not have it here ... I have forgotten the reason for this, though, and I could have been wrong.

I'm also not sure if <xs:documentation xml:lang="en"> is required instead of xs:documentation since the language will always be English?

Both options should work fine so I'm happy either way.

@FirefoxMetzger
Copy link
Contributor Author

Thanks for the review @Bi0T1N . I won't get around to looking at this for another couple of days, because I have a deadline coming up and am tied up elsewhere.

That said, I have a more up-to-date version of this script that lives here. I never pushed the changes upstream, because it uses features from a more recent version of python (dataclasses), and schema generation currently doesn't seem to have high priority here (which is completely fine by me). You might also be interested in the generated schemata (currently up to v1.8).

@chapulina
Copy link
Contributor

Hi @FirefoxMetzger , unfortunately we haven't had time to go over this PR yet. Edifice (sdf11) will EOL next week. How about retargeting this at Fortress (sdf12)?

@FirefoxMetzger
Copy link
Contributor Author

FirefoxMetzger commented Mar 31, 2022

Hi @FirefoxMetzger , unfortunately we haven't had time to go over this PR yet. Edifice (sdf11) will EOL next week. How about retargeting this at Fortress (sdf12)?

@chapulina Sure that can be done; however, I don't feel very motivated to add more work to this PR just for it to keep sitting around waiting for a reviewer. Once someone from the team has the time to review and we can resolve the action items that need a decision for this PR I'm happy to pick it up again.

For reference these are the open points:

@chapulina
Copy link
Contributor

Sure that can be done; however, I don't feel very motivated to add more work to this PR just for it to keep sitting around waiting for a reviewer.

That's very fair 👍🏽 We'll let you know when we can put some time into this. Thanks for the patience

@chapulina chapulina added the bug Something isn't working label Jul 29, 2022
@azeey azeey removed their request for review April 7, 2023 19:55
@azeey azeey added beta Targeting beta release of upcoming collection and removed beta Targeting beta release of upcoming collection labels Jul 31, 2023
@Ryanf55
Copy link

Ryanf55 commented Apr 20, 2024

Contributor

If we found:

  • a linter that could parse XSD 1.1 to replace XMLLint
  • It can be integrated with colcon test
  • It can be run from VSCode
    Would that help the situation?

There is a recent-ish list here of options:
https://softwarerecs.stackexchange.com/questions/55262/xml-validator-using-xsd-1-1

@azeey
Copy link
Collaborator

azeey commented Apr 22, 2024

Maybe. #1377 is also another direction--basically rename the tags that are causing problems.

@azeey azeey added the beta Targeting beta release of upcoming collection label Jul 29, 2024
@azeey
Copy link
Collaborator

azeey commented Aug 19, 2024

#1455 and #535 are simpler approaches that will hopefully solve this problem and since there hasn't been much activity in this PR, I'll go ahead and close it.

@azeey azeey closed this Aug 19, 2024
@azeey azeey removed the beta Targeting beta release of upcoming collection label Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working 🏢 edifice Ignition Edifice
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Invalid default value for elements of type <time>
5 participants