Simple post-load validation hook
tobywf committed Feb 18, 2020
1 parent 65fdc96 commit eedea9c
Showing 4 changed files with 52 additions and 3 deletions.
24 changes: 22 additions & 2 deletions README.md
@@ -1,6 +1,8 @@
# XML dataclasses

This is a very rough prototype of what a library might look like for (de)serialising XML into Python dataclasses. XML dataclasses build on normal dataclasses from the standard library and [`lxml`](https://pypi.org/project/lxml/) elements. Loading and saving these elements is left to the consumer for flexibility of the desired output.
[![License: MPL 2.0](https://img.shields.io/badge/License-MPL%202.0-brightgreen.svg)](https://opensource.org/licenses/MPL-2.0)

This is a prototype of what a library might look like for (de)serialising XML into Python dataclasses. XML dataclasses build on normal dataclasses from the standard library and [`lxml`](https://pypi.org/project/lxml/) elements. Loading and saving these elements is left to the consumer for flexibility of the desired output.

It isn't ready for production unless you are willing to do your own evaluation and quality assurance. I don't recommend using this library with untrusted content. It inherits all of `lxml`'s flaws with regard to XML attacks, and recursively resolves data structures. Because deserialisation is driven from the dataclass definitions, it shouldn't be possible to execute arbitrary Python code, but denial-of-service attacks would very likely be feasible.

@@ -46,10 +48,16 @@ class Container:
    rootfiles: RootFiles
    # WARNING: this is an incomplete implementation of an OPF container

    def xml_validate(self):
        if self.version != "1.0":
            raise ValueError(f"Unknown container version '{self.version}'")


if __name__ == "__main__":
    nsmap = {None: CONTAINER_NS}
    lxml_el_in = etree.parse("container.xml").getroot()
    # see Gotchas, stripping whitespace is highly recommended
    parser = etree.XMLParser(remove_blank_text=True)
    lxml_el_in = etree.parse("container.xml", parser).getroot()
    container = load(Container, lxml_el_in, "container")
    lxml_el_out = dump(container, "container", nsmap)
    print(etree.tostring(lxml_el_out, encoding="unicode", pretty_print=True))
@@ -64,6 +72,7 @@ if __name__ == "__main__":
* Lists of child elements are supported, as are unions and lists of unions
* Inheritance does work, but has the same limitations as dataclasses. Inheriting from base classes with required fields and declaring optional fields doesn't work due to field order. This isn't recommended
* Namespace support is decent as long as correctly declared. I've tried on several real-world examples, although they were known to be valid. `lxml` does a great job at expanding namespace information when loading and simplifying it when saving
* Post-load validation hook `xml_validate`

## Patterns

@@ -117,6 +126,17 @@ Children can be renamed via the `rename` function, however attempting to set a n

If a class has children, it cannot have text content.

### Defining post-load validation

Implement an instance method called `xml_validate` that takes no parameters and returns nothing (annotate it `-> None` if you're using type hints):

```python
def xml_validate(self) -> None:
    pass
```

If defined, the `load` function calls it after all values have been loaded and assigned to the XML dataclass. Validate whichever fields you want inside this method. Return values are ignored; to signal a validation failure, raise an exception, which propagates to the caller of `load`.
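
The contract described here can be sketched with the standard library alone, so it runs without `lxml` or this package. This is an illustration, not the library's code: `load_like` is a hypothetical stand-in for `load` that takes a plain dict instead of an element, and `Container` is an ordinary dataclass rather than an XML dataclass. Only the order of operations (construct first, validate second) mirrors the text above.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Container:
    version: str

    def xml_validate(self) -> None:
        # Signal problems by raising; a return value would be ignored.
        if self.version != "1.0":
            raise ValueError(f"Unknown container version '{self.version}'")


def load_like(cls: type, values: Dict[str, Any]) -> Any:
    """Hypothetical stand-in for `load`: build the instance, then validate."""
    instance = cls(**values)
    # Call the hook only if the class defines one, after all fields are set.
    validate_fn = getattr(instance, "xml_validate", None)
    if validate_fn is not None:
        validate_fn()
    return instance


# A valid document loads normally.
ok = load_like(Container, {"version": "1.0"})

# An invalid one raises from inside the loader; the caller decides how to react.
try:
    load_like(Container, {"version": "2.0"})
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
```

Because the hook reports failures by raising, callers that want to recover wrap the load call in `try`/`except`, as the last few lines do.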

## Gotchas

### Whitespace
5 changes: 4 additions & 1 deletion functional/container_test.py
@@ -32,7 +32,10 @@ class Container:
    version: str
    rootfiles: RootFiles
    # WARNING: this is an incomplete implementation of an OPF container
    # (it's missing links)

    def xml_validate(self):
        if self.version != "1.0":
            raise ValueError(f"Unknown container version '{self.version}'")


@pytest.mark.parametrize("remove_blank_text", [True, False])
8 changes: 8 additions & 0 deletions src/xml_dataclasses/serde.py
@@ -153,6 +153,14 @@ def load(cls: Type[XmlDataclass], el: Any, name: Optional[str] = None) -> XmlDat

    instance = cls(**attr_values, **text_values, **child_values)  # type: ignore
    instance.__nsmap__ = el.nsmap

    try:
        validate_fn = instance.xml_validate  # type: ignore
    except AttributeError:
        pass
    else:
        validate_fn()

    return instance
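
The `try`/`except`/`else` shape in this hunk guards only the attribute lookup, so an `AttributeError` raised inside a buggy `xml_validate` still propagates instead of being mistaken for a missing hook. A standalone sketch of that behaviour follows; `NoHook`, `BuggyHook` and `run_hook` are hypothetical names for illustration and are not part of the library.

```python
class NoHook:
    """A class that does not define the validation hook."""


class BuggyHook:
    def xml_validate(self) -> None:
        # Bug inside the hook: this attribute does not exist,
        # so evaluating it raises AttributeError.
        self.missing_attribute


def run_hook(instance: object) -> str:
    """Mirror the lookup-then-call structure shown in the hunk above."""
    try:
        validate_fn = instance.xml_validate  # only the lookup is guarded
    except AttributeError:
        return "no hook"
    else:
        validate_fn()  # exceptions raised here are not swallowed
        return "validated"


# A class without the hook is skipped.
no_hook_result = run_hook(NoHook())

# An AttributeError from inside the hook reaches the caller.
try:
    run_hook(BuggyHook())
except AttributeError:
    buggy_propagated = True
else:
    buggy_propagated = False
```

Had the call been placed inside the `try` body, the second case would have been treated as "no hook" and the bug hidden.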


18 changes: 18 additions & 0 deletions tests/load_test.py
@@ -530,3 +530,21 @@ class Foo:
    foo = load(Foo, el, "foo")
    assert isinstance(foo.bar, Child1)
    assert foo.bar.spam == "eggs"


def test_load_with_validation():
    class MyError(Exception):
        pass

    @xml_dataclass
    class Foo:
        __ns__ = None
        bar: str

        def xml_validate(self) -> None:
            if self.bar == "baz":
                raise MyError()

    el = etree.fromstring('<foo bar="baz" />')
    with pytest.raises(MyError):
        load(Foo, el, "foo")
