[python] lxml tries to resolve remote id's, but fails #56

Open
pvgenuchten opened this issue Sep 12, 2024 · 3 comments
Labels
implementation-challenges Discussion of XSLT implementation challenges

Comments

@pvgenuchten

This is probably not an issue with the XSLT itself; I just want to share some experience from running the XSLT in a Python environment. Maybe one of you knows a solution.

When I run the XSLT in Python for this record, I get an error about a failure to load an external entity.

The remote link is mentioned in:

<srv:operatesOn uuidref="r_basili:-7bda2a44:134ec5768c4:-4f32" xlink:href="http://rsdi.regione.basilicata.it/Catalogo/srv/ita/csw?request=GetRecordById&service=CSW&version=2.0.2&elementSetName=full&OUTPUTSCHEMA=http://www.isotc211.org/2005/gmd&id=r_basili:-7bda2a44:134ec5768c4:-4f32"/>

I have looked into preventing lxml from loading external entities, but was not successful. Any ideas?

@pvgenuchten
Author

pvgenuchten commented Sep 13, 2024

I was able to work around this by creating a custom resolver, following the lxml docs.

Define a custom resolver class:

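import lxml.etree as ET

class LinkResolver(ET.Resolver):
    # Called by lxml whenever the parser (or a stylesheet parsed with it)
    # needs to load an external resource.
    def resolve(self, url, id, context):
        print("Resolving URL '%s'" % url)
        # Answer with an in-memory string instead of fetching the URL.
        return self.resolve_string(
            '<!ENTITY myentity "[resolved text: %s]">' % url, context)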

Then use the resolver when parsing the XSLT:

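# Resolvers registered on the stylesheet's parser are also consulted during the transform.
iso_parser = ET.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
iso_parser.resolvers.add(LinkResolver())
with open('iso-triplify/iso-19139-to-dcat-ap.xsl', 'rb') as f:
    xsl = ET.fromstring(f.read(), parser=iso_parser)
transform = ET.XSLT(xsl)
# xml is the parsed metadata record (defined elsewhere, not shown here)
rdfxml = ET.tostring(transform(xml), pretty_print=True)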

Welcoming suggestions for improvement :-)
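
One possible refinement, as a minimal sketch rather than a tested fix for the actual stylesheet: stub out only http(s) lookups and let everything else resolve normally. It assumes the stylesheet pulls in the link through the XSLT document() function. The class name RemoteLinkStub, the tiny stylesheet, the <rec> record, and the example.invalid URL are all hypothetical and only serve to make the snippet self-contained.

```python
import lxml.etree as ET


class RemoteLinkStub(ET.Resolver):
    """Hypothetical resolver: answer http(s) lookups with a stub document."""

    def __init__(self):
        super().__init__()
        self.skipped = []  # URLs that were not fetched

    def resolve(self, url, id, context):
        if url and url.startswith(("http://", "https://")):
            self.skipped.append(url)
            # Well-formed placeholder, so document() gets something parseable.
            return self.resolve_string("<skipped/>", context)
        return None  # fall through to lxml's default handling


# Hypothetical stylesheet that follows a link found in the source record.
XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <out><xsl:copy-of select="document(/rec/@href)"/></out>
  </xsl:template>
</xsl:stylesheet>"""

resolver = RemoteLinkStub()
parser = ET.XMLParser()
parser.resolvers.add(resolver)

# The resolver must sit on the parser used for the stylesheet.
transform = ET.XSLT(ET.XML(XSL, parser))
record = ET.XML('<rec href="http://example.invalid/csw?id=1"/>')
result = transform(record)

print(ET.tostring(result, pretty_print=True).decode())
print("not fetched:", resolver.skipped)
```

Returning None for non-remote URLs keeps local xsl:import and xsl:include lookups working as before, and the skipped list gives a record of which remote links were left unresolved.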

@jakubklimek jakubklimek added the implementation-challenges Discussion of XSLT implementation challenges label Sep 16, 2024
@NielsHoffmann

Hi Paul,
As I understand it, lxml only supports XSLT 1.0. So I would expect you'd run into other issues with lxml and the iso-19139-to-dcat-ap.xsl as well because it now uses XSLT 2.0 constructs?

@pvgenuchten
Author

Actually, we run the XSLT via lxml, with adjustments, in production (https://github.com/soilwise-he/harvesters/blob/main/iso-triplify/iso-19139-to-dcat-ap.xsl) and haven't run into big issues with XSLT 2.0.
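
To illustrate the exchange above, here is a small sketch of how a stylesheet that declares a higher version can still run in lxml as long as it sticks to XSLT 1.0 constructs. The stylesheet and record below are hypothetical stand-ins, not taken from iso-19139-to-dcat-ap.xsl, and the snippet says nothing about which 2.0 constructs that file actually uses.

```python
import lxml.etree as ET

# Hypothetical stylesheet: declares version 2.0 but uses only 1.0 constructs.
XSL = b"""<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <title><xsl:value-of select="normalize-space(/record/title)"/></title>
  </xsl:template>
</xsl:stylesheet>"""

stylesheet = ET.XML(XSL)
declared_version = stylesheet.get("version")  # what the stylesheet claims

# lxml hands the stylesheet to libxslt, an XSLT 1.0 processor.
transform = ET.XSLT(stylesheet)
result = transform(ET.XML("<record><title>  Soil map  </title></record>"))
title_text = result.getroot().text

print("declared version:", declared_version)
print("title:", title_text)
```

Instructions or functions that exist only in XSLT 2.0 or XPath 2.0 would still be expected to fail, so checking the declared version this way is a hint at best, not a compatibility test.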
