EpubReader throws Exception if OPF's <metadata> tag contains XML comments #105

wendong-li · 2016-08-22T08:39:42Z

Just wanted to share some information about an error I faced when parsing the metadata of an epub file using EpubReader.

I was using ebooklib v0.15. When the error happens, the library throws the following exception:

'cython_function_or_method' object has no attribute 'rfind'

Checked the source code, and confirmed that it comes from the following code:

class EpubReader(object):
    ...
    def _load_metadata(self):
        ...
        for t in metadata:
            if not etree.iselement(t):
                continue
            if t.tag == default_ns + 'meta':
                ...
            else:
                tag = t.tag[t.tag.rfind('}') + 1:]    // From here

The reason is that etree.iselement(t) seems to treat both comments and tags as elements, e.g. it returns true even if t is an XML comment.

I was using python 2.7.10 on Mac 10.11 with lxml v3.6.2.

I checked the issue history here but seems no one else met this issue before, is it only me because of my environment?

It seems to me that this is an issue of the lxml library. And I tried to upgrade my lxml to v3.6.3 but failed to do that (couldn't fix some error during the installation).

In case it's a common issue, it would be better that ebooklib handles it internally. That's why I created this ticket.

Thanks,
Wendong

The text was updated successfully, but these errors were encountered:

benjhastings · 2017-02-21T15:28:47Z

I just hit the same issue. Really the best way to fix this would be to modify ebooklib.utils#parse_string so that it accepts some kwargs. Then the user can specify a parser if they require one.

I ended up patching my local install of ebooklib so that it used my version of the parse_string function that ebooklib calls in the _load_container() method.

My modified "parse_string" method is as follows:

def parse_string(s):
    parser = etree.XMLParser(remove_comments=True)
    try:
        tree = etree.parse(io.BytesIO(s.encode('utf-8')), parser=parser)
    except:
        tree = etree.parse(io.BytesIO(s), parser=parser)
    return tree

I then just patched my local copy so that instead of importing parse_string from ebooklib.utils I imported it from mymodule.utils

benjhastings · 2017-02-22T09:02:13Z

I just added a pull request in (#115) which should resolve this issue.

aerkalov · 2017-03-09T15:26:49Z

It seems that this issue has been fixed on 2014-10-17 by Chris Grice and the fix is in the master branch but there has been no release since then. Last release was on 2014-05-15. Not so nice from my side. What I will do is fix couple of smaller issues and do the release ASAP.

aerkalov · 2017-07-15T10:24:53Z

There is a 0.16 release now which fixes this issue.

wendong-li · 2018-03-06T09:39:25Z

Thanks @aerkalov. Confirmed it's not reproduced any more in v0.16.

aerkalov closed this as completed Jul 15, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

EpubReader throws Exception if OPF's <metadata> tag contains XML comments #105

EpubReader throws Exception if OPF's <metadata> tag contains XML comments #105

wendong-li commented Aug 22, 2016 •

edited

Loading

benjhastings commented Feb 21, 2017

benjhastings commented Feb 22, 2017

aerkalov commented Mar 9, 2017

aerkalov commented Jul 15, 2017

wendong-li commented Mar 6, 2018

EpubReader throws Exception if OPF's <metadata> tag contains XML comments #105

EpubReader throws Exception if OPF's <metadata> tag contains XML comments #105

Comments

wendong-li commented Aug 22, 2016 • edited Loading

benjhastings commented Feb 21, 2017

benjhastings commented Feb 22, 2017

aerkalov commented Mar 9, 2017

aerkalov commented Jul 15, 2017

wendong-li commented Mar 6, 2018

wendong-li commented Aug 22, 2016 •

edited

Loading