
ValueError: invalid literal for int() with base 10 #61

Open
dege13 opened this issue May 2, 2024 · 6 comments

Comments

@dege13

dege13 commented May 2, 2024

I keep hitting this error with several books. Any ideas what is wrong?

Using the command python main.py --tts edge
Running on Windows.

2024-05-01 23:14:24 [INFO] Converting chapter 7/22: THE_LIKEOMETER, characters: 48686
2024-05-01 23:14:54 [INFO] Converting chapter 8/22: ULTRASOCIALITY, characters: 30998
Traceback (most recent call last):
  File "C:\Users\dege1\Documents\GitHub\epub_to_audiobook\main.py", line 139, in <module>
    main()
  File "C:\Users\dege1\Documents\GitHub\epub_to_audiobook\main.py", line 135, in main
    AudiobookGenerator(config).run()
  File "C:\Users\dege1\Documents\GitHub\epub_to_audiobook\audiobook_generator\core\audiobook_generator.py", line 101, in run
    tts_provider.text_to_speech(
  File "C:\Users\dege1\Documents\GitHub\epub_to_audiobook\audiobook_generator\tts_providers\edge_tts_provider.py", line 158, in text_to_speech
    asyncio.run(
  File "C:\Users\dege1\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\dege1\AppData\Local\Programs\Python\Python311\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dege1\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\dege1\Documents\GitHub\epub_to_audiobook\audiobook_generator\tts_providers\edge_tts_provider.py", line 101, in save
    await self.chunkify()
  File "C:\Users\dege1\Documents\GitHub\epub_to_audiobook\audiobook_generator\tts_providers\edge_tts_provider.py", line 65, in chunkify
    for pause_time, content in self.parsed:
  File "C:\Users\dege1\Documents\GitHub\epub_to_audiobook\audiobook_generator\tts_providers\edge_tts_provider.py", line 58, in parse_text
    yield int(pause_time), content.strip()
          ^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '3 Reciprocity with a Vengeance Zigong asked: “Is there any single word that could guide one’s entire life?” The master said: “Should it not be reciprocity? What you do not wish for yourself, do not d

I also get these warnings before processing starts, but some books convert successfully, so I don't think the warnings matter.

C:\Users\dege1\Documents\GitHub\epub_to_audiobook\venv\Lib\site-packages\ebooklib\epub.py:1395: UserWarning: In the future version we will turn default option ignore_ncx to True.
  warnings.warn('In the future version we will turn default option ignore_ncx to True.')
C:\Users\dege1\Documents\GitHub\epub_to_audiobook\venv\Lib\site-packages\ebooklib\epub.py:1423: FutureWarning: This search incorrectly ignores the root element, and will be fixed in a future version.  If you rely on the current behaviour, change it to './/xmlns:rootfile[@media-type]'
  for root_file in tree.findall('//xmlns:rootfile[@media-type]', namespaces={'xmlns': NAMESPACES['CONTAINERNS']}):
C:\Users\dege1\Documents\GitHub\epub_to_audiobook\audiobook_generator\tts_providers\base_tts_provider.py:14: RuntimeWarning: coroutine 'EdgeTTSProvider.validate_config' was never awaited
  self.validate_config()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
C:\Users\dege1\Documents\GitHub\epub_to_audiobook\venv\Lib\site-packages\bs4\builder\__init__.py:545: XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an HTML parser. If this really is an HTML document (maybe it's XHTML?), you can ignore or filter this warning. If it's XML, you should know that using an XML parser will be more reliable. To parse this document as XML, make sure you have the lxml package installed, and pass the keyword argument `features="xml"` into the BeautifulSoup constructor.
  warnings.warn(
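
For context on the `RuntimeWarning: coroutine ... was never awaited` line above: calling an `async def` method without `await` only creates a coroutine object, so the method body never runs. The sketch below uses a hypothetical `Provider` class (not the project's `EdgeTTSProvider`) to show the difference; whether skipping the validation matters for this particular crash is not established in this thread.

```python
import asyncio


class Provider:
    """Hypothetical stand-in for a TTS provider; not the project's class."""

    def __init__(self):
        self.validated = False

    async def validate_config(self):
        # This body only runs once the coroutine is awaited.
        self.validated = True


def call_without_await():
    provider = Provider()
    coro = provider.validate_config()  # creates a coroutine object, runs nothing
    coro.close()  # discard it explicitly so Python does not warn about it
    return provider.validated


def call_with_await():
    provider = Provider()
    asyncio.run(provider.validate_config())  # actually executes the body
    return provider.validated
```

Here `call_without_await()` leaves `validated` unset, while `call_with_await()` sets it, which is why an un-awaited `validate_config` silently performs no validation.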
@dege13
Author

dege13 commented May 2, 2024

I found a copyright-free example epub that causes the error. Feel free to use it to reproduce the issue; if the problem is environmental, you may not get the error.
famouspaintings.zip
I found another free sample epub that also demonstrates the issue:
internallinks sample.zip

@LarsHLunde

I managed to fix it.
In the file:
epub_to_audiobook/audiobook_generator/tts_providers/edge_tts_provider.py

I added a try/except block around the int() call, and now it doesn't crash:

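    def parse_text(self):
        # No pause markers: emit the whole text as a single chunk.
        # (A bare `return [...]` inside a generator would yield nothing.)
        if "[pause:" not in self.text:
            yield 0, self.text
            return

        parts = self.text.split("[pause:")
        for part in parts:
            if "]" in part:
                pause_time, content = part.split("]", 1)
                try:
                    yield int(pause_time), content.strip()
                except ValueError:
                    # Not a numeric pause value: fall back to no pause.
                    yield 0, content.strip()
            else:
                content = part
                yield 0, content.strip()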
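
To see what the try/except fallback does on a malformed marker, here is a standalone sketch of the same idea. It takes the text as an argument instead of reading `self.text`, and it yields the whole text when no marker is present; both are choices made for this illustration. The sample inputs are hypothetical and are not taken from the attached epubs.

```python
def parse_text(text):
    """Standalone sketch of the patched parsing logic."""
    if "[pause:" not in text:
        yield 0, text
        return
    for part in text.split("[pause:"):
        if "]" in part:
            # Everything before the first "]" is treated as the pause value.
            pause_time, content = part.split("]", 1)
            try:
                pause = int(pause_time)
            except ValueError:
                # Non-numeric "pause value": fall back to no pause.
                pause = 0
            yield pause, content.strip()
        else:
            yield 0, part.strip()


well_formed = list(parse_text("Intro [pause: 2] Chapter text"))
malformed = list(parse_text("A [pause: 3 Title words [1] more text"))
```

With the malformed input, the text between `[pause:` and the first `]` (here `3 Title words [1`) ends up in `pause_time` and is dropped by the fallback. The crash goes away, but that stretch of text would not be read aloud.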

@igrowheart

Thank you @LarsHLunde. I haven't had a chance to test dege13's files yet, but the fix worked perfectly on my books, which were failing before it.

@2600box

2600box commented May 31, 2024

Hello, I ran into this problem with edge on Windows and macOS, but not when using Docker. Before finding this issue and your solution, I did some investigating that may be interesting or useful.

The Docker image uses Python 3.11, so I thought that might be the difference, but forcing Python 3.11 on macOS still resulted in the same error.

The contents of edge_tts_provider.py are identical in the Docker image.

Here are the packages whose versions differ. Docker uses the older version in each case (the lines marked <), so one of these is probably to blame.

< ------------------ ---------
< aiohttp            3.9.1
---
> ------------------ --------
> aiohttp            3.9.5
5c5
< annotated-types    0.6.0
---
> annotated-types    0.7.0
9c9
< certifi            2023.7.22
---
> certifi            2024.2.2
13c13
< edge-tts           6.1.9
---
> edge-tts           6.1.10
16,20c16,20
< httpcore           1.0.2
< httpx              0.26.0
< idna               3.6
< lxml               5.1.0
< multidict          6.0.4
---
> httpcore           1.0.5
> httpx              0.27.0
> idna               3.7
> lxml               5.2.2
> multidict          6.0.5
23,25c23,25
< pip                23.2.1
< pydantic           2.5.3
< pydantic_core      2.14.6
---
> pip                24.0
> pydantic           2.7.2
> pydantic_core      2.18.3
27c27
< setuptools         65.5.1
---
> setuptools         69.5.1
29c29
< sniffio            1.3.0
---
> sniffio            1.3.1
32,35c32,35
< tqdm               4.66.1
< typing_extensions  4.9.0
< urllib3            2.1.0
< wheel              0.42.0
---
> tqdm               4.66.4
> typing_extensions  4.12.0
> urllib3            2.2.1
> wheel              0.43.0
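
A comparison like the one above can also be produced programmatically. The sketch below assumes the two environments' `pip list` column output has been captured as strings; the helper names are hypothetical, and the sample data reuses a few version pairs from the diff plus one made-up unchanged package.

```python
def parse_pip_list(output):
    """Parse `pip list` column output into a {name: version} dict."""
    versions = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the "Package Version" header, and the dashed rule.
        # Rows with extra columns (e.g. editable installs) are skipped too.
        if len(fields) != 2 or fields[0] == "Package" or set(fields[0]) == {"-"}:
            continue
        versions[fields[0]] = fields[1]
    return versions


def changed_packages(old, new):
    """Return {name: (old_version, new_version)} for packages that differ."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# Sample data: versions from the diff above, plus a hypothetical unchanged package.
DOCKER = """Package     Version
----------- -------
aiohttp     3.9.1
edge-tts    6.1.9
example-pkg 1.0
"""
LOCAL = """Package     Version
----------- -------
aiohttp     3.9.5
edge-tts    6.1.10
example-pkg 1.0
"""

DIFF = changed_packages(parse_pip_list(DOCKER), parse_pip_list(LOCAL))
```

Pinning one changed package at a time to its older version is one way to narrow down which difference matters.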

@jfhc

jfhc commented Jun 16, 2024

The fix from @LarsHLunde works for me. My guess is that it breaks when the text itself contains a ]. Could something like this be a fix? (Not proper code, just expressing the idea.)

    # sketch only; needs `import re`, and `s` is the text being parsed
    match = re.search(r'\[pause: ([^\]]+)\]', s)
    if match:
        x_part = match.group(1)  # the pause value, e.g. '2s'
        rest_of_string = s[match.end():]
        # recursively apply this to rest_of_string
    else:
        yield 0, s

Also I'm confused because the comment on the code says

    # This class uses edge_tts to generate text
    # but with pauses for example:- text: 'Hello
    # this is simple text. [pause: 2s] Paused 2s'

But that example would throw an error when it tries to do int('2s'). So maybe I am misunderstanding the code somewhere.
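
One way to make the regex idea concrete is sketched below. It is an illustration only, not the project's code and not necessarily what the maintainers shipped. It assumes a marker is `[pause: N]` with an optional trailing `s` (so the `2s` form from the code comment parses), and it passes the number through unchanged, since the unit is not stated in this thread. The names `PAUSE_RE` and `parse_pauses` are hypothetical.

```python
import re

# Matches "[pause: 2]", "[pause:2]" and "[pause: 2s]"; other brackets such as "[3]" are ignored.
PAUSE_RE = re.compile(r"\[pause:\s*(\d+)\s*s?\]")


def parse_pauses(text):
    """Yield (pause_value, chunk) pairs; each pause precedes its chunk."""
    pause = 0
    pos = 0
    for match in PAUSE_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk or pause:
            yield pause, chunk
        pause = int(match.group(1))  # digits only, so int() cannot fail here
        pos = match.end()
    tail = text[pos:].strip()
    if tail or pause:
        yield pause, tail


from_comment = list(parse_pauses("Hello this is simple text. [pause: 2s] Paused 2s"))
with_brackets = list(parse_pauses("See note [3] here. [pause: 1] Next"))
```

Because only well-formed markers match, a stray `]` in the book text no longer reaches `int()`. The trade-off is that a malformed marker stays in the chunk as literal text instead of being dropped.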

@p0n1 p0n1 mentioned this issue Jun 27, 2024
@p0n1
Owner

p0n1 commented Jun 28, 2024

I just made a new release to address issues with the edge TTS feature: https://github.com/p0n1/epub_to_audiobook/releases/tag/v0.6.0
It would be great if anyone in this issue could try it out and let me know whether it works.
