-
-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🔶 Add asyncIO feature for optimization of batch_translate #202
base: master
Are you sure you want to change the base?
Conversation
@nidhaloff , It would be great if you provide some of observations here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I really appreciate this work. This is a great feature to add.
I will take a deeper look, but for the moment I only noticed the additional dependencies and I would like to make it optional by default. So it would only be usable if the user wants to. In that case users should install the async version using:
pip install deep-translator[async]
This can be done using extra_requires. You can check the docx or pdf extras to see how this can be done
@@ -58,6 +58,7 @@ beautifulsoup4 = "^4.9.1" | |||
requests = "^2.23.0" | |||
docx2txt = {version = "^0.8", optional = true} | |||
pypdf = {version = "^3.3.0", optional = true} | |||
aiohttp = "^3.8.4" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to make this optional, because otherwise this would introduce a new dependency and increase the size of the package. Users will not like this (myself included). Generally I'm against adding dependencies, but if its optional, then its ok, so IMO just make it optional and add an extra_requires: async
for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok adding to checklist for the task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to not re-create an async_translate function. Instead maybe it would be better to add an async=False
parameter to the normal translate function and then basically reuse the translate function with async=True
inside your async_translate_batch function.
That way, you will not have to re-define another translate function, which will be 90% similar to the normal translate function and hence repeating yourself.
from pathlib import Path | ||
from typing import List, Optional, Union | ||
|
||
import aiohttp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need a try catch here when you make this depedency optional or maybe just add the import inside the function where this will be used. For an example, check out the docx or pypdf dependencies.
@@ -128,6 +132,23 @@ def translate(self, text: str, **kwargs) -> str: | |||
""" | |||
return NotImplemented("You need to implement the translate method!") | |||
|
|||
@abstractmethod | |||
@lru_cache(maxsize=128) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better if you make the cache_maxsize value in a separate global config.py file
self, batch: List[str], **kwargs | ||
) -> List[str]: | ||
if not batch: | ||
raise Exception("Enter your text list that you want to translate") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add this custom exception to exceptions.py just to keep everything consistent? Something like a NotValidInputBatch exception
translation_tasks = [ | ||
self._async_translate(text, session) for text in batch | ||
] | ||
return await asyncio.gather(*translation_tasks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can get a ValueError here if translation_tasks is empty. You may want to add a check for that
async def _async_translate( | ||
self, text: str, session: aiohttp.ClientSession, **kwargs | ||
): | ||
if is_input_valid(text): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since most of the code in this function is the same as the non-async translate, maybe you can find a way to make some parts reusable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nidhaloff I'll try to keep all the things in mind and fix it. Thanks for you detailed explanation.
Description
At current version of
deep-translator(1.10.1)
batch_translate is doing a linear loop of same function of single translate function. So in that contextrequest
is creating new session every time for requesting url to translate. Which is causing translator slow. Also It's working synchronously as each time request is blocking so until that request is finished translator is waiting idle. It can be optimized with asyncio feature also adding a dependency ofaiohttp
package and further for duplicate request we can utilizelru_cache
system. Theres still long way to go for every translator to add the asyncio.Changes Made
Changes are going to be non-breaking. Maybe we could add different function for async versions. But new dependency should be added aside of request we need to add
aiohttp
for async http calls.async_translate_batch
for high level function (non-breaking change which adds functionality)_async_translate
as abstract method to translate a single text asynchronusly inbase.py
which should be implemented in every translatorChecklist:
aiohttp
dependency optionaldeepl
translator with testslibre
translator with testslinguee
translator with testsmicrosoft
translator with testsmymemory
translator with testspapago
translator with testspons
translator with testsqcri
translator with testsyandex
translator with testsProof Of Concept
Usage