Rewrite GoogleTranslator.translate_batch method to require only one API call for the whole list of words (batch) #248

Open: wants to merge 1 commit into base: master


@tpaxman tpaxman commented Nov 15, 2023

I noticed that Google Translate can handle multiple words at a time, which we can use to speed up the translate_batch method. Instead of using BaseTranslator._translate_batch for GoogleTranslator.translate_batch, we can join the batch list with line breaks, run a single translate call on the entire string, and then split the translated string on line breaks to return the arr list of results. This way, we don't need to loop through multiple API calls via the translate method and can get the same set of results with only one API call.
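
A minimal sketch of the join/translate/split idea, for illustration only. The function name `translate_batch_single_call` and the `translate` callable are hypothetical stand-ins (any function that maps a string to its translation), not the library's actual code, and the sketch assumes that no item in the batch contains a line break of its own.

```python
from typing import Callable, List


def translate_batch_single_call(
    batch: List[str], translate: Callable[[str], str]
) -> List[str]:
    """Translate every item in `batch` with one call to `translate`.

    `translate` is a hypothetical stand-in for a single-string
    translation call. Assumes no item contains "\n" itself.
    """
    if not batch:
        return []
    joined = "\n".join(batch)          # one string, one item per line
    translated = translate(joined)     # the only translation call
    return translated.split("\n")      # back to one result per item
```

With a stand-in such as `str.upper` in place of a real translator, `translate_batch_single_call(["hello", "world"], str.upper)` makes a single call and returns one result per input item.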

I have tried a number of ad hoc test cases with random lists of strings, and the result has always been identical to what the current method returns. However, before formalizing tests, I wanted to submit this and see whether the idea has already been attempted and found lacking for some reason I haven't noticed yet. Basically, I'm interested in feedback on the idea.

@nidhaloff
Owner

Thanks for the PR. I think this is a good idea. However, would it work all the time? Wouldn't this make the translations dependent on one another, as if the words formed some kind of sentence together?

I would say, make your implementation the default and add a parameter called "translate_separately" or "use_separate_api_calls", whatever you want to name it. That way we can have both behaviours if needed.
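
One way the suggested switch could look, as a rough sketch rather than the final API. The parameter name `translate_separately` is taken from the suggestion above; the free function `translate_batch` and its `translate` argument are hypothetical and only stand in for the real method and its single-string translation call.

```python
from typing import Callable, List


def translate_batch(
    batch: List[str],
    translate: Callable[[str], str],
    translate_separately: bool = False,
) -> List[str]:
    """Translate a batch, with an opt-out from the single-call path.

    `translate` is a hypothetical single-string translation call.
    """
    if translate_separately:
        # Previous behaviour: one call per item, fully independent results.
        return [translate(text) for text in batch]
    if not batch:
        return []
    # Default: one call for the whole batch, items separated by line breaks.
    return translate("\n".join(batch)).split("\n")
```

Callers who worry about items influencing each other can pass `translate_separately=True` and pay for one call per item.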

@tpaxman
Author

tpaxman commented Apr 27, 2024

Thanks for the feedback, @nidhaloff. As far as I could tell, whenever a line break separates words, they aren't interpreted as a complete sentence and are translated separately. But that's just from the trials I've done, and I guess Google could technically change that behaviour at some point. Is that what you were getting at, though?
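
If the line-break behaviour ever did change, one possible safeguard would be to check that the number of lines coming back matches the number of items sent, and fall back to per-item calls otherwise. This is only a sketch under that assumption: `translate_batch_with_fallback` and the `translate` callable are hypothetical names, and a matching line count does not prove that the items were translated independently.

```python
from typing import Callable, List


def translate_batch_with_fallback(
    batch: List[str], translate: Callable[[str], str]
) -> List[str]:
    """Try one call for the whole batch; fall back to one call per item.

    `translate` is a hypothetical single-string translation call.
    """
    if not batch:
        return []
    results = translate("\n".join(batch)).split("\n")
    if len(results) != len(batch):
        # Lines were merged or split by the service, so the mapping back
        # to the input items is unreliable: translate each item on its own.
        return [translate(text) for text in batch]
    return results
```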

As for the parameter, I think that sounds like a good idea as well. Or, if you wanted your implementation to remain the default, you could instead have a parameter like translate_quickly (i.e. the opposite of the ones you listed), but that would be up to you, of course!
