Issue with Foreign Languages #496
I'm not sure support for foreign languages is implemented, though I'd think the foreign words could just be added to the spam detection word list 🤔
I feel like making a folder with filter lists and then checking for spam words against those lists would be way better than translating messages, since Google Translate can sometimes change the original message's meaning.
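A minimal sketch of the folder-of-filter-lists idea. It assumes a hypothetical layout where each `<language>.txt` file in one folder holds one spam phrase per line, with blank lines and `#` comments ignored. The layout and the function names are illustrative only and are not part of the project:

```python
from pathlib import Path


def load_filter_lists(folder: Path) -> dict[str, list[str]]:
    """Read every <lang>.txt file in `folder` into {lang: [phrases]}.

    Hypothetical layout: one spam phrase per line; blank lines and
    lines starting with '#' are skipped. Phrases are casefolded.
    """
    lists: dict[str, list[str]] = {}
    for path in sorted(folder.glob("*.txt")):
        phrases = []
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                phrases.append(line.casefold())
        lists[path.stem] = phrases
    return lists


def matching_languages(comment: str, lists: dict[str, list[str]]) -> list[str]:
    """Return the name of every list that has a phrase occurring in the comment."""
    text = comment.casefold()
    return [lang for lang, phrases in lists.items() if any(p in text for p in phrases)]
```

Since the check is a plain substring match, the same code works for any language; the cost is that every list has to be curated by hand, which is the objection raised in the next comment.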
@EthanHindmarsh Created a PR with foreign language support. Review/testing is appreciated, especially because I will need to spin up a Windows VM to test it properly.
@UnknownCrafts How many filter lists can you realistically keep? It is my understanding that bots cycle through thousands of reply combinations, find one that works, and then propagate that one. We can't keep a dictionary of every possible language combination. Are you worried about false positives spiking? Also, side note: if the bots are truly automated, they would be using Google Translate to begin with, right? They are trying to drive traffic to their site, and YouTube's translate feature probably uses Google Translate, so when English speakers (the majority of YouTube's audience) click the translate button, it gives the best English translation.
My only worry was false positives rising, but I understand that we can't just keep adding filter lists. I guess Google Translate is a good option, but again, I worry that false positives might rise because of it.
A decent idea would be to use Google Translate to detect the language and then compare the comment against the spam list for that language.
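Detection might not need an API call at all. As a swapped-in offline heuristic (not Google Translate), the sketch below guesses the dominant Unicode script from the character names in Python's `unicodedata` module and then checks only that script's list. A script is not a language: Latin covers English, Spanish, and many others, so this only narrows the search. `SPAM_BY_SCRIPT` and its phrases are hypothetical placeholders:

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most common script prefix among the letters in `text`.

    Crude heuristic: uses the first word of each letter's Unicode name
    (e.g. 'LATIN', 'CYRILLIC', 'CJK', 'HANGUL'). Returns 'UNKNOWN'
    when the text contains no letters.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


# Hypothetical mapping from script to spam phrases written in that script.
SPAM_BY_SCRIPT = {
    "LATIN": ["message me on telegram"],
    "CYRILLIC": ["пиши в телеграм"],
}


def is_spam(comment: str) -> bool:
    """Check the comment only against the list for its dominant script."""
    phrases = SPAM_BY_SCRIPT.get(dominant_script(comment), [])
    text = comment.casefold()
    return any(p in text for p in phrases)
```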
Sending a bunch of requests to the Google Translate API for every message would not be a great solution, though 🤔 It would definitely slow down the process a lot.
Google Translate has quotas as well. We shouldn't be forcing users to balance so many quotas.
@ashvinnihalani Are there timestamps? Also, are there cases of specific foreign languages getting through, or cases of non-ASCII text getting through? I don't think translation is a good idea because, in addition to being resource-intensive, the methods spammers use to evade spam filters in English are not necessarily the same in other languages.
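One way to tell "non-ASCII text" apart from "foreign language" is Unicode NFKC normalization, which folds compatibility characters such as fullwidth letters back to their plain forms so an English filter can match them. Whatever is still non-ASCII afterward is more likely to be genuine foreign script. A minimal sketch; the function names are illustrative:

```python
import unicodedata


def normalize(text: str) -> str:
    """NFKC-fold compatibility characters (e.g. fullwidth letters), then casefold."""
    return unicodedata.normalize("NFKC", text).casefold()


def non_ascii_after_normalizing(text: str) -> str:
    """Characters that remain non-ASCII after normalization (likely real foreign script)."""
    return "".join(ch for ch in normalize(text) if not ch.isascii())
```

NFKC does not map look-alike letters from other scripts (for example Cyrillic "а" standing in for Latin "a"), so homoglyph tricks would still need separate handling.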
@ashvinnihalani Also, this is more of a discussion than an issue. @ThioJoe Please move this to the discussions page with the ideas tag, thanks.
Yeah, ThioJoe can add a couple of scam words, either in his spam-lists repo or even directly in the Python script, words like: in other languages.
Yes, and Google Translate is also not the best at translating certain languages, if you get what I mean.
@ThioJoe I can do this in your YT-Spam-Lists repo, if required.
So, a couple of follow-up comments:
With all of these improvements, I think the slowdown will be negligible. Thoughts, @KendallDoesCoding @ThioJoe @UnknownCrafts?
It really wouldn't make a difference, because very little of the filter even looks for whole words. I'm just going to close this because, tbh, I don't really intend to implement any kind of translation functionality. If there is a pattern of a certain type of spammer in another language, I'll take a look and see what I can do, but I'd need actual specific examples.
Fair, but I guess there should be a note somewhere in the README saying this only works for English spam comments.
I don't believe the application handles foreign languages very well.
If you look at the recent Linus Tech Tips video, there are several instances of foreign-language spam comments getting through.
A simple solution would be to use the googletrans Python package to translate the comment text before running the filter.
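A sketch of that translate-then-filter pipeline, assuming the translator is passed in as a plain function. The exact googletrans call differs between releases, so none is hard-coded here. To save requests, it only translates comments that contain non-ASCII characters, and it falls back to the untranslated text if the call fails. `filter_comment` and `TranslateFn` are hypothetical names:

```python
from typing import Callable

# Hypothetical: any function that returns an English translation of its input.
TranslateFn = Callable[[str], str]


def filter_comment(text: str, translate: TranslateFn, spam_phrases: list[str]) -> bool:
    """Translate non-ASCII comments, then run a simple English phrase check."""
    if not text.isascii():
        try:
            text = translate(text)
        except Exception:
            # On quota errors or network failures, keep the original text.
            pass
    lowered = text.casefold()
    return any(p in lowered for p in spam_phrases)
```

Keeping the translator as a parameter makes it easy to test with a fake. It also makes clear that every non-ASCII comment still costs one request, which is the speed and quota concern raised in the thread.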