This repository has been archived by the owner on Aug 21, 2024. It is now read-only.
Update email regex to be more liberal to check TLDs #10973
Closed
Summary
Previously, EMAIL_REGEX had a list of the top-level domains most commonly used by people. That regex was picked from here: https://fightingforalostcause.net/content/misc/2006/compare-email-regex.php
But there are actually a lot of other top-level domains in existence, and it's just not viable to put all of them into a regex. Here's the list of all allowed top-level domains: https://data.iana.org/TLD/tlds-alpha-by-domain.txt. There are 1447 top-level domains registered so far.
Now I'm updating the regex to allow top-level domains that follow these criteria:
(e.g. .com, .io. But to allow as many TLDs as possible while still not allowing obscure TLDs, this figure is chosen.)
Out of the 1447 TLDs, 1305 (90%) follow these criteria.
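As a rough sketch (not code from this PR), a coverage figure like the one above could be checked by parsing the IANA list and counting how many TLDs match a candidate pattern. The pattern below (letters only, 2 to 6 characters) is purely an assumption for illustration and is not the criterion chosen here; the names parse_tld_list, coverage, and SAMPLE are hypothetical.

```python
import re

# ASSUMPTION: an illustrative criterion only, not the one used in this PR.
ASSUMED_TLD_PATTERN = re.compile(r"[a-z]{2,6}")


def parse_tld_list(text):
    """Parse text in the tlds-alpha-by-domain.txt layout into lowercase TLDs.

    The file has one TLD per line, with comment lines starting with '#'.
    """
    tlds = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tlds.append(line.lower())
    return tlds


def coverage(tlds, pattern=ASSUMED_TLD_PATTERN):
    """Return (matched_count, matched_fraction) for the given TLDs."""
    if not tlds:
        return 0, 0.0
    matched = sum(1 for tld in tlds if pattern.fullmatch(tld))
    return matched, matched / len(tlds)


# A tiny hand-written excerpt in the same layout, for illustration only.
SAMPLE = """# sample excerpt, not the real list
COM
IO
MUSEUM
PHOTOGRAPHY
XN--P1AI
"""

if __name__ == "__main__":
    count, fraction = coverage(parse_tld_list(SAMPLE))
    print(f"{count} TLDs match ({fraction:.0%})")
```

Running the same functions over the full downloaded list, with the real pattern substituted in, would give the actual count and percentage.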
The new pattern is more liberal than the previous one, so it can produce false positives. But I think false positives are better than false negatives in the case of email validation.
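To illustrate the trade-off, here is a minimal sketch comparing an allow-list style pattern with a more liberal one. Both patterns are simplified stand-ins written for this example and are assumptions, not the old or new EMAIL_REGEX; the helper is_valid is hypothetical.

```python
import re

# ASSUMPTION: simplified stand-ins, not the actual patterns from this PR.
# Allow-list style: only a fixed set of TLDs is accepted.
ALLOW_LIST = re.compile(r"[^@\s]+@[^@\s]+\.(?:com|org|net|io)", re.IGNORECASE)
# Liberal style: any alphabetic TLD of two or more letters is accepted.
LIBERAL = re.compile(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", re.IGNORECASE)


def is_valid(pattern, address):
    """Return True if the whole address matches the pattern."""
    return pattern.fullmatch(address) is not None


if __name__ == "__main__":
    for address in ("a@example.com", "a@example.photography", "a@example.notatld"):
        print(address, is_valid(ALLOW_LIST, address), is_valid(LIBERAL, address))
```

With these stand-ins, the allow-list rejects a@example.photography even though .photography is a registered TLD (a false negative), while the liberal pattern accepts a@example.notatld even though no such TLD exists (a false positive).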
Subtasks Checklist
Breaking Changes
References
closes #insert number here
QA Steps