I'm sorry, but there's loads of issues... #14

Open
RobThree opened this issue Nov 29, 2016 · 9 comments

@RobThree

HTML Tags
/^<([a-z1-6]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/

Famous answer

Hex Value
/^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/

Hex HTML/CSS color value maybe, but 0xDEADBEAF is a perfectly valid hex value.
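To make that concrete, here is a small sketch that runs the pattern above against a few inputs. The sample strings are picked purely for illustration; only the pattern itself comes from the list.

```javascript
// The "Hex Value" pattern exactly as listed above.
const hexColorPattern = /^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/;

// Illustrative inputs (assumptions, not taken from the project).
const hexSamples = ["#fff", "C0FFEE", "0xDEADBEAF", "DEADBEAF", "ff"];

// Pair every sample with the result of the pattern test.
const hexResults = hexSamples.map((s) => [s, hexColorPattern.test(s)]);

console.log(hexResults);
// "#fff" and "C0FFEE" pass; "0xDEADBEAF", "DEADBEAF" and "ff" do not,
// because the pattern only accepts exactly 3 or 6 hex digits with an optional "#".
```

So the pattern is really a CSS-style color check, not a general hex-value check.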

Password
/^[a-zA-Z0-9+_-]{6,32}$/

Slowly we're moving the world to passphrases, and everybody should be hashing their passwords. Then why the 32-character limit? And why, for Pete's sake, are we only allowing a-zA-Z0-9+_- and nothing else? *cries* (see also)

Email
/^([a-z0-9+_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,24})$/

Yeah. Just. No. Another famous answer

Positive number
/^\d*\.?\d+$/

We don't all live in the US/UK (1,234.56 vs. 1.234,56).
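A quick sketch of what the pattern above does with grouped and localized numbers. The inputs are illustrative assumptions; only the pattern is taken from the list.

```javascript
// The "Positive number" pattern exactly as listed above.
const positiveNumberPattern = /^\d*\.?\d+$/;

// Illustrative inputs: plain, leading-dot, US/UK grouping, continental grouping, decimal comma.
const numberSamples = ["1234.56", ".5", "1,234.56", "1.234,56", "1234,56"];

const numberResults = numberSamples.map((s) => [s, positiveNumberPattern.test(s)]);

console.log(numberResults);
// Only "1234.56" and ".5" pass. Any comma, whether it is a thousands
// separator or a decimal separator, makes the pattern fail.
```

In other words, the pattern accepts neither style of grouped number, and a decimal comma is rejected outright.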

Phone number
/^\+?[\d\s]{3,}$/

+123 is a valid phone number? Where? Phone numbers are notoriously hard to validate (hence libphonenumber, for example).

Date in format dd/mm/yyyy
/^(0?[1-9]|[12][0-9]|3[01])([ \/\-])(0?[1-9]|1[012])\2(19[0-9][0-9]|20[0-9][0-9])$/

Failed the very first 'edge case' I could come up with: 30/02/2016 is accepted, and years such as 1852 or 2150 are rejected (as noted elsewhere).

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski

@lukehaas
Owner

Thanks @RobThree some valid points there.
Though the date pattern is matching on 30/02/2016 for me.

Regarding the HTML tag pattern, that's pretty useful for plain text HTML, like in an editor.

I've now removed the password pattern as that was proving particularly controversial.

PRs are very welcome if you want to make any improvements yourself.

@dwelle

dwelle commented Nov 29, 2016

Regarding the HTML tag pattern, that's pretty useful for plain text HTML, like in an editor.

<b>this html</b><b>would beg the differ</b>
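For anyone who wants to see what happens with that input, here is a minimal sketch using the HTML tag pattern quoted at the top of the issue. Nothing here is from the project beyond the pattern and the string above.

```javascript
// The "HTML Tags" pattern as quoted in the opening comment.
const htmlTagPattern = /^<([a-z1-6]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/;

// Two sibling <b> elements, as in the comment above.
const htmlInput = "<b>this html</b><b>would beg the differ</b>";

const htmlMatch = htmlTagPattern.exec(htmlInput);

console.log(htmlMatch[1]); // "b"
console.log(htmlMatch[3]); // "this html</b><b>would beg the differ"
```

The pattern treats the whole string as a single `<b>` element: the greedy `(.*)` runs to the last `</b>`, so the closing tag of the first element and the opening tag of the second end up inside the captured "content".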

@RobThree
Author

RobThree commented Nov 29, 2016

Though the date pattern is matching on 30/02/2016 for me

Except that Feb. 30th doesn't exist 😉

Regarding the HTML tag pattern, that's pretty useful for plain text HTML, like in an editor.

Except that there are a gazillion ways the regex will match incorrectly (demonstrated here) or otherwise cause trouble. Have you read the Stack Overflow answer I linked to?

PRs are very welcome if you want to make any improvements yourself.

All the ones I pointed out are very case-specific and hard, if not impossible (HTML and email, for example), to get correct. Though I can think of improvements here and there, I'd suggest taking them all down. For most, if not all, of the regexes there are better ways of handling and validating the inputs, like simply parsing a date(time) value to 'validate' it, or sending an activation e-mail to verify an e-mail address.
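As a rough sketch of the "parse it to validate it" idea for dates: build a date from the parts and check that it round-trips unchanged. The function names `isRealDate` and `isValidDdMmYyyy` are hypothetical, and the sketch assumes `/` as the only separator, which is narrower than the pattern in the list.

```javascript
// Hypothetical helper: true only if day/month/year name a real calendar date.
function isRealDate(day, month, year) {
  const d = new Date(0);
  // setUTCFullYear rolls invalid dates over (30 Feb becomes 1 Mar),
  // so an invalid input no longer round-trips to the same parts.
  d.setUTCFullYear(year, month - 1, day);
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}

// Hypothetical helper: a regex only splits the string; the Date object decides validity.
function isValidDdMmYyyy(text) {
  const parts = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(text);
  if (parts === null) return false;
  return isRealDate(Number(parts[1]), Number(parts[2]), Number(parts[3]));
}

console.log(isValidDdMmYyyy("30/02/2016")); // false
console.log(isValidDdMmYyyy("29/02/2016")); // true (2016 is a leap year)
console.log(isValidDdMmYyyy("15/01/1852")); // true
```

The regex is reduced to tokenizing, so no year range or month length has to be encoded in the pattern.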

Regexes do have their use, I'm not saying they don't. But, as said, for most (if not all) of the examples there are much better solutions.


Edit: Here's more I just stumbled upon.

@CSobol
Contributor

CSobol commented Nov 29, 2016

Re: Emails. The only true way to validate emails is with basic pattern matching. Something along the lines of looking for @.* is the most you can possibly hope to do.

I completely agree with Rob on that point.
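A minimal sketch of that kind of basic check, assuming the real verification happens by sending an activation e-mail afterwards. `looksLikeEmail` is a hypothetical name, not something from the project.

```javascript
// Hypothetical helper: accepts any string with at least one character
// on each side of its last "@". Everything else is left to the
// activation e-mail, which is the only real proof the address works.
function looksLikeEmail(text) {
  const at = text.lastIndexOf("@");
  return at > 0 && at < text.length - 1;
}

console.log(looksLikeEmail("user@example.com")); // true
console.log(looksLikeEmail("no-at-sign"));       // false
console.log(looksLikeEmail("@example.com"));     // false
console.log(looksLikeEmail("user@"));            // false
```

This deliberately lets through plenty of junk; the point is only to catch obvious typos without ever rejecting a valid address.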

@lukehaas
Owner

@CSobol email pattern has now been updated with this PR: #15

@ferk6a

ferk6a commented Nov 29, 2016

It also lacks a ^ and $ for the time pattern, just like the date one, otherwise it matches "4;30" when you input "24:00"
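A sketch of the anchoring problem. The project's actual time pattern is not quoted in this thread, so the two patterns below are made-up stand-ins for a 24-hour `hh:mm` check; only the effect of `^` and `$` is the point.

```javascript
// Hypothetical 24-hour time pattern WITHOUT anchors (an assumption, not the project's pattern).
const unanchoredTime = /([01]?\d|2[0-3]):[0-5]\d/;

// The same hypothetical pattern WITH anchors.
const anchoredTime = /^([01]?\d|2[0-3]):[0-5]\d$/;

// Without anchors the engine is free to match a substring of an invalid input.
console.log(unanchoredTime.exec("24:00")[0]); // "4:00"

// With anchors the whole input has to be a valid time.
console.log(anchoredTime.test("24:00")); // false
console.log(anchoredTime.test("23:59")); // true
```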

@bathos

bathos commented Nov 30, 2016

I seem to run into a bug with the pattern document.body.innerHTML=flags//whoops ;)

@TraderStf

TraderStf commented Dec 2, 2016

For email, several regexes can help filter out some bad formats.

Lots of sites still expect 'simple' emails, e.g. a TLD of at most 3 characters (.com)!
The question is whether you want a valid one or one that will work on almost all sites.

A few filters

Maximum length: 254, due to network protocols rather than the email specs (search the RFCs).
Minimum length: 7 like [email protected]

.{7,254}

Rough validation of min/max length blocks:
.{1,248}@.{2,250}\..{2,64}

Tightening this formula so that all 3 lengths are checked properly is probably impossible in a regex alone, because you need to know the length of each part; that takes JavaScript, not just a regex.
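A rough sketch of what the JavaScript side of such a length check could look like. The limits are simply the figures quoted in this comment, not values verified against the RFCs, and splitting at the last `@` and the last `.` is an assumption of this sketch. `hasPlausibleLengths` is a hypothetical name.

```javascript
// Limits copied from the figures in this comment (unverified assumptions).
const EMAIL_LIMITS = {
  total: [7, 254],
  local: [1, 248],
  domain: [2, 250],
  tld: [2, 64],
};

// True if n lies within the inclusive [min, max] range.
function inRange(n, [min, max]) {
  return n >= min && n <= max;
}

// Hypothetical helper: checks only the lengths of the parts, nothing else.
function hasPlausibleLengths(email) {
  const at = email.lastIndexOf("@");
  if (at < 0) return false;
  const local = email.slice(0, at);
  const host = email.slice(at + 1);
  const dot = host.lastIndexOf(".");
  if (dot < 0) return false;
  const domain = host.slice(0, dot);
  const tld = host.slice(dot + 1);
  return (
    inRange(email.length, EMAIL_LIMITS.total) &&
    inRange(local.length, EMAIL_LIMITS.local) &&
    inRange(domain.length, EMAIL_LIMITS.domain) &&
    inRange(tld.length, EMAIL_LIMITS.tld)
  );
}

console.log(hasPlausibleLengths("a@bb.cc")); // true
console.log(hasPlausibleLengths("a@b.c"));   // false
```

Because each part is measured separately, the per-part limits and the overall limit can all be enforced at once, which a single regex cannot easily express.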

Just for the Latin character set, assuming the case-insensitive flag is set (/..../i):
[a-z][a-z0-9\._-]{0,246}[a-z0-9]@[a-z][a-z0-9\._-]{0,248}[a-z0-9]\.[a-z][a-z0-9\.]{0,61}[a-z]
(This one still needs verifying.)

A bit more international, but not all invalid characters are filtered out (spaces, tabs and, I think, control characters are, except DEL):

[!-\uFFEF]{1,248}@[!-\uFFEF]{2,250}\.[!-\uFFEF]{2,64}

64 is the maximum allowed; the longest TLD existing today has 24 characters:
XN--VERMGENSBERATUNG-PWB
http://data.iana.org/TLD/tlds-alpha-by-domain.txt

A few links

To test your assumptions:
http://cobisi.com/support/kb/emailverify.net/verification-process/validation-levels

Free Mailgun.com validation API, which checks more than just the syntax:
https://www.mailgun.com/email-validation

Explanation of Unicode in regexes:
http://www.regular-expressions.info/unicode.html

For the lazy, here is one from a framework, I don't remember which one...
But Mailgun is OK.
Apparently it respects all the rules, except that it does not check the length; see above.

function is_valid_email_address(email_address) {
  // Checks syntax only; the address length is not checked.
  var pattern = /^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i;
  // Returns true when the whole string matches the pattern.
  return pattern.test(email_address);
}

@CSobol
Contributor

CSobol commented Dec 14, 2016

The question is whether you want a valid one or one that will work on almost all sites.

That's an easy answer. When it comes to email addresses, you never want to stop a valid user from signing up with their email address. You would much rather take a hundred junk email addresses than prevent one valid user from signing up or filling out a form.
