I don't think you're really correct about "email addresses" being context-free, or at least, citation, please?
When I look at a generic "email address" entry field on a random form on the Internet, say on the sign-up page for some hot new startup's service, I expect it to take what RFC 5322 §3.4.1[1] calls an `addr-spec`; specifically, I don't ever expect such fields to take the grammar of what that RFC calls an `address`. I don't think most people are going to think they can enter that, nor would most programmers even think to implement it. And I certainly wouldn't want to try explaining it to a PM…
If you accept that assumption, what about `addr-spec` isn't regular?
Also, using that assumption, your "perfectly valud email addresses such as …" would appear to not be valid, as it has unbalanced quotes. (In fact, even under the grammar of `address`, I'm not sure it's valid; it feels like it should be invalid for the same reason, but I've not rigorously checked this.)
> I don't think you're really correct about "email addresses" being context-free, or at least, citation, please?
> When I look at a generic "email address" entry field on a random form on the Internet, say on the sign-up page for some hot new startup's service, I expect it to take what RFC 5322 §3.4.1[1] calls an `addr-spec`; specifically, I don't ever expect such fields to take the grammar of what that RFC calls an `address`.
Well, sure. Let's look at what RFC 5322 defines as an addr-spec[1]:
addr-spec = local-part "@" domain
local-part = dot-atom / quoted-string / obs-local-part
Let's ignore quoted-string and obs-local-part for the moment. What is a dot-atom?
dot-atom = [CFWS] dot-atom-text [CFWS]
And what is CFWS?
CFWS = (1*([FWS] comment) [FWS]) / FWS
What's a comment?
comment = "(" *([FWS] ccontent) [FWS] ")"
So far, all of this has been matchable with a regular expression. But what's a ccontent?
ccontent = ctext / quoted-pair / comment
See that there? A comment is composed of a balanced pair of parentheses around, perhaps, another comment! Thus (this (is (a (heavily (commented (email \(address))))))foo@bar.example(some more (to prove (the point))) is a perfectly viable RFC5322 address!
Pair-balancing to arbitrary depth, of course, is impossible with regular expressions: matching nested pairs requires a push-down automaton (the machine that recognizes context-free languages) and cannot be done with a finite-state machine (the machine that recognizes regular languages).
QED.
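For anyone who wants to poke at this, here is a minimal sketch of a recognizer for the `comment` rule. It is my own illustration, not code from the RFC: the name `skip_comment` is made up, FWS is simplified to plain spaces and tabs (no line folding), and the obsolete `obs-*` forms are ignored. The point is only that the function has to call itself for a nested comment, which is exactly the part a finite-state machine cannot do.

```python
def skip_comment(s: str, i: int = 0) -> int:
    """Return the index just past the comment that starts at s[i].

    Hypothetical helper, simplified from RFC 5322:
      comment  = "(" *([FWS] ccontent) [FWS] ")"
      ccontent = ctext / quoted-pair / comment
    Raises ValueError if the comment is malformed or unterminated.
    """
    if i >= len(s) or s[i] != "(":
        raise ValueError(f"expected '(' at index {i}")
    i += 1
    while i < len(s):
        c = s[i]
        if c == ")":
            return i + 1                  # this comment is closed
        if c == "(":
            i = skip_comment(s, i)        # nested comment: recurse
        elif c == "\\":
            if i + 1 >= len(s):
                raise ValueError("dangling backslash")
            i += 2                        # quoted-pair: skip the escaped char
        elif c in " \t" or 33 <= ord(c) <= 126:
            i += 1                        # simplified FWS or ctext
        else:
            raise ValueError(f"illegal character at index {i}")
    raise ValueError("unterminated comment")


s = r"(this (is (a (heavily (commented (email \(address))))))foo@bar.example"
print(s[skip_comment(s):])  # prints foo@bar.example
```

The call stack here plays the role of the push-down automaton's stack. A plain depth counter would work just as well, but either way the state is unbounded.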
> Also, using that assumption, your "perfectly valud[sic] email addresses such as …" would appear to not be valid, as it has unbalanced quotes.
Nope, there are no unbalanced quotes in (this)"()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a"(is)@(valid)example.org(honest): the first quote balances with the third, while the second quote is one of a quoted pair \" (which is allowed within a quoted-string, which is allowed within a local-part). It's all allowed per the spec.
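If you want to check the quote pairing mechanically, here is a rough sketch. `scan_quoted_string` is a made-up helper, and it only handles the quoting structure (opening quote, quoted-pairs, closing quote); it does not validate which characters are legal `qtext`.

```python
def scan_quoted_string(s: str, i: int) -> int:
    """Return the index just past the quoted-string that starts at s[i].

    Hypothetical helper: a backslash escapes the next character (a
    quoted-pair), so an escaped quote does not end the string.
    """
    if i >= len(s) or s[i] != '"':
        raise ValueError(f"expected a double quote at index {i}")
    i += 1
    while i < len(s):
        c = s[i]
        if c == '"':
            return i + 1      # unescaped quote closes the string
        if c == "\\":
            if i + 1 >= len(s):
                raise ValueError("dangling backslash")
            i += 2            # skip the quoted-pair, e.g. \\ or \"
        else:
            i += 1
    raise ValueError("unterminated quoted-string")


addr = r'''(this)"()<>[]:,;@\\\"!#$%&'-/=?^_`{}| ~.a"(is)@(valid)example.org(honest)'''
start = addr.index('"')
end = scan_quoted_string(addr, start)
print(addr[end:])  # prints (is)@(valid)example.org(honest)
```

Note that this part, unlike the nested comments, needs no stack at all: a quoted-string on its own is perfectly regular.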
I'll admit that it's a bit surprising, but it's true. One simply cannot match a valid RFC5322 addr-spec with a regular expression. One can, of course, match it with something which pretends to be regular but isn't really (as I noted).
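To make the "cannot" a little more concrete, here is a sketch using Python's `re` module, which has no recursion feature. `comment_regex` is a made-up helper that builds a pattern for comments nested up to a fixed depth, with `ctext` loosened to "anything except parentheses and backslash". Whatever depth you pick, a comment nested one level deeper fails to match.

```python
import re


def comment_regex(max_depth: int) -> re.Pattern:
    """Build a regex for comments nested at most max_depth levels deep.

    Hypothetical helper for illustration; character classes are looser
    than the real ctext rule.
    """
    atom = r"(?:[^()\\]|\\.)"            # ordinary char or quoted-pair
    pattern = rf"\({atom}*\)"            # depth 1: no nested comments
    for _ in range(max_depth - 1):
        # Each pass allows one more level of nesting inside the parens.
        pattern = rf"\((?:{atom}|{pattern})*\)"
    return re.compile(pattern)


two = comment_regex(2)
print(bool(two.fullmatch("(a (b))")))      # prints True
print(bool(two.fullmatch("(a (b (c)))")))  # prints False
```

The pattern grows with every extra level, so there is no single finite pattern that covers every depth. Engines with recursive patterns (the "pretends to be regular" kind) get around this precisely by not being regular.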
Oh, shoot, I missed that. I saw "FWS" (meaning "folding white space"), and assumed that didn't include comments since they have nothing to do with folding whitespace.
I'm not sure whether they are context-free, but talks about Parsing Expression Grammars (PEGs), specifically Lua's LPeg, imply they might be. Conceptually, PEGs are to context-free grammars as regexes are to regular languages.
You can find multiple short email validation snippets using Lua LPeg pretty easily, but this is from the Lua creator's talk about LPeg, which includes a part about proper RFC822 validation: how complicated it is with regex, and how concisely it can be done with PEGs.
[1]: https://tools.ietf.org/html/rfc5322#section-3.4.1