C library for parsing email addresses

Does anyone know of a good free library for parsing email addresses? Preferably something that's already packaged for Debian. I want to have a library sort out the local part, the domain, and the comment of an email address. I've just discovered a bug in a program which does this. The fact that the program in question has been running for many years (last update was over a year ago and the code in question wasn't changed for some time before that) and processed probably hundreds of thousands of messages without the bug being noticed demonstrates that it's not so easy to get this right. It seems likely that there may be other bugs after I fix this one. So I'd rather use a library to ensure that it's had wider testing before I use it. -- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/

On 6/10/2013 11:01 PM, Russell Coker wrote:
Does anyone know of a good free library for parsing email addresses? Preferably something that's already packaged for Debian.
I want to have a library sort out the local part, the domain, and the comment of an email address.
I've just discovered a bug in a program which does this. The fact that the program in question has been running for many years (last update was over a year ago and the code in question wasn't changed for some time before that) and processed probably hundreds of thousands of messages without the bug being noticed demonstrates that it's not so easy to get this right. It seems likely that there may be other bugs after I fix this one. So I'd rather use a library to ensure that it's had wider testing before I use it.
I think you need to provide more details... What was the package that you were using? And at what point do you need to process the data? Is the data source: 1. incoming email; 2. outgoing mail; 3. stored mail [what type of storage]; 4. just a plain list of email addresses; or something else? This might get you going: A Perl Extensible Mail Filter --> http://marginalhacks.com/Hacks/pemf/ Cheers A.

Andrew McGlashan writes:
This might get you going: A Perl Extensible Mail Filter --> http://marginalhacks.com/Hacks/pemf/
I didn't notice at first either, but he hid a "C library" constraint in the Subject. I can't recommend a library, but I *can* tell you that email addresses aren't regular, so a regular expression is wrong. Observe: foo(@example(.net))@example.org ==> foo@example.org. You need a CFG because comments can nest, and matching nested parentheses needs more than a finite-state matcher. I don't have an EBNF handy; the RFC probably includes one. I had a quick look, but I can't see a C library in apt that *just* parses email addresses. You probably need one that parses (say) message/rfc822 in general, and then use only its address parsing function. PS: for Perl, libemail-address-perl (Email::Address) is the thing.
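The nesting example above can be sketched in C with a depth counter instead of a regular expression. This is only a minimal illustration, not a conformant RFC 5322 parser: the function name strip_comments is made up for this sketch, and it assumes that a backslash always escapes the next character and that parentheses inside a double-quoted string are literal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from any library): copy `in` to `out`,
 * dropping parenthesised comments, which may nest.
 * Returns 0 on success, -1 on unbalanced parentheses, an unterminated
 * quoted string, or an output buffer that is too small. */
int strip_comments(const char *in, char *out, size_t outsz)
{
    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    size_t n = 0;
    int depth = 0;     /* current comment nesting level */
    int in_quote = 0;  /* inside a "quoted string" (only outside comments) */

    for (const char *p = in; *p != '\0'; p++) {
        char c = *p;
        if (c == '\\' && p[1] != '\0') {
            /* Assumption: backslash + next char is taken literally. */
            if (depth == 0) {
                if (n + 2 >= outsz)
                    return -1;
                out[n++] = c;
                out[n++] = p[1];
            }
            p++;
            continue;
        }
        if (depth == 0 && c == '"') {
            in_quote = !in_quote;      /* the quote itself is copied below */
        } else if (!in_quote && c == '(') {
            depth++;                   /* enter a (possibly nested) comment */
            continue;
        } else if (!in_quote && c == ')') {
            if (depth == 0)
                return -1;             /* stray closing parenthesis */
            depth--;
            continue;
        }
        if (depth == 0) {
            if (n + 1 >= outsz)
                return -1;
            out[n++] = c;
        }
    }
    out[n] = '\0';
    return (depth == 0 && !in_quote) ? 0 : -1;
}
```

Called on foo(@example(.net))@example.org with a large enough buffer, this leaves foo@example.org in the output. The counter is exactly the piece of state a plain regular expression lacks: it has to remember how many comments are still open.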

On Mon, Oct 7, 2013, at 10:58 AM, Trent W. Buck wrote:
Andrew McGlashan writes:
This might get you going: A Perl Extensible Mail Filter --> http://marginalhacks.com/Hacks/pemf/
I didn't notice at first either, but he hid a "C library" constraint in the Subject.
I can't recommend a library, but I *can* tell you that email addresses aren't regular, so a regular expression is wrong.
Jeffrey Friedl's book "Mastering Regular Expressions" (O'Reilly) has an RFC 822 compliant regular expression in it that is meant to be very robust. D. J. Bernstein's mess822 library is probably the best solution in C that I am aware of: http://cr.yp.to/mess822.html (with supporting discussion and notes at http://cr.yp.to/immhf.html). Regards Graeme

Graeme Cross <gcross@fastmail.fm> writes:
I can't recommend a library, but I *can* tell you that email addresses aren't regular, so a regular expression is wrong.
Jeffrey Friedl's book "Mastering Regular Expressions" (O'Reilly) has an RFC 822 compliant regular expression in it that is meant to be very robust.
I do not believe that is possible. AFAICT, this ABNF snippet from RFC 5322 (p12) is irregular:
    ccontent = ctext / quoted-pair / comment
    comment  = "(" *([FWS] ccontent) [FWS] ")"
I glanced at RFC 822 (which is obsolete); it also has a comment nonterminal, although it's not so obviously recursive.
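Because comment refers to ccontent and ccontent refers back to comment, the rule maps naturally onto a recursive function. The following is a rough sketch of a recursive-descent matcher for just these two productions. The names match_comment and skip_fws are invented for illustration, FWS is loosened to any run of space, tab, CR, or LF, and ctext is not validated strictly, so this accepts more than the RFC does.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: FWS is approximated as any run of SP, HTAB, CR, LF.
 * Real folding white space is stricter than this. */
static const char *skip_fws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')
        p++;
    return p;
}

/* comment = "(" *([FWS] ccontent) [FWS] ")"
 * Returns a pointer just past the closing ")" of the comment that
 * starts at `p`, or NULL if `p` does not start a well-formed comment. */
const char *match_comment(const char *p)
{
    assert(p != NULL);
    if (*p != '(')
        return NULL;
    p++;
    for (;;) {
        p = skip_fws(p);
        if (*p == ')')
            return p + 1;          /* end of this comment */
        if (*p == '\0')
            return NULL;           /* unterminated comment */
        if (*p == '(') {           /* ccontent = comment: recurse */
            p = match_comment(p);
            if (p == NULL)
                return NULL;
        } else if (*p == '\\') {   /* ccontent = quoted-pair */
            if (p[1] == '\0')
                return NULL;
            p += 2;
        } else {                   /* ccontent = ctext (not checked strictly) */
            p++;
        }
    }
}
```

The recursive call is where the grammar stops being regular: the C call stack remembers how many comments are still open, which a finite automaton cannot do for arbitrary depth.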

In reply to trentbuck@gmail.com (Trent W. Buck):
Graeme Cross <gcross@fastmail.fm> writes: ... I do not believe that is possible. AFAICT, this ABNF snippet from RFC 5322 (p12) is irregular:
My formal-language theory is a bit rusty, but, indeed, you can't recognize context-free languages (having nesting), using regular expressions (even extended with backreferences, as they usually are in practice). -- Smiles, Les.

On 7/10/2013 10:58 AM, Trent W. Buck wrote:
Andrew McGlashan writes:
This might get you going: A Perl Extensible Mail Filter --> http://marginalhacks.com/Hacks/pemf/
I didn't notice at first either, but he hid a "C library" constraint in the Subject.
Yes, I did see that, but I also thought it was worth Russell further defining his problem situation. The perl script /may/ have been useful, depending on what needed to be done. Cheers A.

Does anyone know of a good free library for parsing email addresses? Preferably something that's already packaged for Debian.
Well, the 'ap' program in the nmh package parses addresses at the shell level. Depending on how horrible the nmh code is, there might be a usable C library behind that. There's also the 'formail' program (which ships with procmail rather than nmh); it has a different purpose, but I think it does some address parsing. -- Smiles, Les.

On 6 October 2013 23:01, Russell Coker <russell@coker.com.au> wrote:
Does anyone know of a good free library for parsing email addresses? Preferably something that's already packaged for Debian.
I want to have a library sort out the local part, the domain, and the comment of an email address.
I've just discovered a bug in a program which does this. The fact that the program in question has been running for many years (last update was over a year ago and the code in question wasn't changed for some time before that) and processed probably hundreds of thousands of messages without the bug being noticed demonstrates that it's not so easy to get this right. It seems likely that there may be other bugs after I fix this one. So I'd rather use a library to ensure that it's had wider testing before I use it.
This is from a Perl library, but it's by a good author and is essentially just a bunch of regexes with some logic around them, so it should be possible to port to C: http://cpansearch.perl.org/src/RJBS/Email-Address-1.900/lib/Email/Address.pm Alternatively, you could embed a Perl interpreter and just run the module directly from C.

On Sun, 6 Oct 2013 11:01:13 PM Russell Coker wrote:
Does anyone know of a good free library for parsing email addresses?
Library no, but Fetchmail by ESR has an rfc822.c that it uses for parsing addresses and that is adaptable (indeed it's used inside the Vacation program that I'm maintaining). It's MIT licensed and now available on gitorious: https://gitorious.org/fetchmail/ The comments say: "THEORY: How to parse RFC822 headers in C. This is not a fully conformant implementation of RFC822 or RFC2822, but it has been in production use in a widely-deployed MTA (fetchmail) since 1996 without complaints. Really perverse combinations of quoting and commenting could break it." Good luck! Chris -- Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC This email may come with a PGP signature as a file. Do not panic. For more info see: http://en.wikipedia.org/wiki/OpenPGP
participants (7): Andrew McGlashan, Chris Samuel, Graeme Cross, Les Kitchen, Russell Coker, Toby Corkindale, trentbuck@gmail.com