
On Fri, May 01, 2015 at 04:47:55PM +1000, Erik Christiansen wrote:
> My only related experience is trying to display html emails generated by an unenlightened vendor when firefox can't deal with the embedded "=E5"-style gibberish.

that "gibberish" is called quoted-printable[1]. it's not only used for html email. it's not even primarily used for that: it's one of the common methods for encoding messages containing 8-bit characters into 7-bit characters suitable for transmission by/through older MTAs that can't handle 8-bit mail.

the most likely reason firefox can't display it correctly is that the sender's MUA didn't set the correct MIME headers (e.g. Content-Transfer-Encoding, or the charset in Content-Type) when creating the email...probably outlook or one of the many crappy, half-arsed MUAs on windows that don't bother implementing standards correctly.

firefox has no problem with QP if the mime headers are set correctly...in fact, most if not all linux browsers and mail clients can decode and display it...most modern mail clients on any OS should be able to read and display QP, if not send it.

[1] http://en.wikipedia.org/wiki/Quoted-printable

  "Quoted-Printable, or QP encoding, is an encoding using printable
  ASCII characters (alphanumeric and the equals sign "=") to transmit
  8-bit data over a 7-bit data path or, generally, over a medium which
  is not 8-bit clean. It is defined as a MIME content transfer encoding
  for use in e-mail.

  QP works by using the equals sign "=" as an escape character. It also
  limits line length to 76, as some software has limits on line length."

the rest of the article is worth reading for an understanding of what QP is and how to encode/decode it correctly (in short, the two characters after the = sign are hex digits, 00-FF, giving the value of the 8-bit byte; a bare = at the end of a line is a soft line break that the decoder removes).

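to make the decoding rule concrete, here is a rough sketch in python (not perl, purely for illustration). the function name decode_qp_by_hand and the sample text are made up for this example, and the latin-1 charset is an assumption: in a real message the charset comes from the Content-Type header. the standard library's quopri module is used only to cross-check the hand-rolled version.

```python
import quopri
import re


def decode_qp_by_hand(data: bytes) -> bytes:
    """Decode quoted-printable by applying the two basic rules directly.

    Hypothetical helper for illustration; it does not validate malformed input.
    """
    # Rule 1: "=" at the very end of a line is a soft line break,
    # so drop the "=" together with the newline that follows it.
    data = re.sub(rb"=\r?\n", b"", data)
    # Rule 2: "=" followed by two hex digits stands for the byte with that value.
    return re.sub(
        rb"=([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        data,
    )


# Made-up sample: =F8 and =E5 are single bytes, =3D is a literal "=",
# and the trailing "=" on the first line is a soft line break.
sample = b"sm=F8rrebr=F8d and =E5l, total =3D 50 kr, mo=\nre or less"

decoded = decode_qp_by_hand(sample)

# Cross-check against the standard library decoder.
assert decoded == quopri.decodestring(sample)

# Assumption: the sender's charset was latin-1 (ISO-8859-1).
print(decoded.decode("latin-1"))
```

in practice quopri.decodestring() (or a full MIME parser such as python's email package) is the safer choice; the hand-rolled function is only there to show what the =XX escapes mean.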
> I just put those messages through a couple of lines of awk, and the problem goes away.
> And the question did refer to sed, which is as old-unix as awk, and so nullifies __any__ hint of OT-ishness, I assure you.

it would probably be better and more reliable to write a simple perl filter using MIME::Decoder[2], which in turn relies on MIME::QuotedPrint[3]. the MIME::Decoder docs have an example filter in 3 lines of perl:

    use MIME::Decoder;
    $decoder = new MIME::Decoder 'quoted-printable' or die "unsupported";
    $decoder->decode(\*STDIN, \*STDOUT);

awk and/or sed can probably handle 90+% of cases, or at least make them less ugly to view. a decoder script should handle 100%, and convert them back to 8-bit text.

[2] http://search.cpan.org/~dskoll/MIME-tools-5.505/lib/MIME/Decoder.pm
[3] http://search.cpan.org/~gaas/MIME-Base64-3.15/QuotedPrint.pm

craig

-- 
craig sanders <cas@taz.net.au>

BOFH excuse #360: Your parity check is overdrawn and you're out of cache.