Hi,

I've been trying to write a regex that matches the charset of MIME multipart emails so that I can decode them correctly. However, the format varies in ways I can't work out a regex for, as I'm no expert. Currently I'm using (?<=charset=).*(?=;) but the examples I've collected by sending emails from different clients are:

Content-Type: text/plain; charset=ISO-8859-1; format=flowed

charset=US-ASCII;

Content-Type: text/plain; charset=iso-8859-1

So my regex works on the first two but not the last. However, if I remove (?=;), I also match the format=flowed part, which I don't want.

Any ideas?

A: 

Match on either ; or the end of line ($), i.e. change the lookahead to (?=;|$).

Sjoerd
If `.*` is greedy, this will overmatch if there are multiple `;` following `charset=`
polygenelubricants
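
To illustrate the comment above, here is a minimal Java sketch comparing a greedy and a lazy quantifier in front of a lookahead for ; or end of line. The class name GreedyVsLazy and the helper firstMatch are made up for this illustration, and the exact pattern (?<=charset=).*(?=;|$) is an assumption about what the answer intends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Greedy: .* first runs to the end of the line, where the $ alternative already succeeds.
    static final Pattern GREEDY = Pattern.compile("(?<=charset=).*(?=;|$)");
    // Lazy: .*? grows one character at a time and stops at the first ';' or the end of input.
    static final Pattern LAZY = Pattern.compile("(?<=charset=).*?(?=;|$)");

    // Hypothetical helper: returns the first match of p in header, or null if there is none.
    static String firstMatch(Pattern p, String header) {
        Matcher m = p.matcher(header);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String header = "Content-Type: text/plain; charset=ISO-8859-1; format=flowed";
        System.out.println(firstMatch(GREEDY, header)); // ISO-8859-1; format=flowed
        System.out.println(firstMatch(LAZY, header));   // ISO-8859-1
    }
}
```

With the greedy form, the match runs through format=flowed because the end of the line also satisfies the lookahead. The lazy form stops at the first ; instead.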
+2  A: 

Instead of .*, you can use [^;]*. That is, match anything but the ;.

So, the pattern becomes:

(?<=charset=)[^;]*
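
As a quick check, the pattern above can be run against the three sample headers from the question. This is only a sketch: the class name CharsetExtractor and the method extract are hypothetical, and it assumes each header is passed in as a single line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetExtractor {
    // Lookbehind anchors the match right after "charset="; [^;]* then takes everything up to the next ';'.
    private static final Pattern CHARSET = Pattern.compile("(?<=charset=)[^;]*");

    // Hypothetical helper: returns the charset value from one header line, or null if none is found.
    static String extract(String headerLine) {
        Matcher m = CHARSET.matcher(headerLine);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Content-Type: text/plain; charset=ISO-8859-1; format=flowed",
            "charset=US-ASCII;",
            "Content-Type: text/plain; charset=iso-8859-1"
        };
        for (String s : samples) {
            System.out.println(extract(s)); // ISO-8859-1, US-ASCII, iso-8859-1
        }
    }
}
```

Note that [^;] also matches line breaks, so if a whole message is scanned at once rather than line by line, a class such as [^;\r\n]* may be the safer choice.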


polygenelubricants
Java, on ideone.com: http://ideone.com/BbBMV
polygenelubricants
nice one, I should have thought of that
TonyVipros