Hi guys, I'm retrieving emails, and some of them contain UTF-8 encoded text. However, even though my page is encoded as UTF-8, in some places when I try to output that text I get funny characters like:

=?utf-8?B?Rlc6INqp24zYpyDYotm+INin2LMg2YXYs9qp2LHYp9uB2bkg2qnbjCDZhtmC?= 
=?utf-8?B?2YQg2qnYsdiz2qnYqtuSINuB24zaug==?=

Whereas in other areas of the same page it displays fine. What's going on?

+1  A: 

You may be seeing undecoded e-mail headers: =? is the starting delimiter, utf-8 is the character set of the text, and B means it is Base64-encoded. ?= is the ending delimiter. So, base64_decode() the part between the third question mark and the closing ?= and you'll get the content.

Piskvor
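To make the delimiter breakdown concrete, here is a minimal sketch that splits one encoded-word from the question into its parts and decodes the payload by hand. The regular expression and the variable names are illustrative assumptions, and it handles a single encoded-word only; a real header can contain several, plus plain text.

```php
<?php
// First encoded-word copied from the question.
$word = '=?utf-8?B?Rlc6INqp24zYpyDYotm+INin2LMg2YXYs9qp2LHYp9uB2bkg2qnbjCDZhtmC?=';

$charset = null;
$decoded = null;

// Capture charset, encoding flag (B or Q) and payload between the delimiters.
if (preg_match('/^=\?([^?]+)\?([BbQq])\?([^?]*)\?=$/', $word, $m) === 1) {
    $charset = strtolower($m[1]);
    if (strtoupper($m[2]) === 'B') {
        // B: the payload is plain Base64.
        $decoded = base64_decode($m[3], true);
    } else {
        // Q: like quoted-printable, except "_" stands for a space.
        $decoded = quoted_printable_decode(str_replace('_', ' ', $m[3]));
    }
}

// The result is raw bytes in $charset (here UTF-8), beginning with "FW: ".
echo $decoded, "\n";
```

The decoded bytes are still in whatever charset the header named, so if that is not the page's encoding they would need converting as well.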
+5  A: 

It's not "funny characters"; those are legitimate ASCII characters. It's just that the string is MIME-encoded for transport, so you'll need to put it through mb_decode_mimeheader.

deceze
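A minimal sketch of that call, assuming the mbstring extension is loaded. The `$header` value is the first encoded-word from the question; setting the internal encoding to UTF-8 is an assumption about what the page expects.

```php
<?php
// Assumption: the page is served as UTF-8, so decode into UTF-8.
mb_internal_encoding('UTF-8');

// First encoded-word copied from the question.
$header = '=?utf-8?B?Rlc6INqp24zYpyDYotm+INin2LMg2YXYs9qp2LHYp9uB2bkg2qnbjCDZhtmC?=';

// mb_decode_mimeheader() decodes the encoded-word and converts the
// result to the current internal encoding.
$subject = mb_decode_mimeheader($header);

echo $subject, "\n";
```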
How can I check in code if the string is MIME encoded?
Ali
@Ali Good question. I believe if the string is not MIME encoded, `mb_decode_mimeheader` will just pass it through as-is, so it should be safe to use on any string. For the email **body** you should parse the header for clues as to what transport encoding it was sent in.
deceze
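If an explicit check is still wanted, one rough approach is to look for the encoded-word pattern itself. The helper name `looksMimeEncoded` and the regular expression below are hypothetical, written for illustration only; the pattern is a loose approximation of the encoded-word syntax, not a full validator.

```php
<?php
// Hypothetical helper: returns true if $value contains something shaped
// like =?charset?B?payload?= or =?charset?Q?payload?=.
function looksMimeEncoded(string $value): bool
{
    return preg_match('/=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/', $value) === 1;
}

var_dump(looksMimeEncoded('=?utf-8?B?2YQg2qnYsdiz2qnYqtuSINuB24zaug==?=')); // bool(true)
var_dump(looksMimeEncoded('Plain subject line'));                            // bool(false)
```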
Running it through MIME decode leaves normal strings intact; however, in my case the originally encoded string now shows up as a series of question marks.
Ali
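One possible cause of the question marks, offered as a guess rather than a diagnosis: `mb_decode_mimeheader` converts its result to mbstring's internal encoding, and if that is not UTF-8 the Urdu characters cannot be represented. A sketch of an alternative that names the target charset explicitly, assuming the iconv extension is available and that the header has already been unfolded so the two encoded-words are separated by a single space:

```php
<?php
// Both encoded-words from the question, joined by one space
// (assumption: the mail library has already unfolded the header line).
$header = '=?utf-8?B?Rlc6INqp24zYpyDYotm+INin2LMg2YXYs9qp2LHYp9uB2bkg2qnbjCDZhtmC?='
        . ' '
        . '=?utf-8?B?2YQg2qnYsdiz2qnYqtuSINuB24zaug==?=';

// The third argument is the charset of the returned string, so the
// result does not depend on any global encoding setting.
$subject = iconv_mime_decode($header, ICONV_MIME_DECODE_CONTINUE_ON_ERROR, 'UTF-8');

echo $subject, "\n";
```

Calling `mb_internal_encoding('UTF-8')` before `mb_decode_mimeheader` would be the equivalent adjustment on the mbstring side.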
@Ali I think you're quickly getting into very deep waters there. If you need a more complex mail parser, I highly recommend using an existing library that covers all the edge cases. Correct mail parsing is a terrifically complex undertaking. PHP has a PECL extension called Mailparse: http://www.php.net/manual/en/book.mailparse.php
deceze
I think I'm sinking already :S - actually I'm using the Zend framework to build an email interface. And right now I seem to be finding certain severe limitations that the framework has in this respect.
Ali