I'm having a problem emailing Unicode characters using smtplib in Python 3. This fails in 3.1.1 but works in 2.5.4:

  import smtplib
  from email.mime.text import MIMEText

  sender = to = '[email protected]'
  server = 'smtp.DEF.com'
  msg = MIMEText('€10')
  msg['Subject'] = 'Hello'
  msg['From'] = sender
  msg['To'] = to
  s = smtplib.SMTP(server)
  s.sendmail(sender, [to], msg.as_string())
  s.quit()

I also tried an example from the docs, the "Send the contents of a directory as a MIME message" example at http://docs.python.org/3.1/library/email-examples.html, and it failed as well.

Any suggestions?

+2  A: 

The _charset parameter of MIMEText defaults to us-ascii according to the docs. Since € is not in the us-ascii set, it isn't working.

The example in the docs that you tried clearly states:

For this example, assume that the text file contains only ASCII characters.

You could use the .get_charset method on your message to inspect the charset; incidentally, there is a .set_charset method as well.
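
A minimal sketch of that check, assuming a recent Python 3 (the sample strings and variable names are made up for illustration, and older releases such as 3.1 may behave differently):

```python
from email.mime.text import MIMEText

# ASCII-only text: the message charset ends up as us-ascii.
ascii_msg = MIMEText('10 euros')
ascii_charset = str(ascii_msg.get_charset())

# Text with a euro sign, with the charset stated explicitly.
utf8_msg = MIMEText('€10', _charset='utf-8')
utf8_charset = str(utf8_msg.get_charset())

print(ascii_charset)
print(utf8_charset)
```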

SilentGhost
As you say, the charset is us-ascii, which does not include €. Using set_charset on the msg does not fix the problem. The problem (I should have been more precise) is on the sendmail line: UnicodeEncodeError: 'ascii' codec can't encode character '\x80' in position 161: ordinal not in range(128). I read this to mean that I have to encode the text so that everything is in range(128), but I haven't been able to figure out how.
foosion
I was looking at the 3rd example on the examples page, sending an entire directory. I tried sending a directory consisting of a single zip file using the example. This failed.
foosion
+3  A: 

The key is in the docs:

class email.mime.text.MIMEText(_text, _subtype='plain', _charset='us-ascii')

A subclass of MIMENonMultipart, the MIMEText class is used to create MIME objects of major type text. _text is the string for the payload. _subtype is the minor type and defaults to plain. _charset is the character set of the text and is passed as a parameter to the MIMENonMultipart constructor; it defaults to us-ascii. No guessing or encoding is performed on the text data.

So what you need is clearly not msg = MIMEText('€10'), but rather:

msg = MIMEText('€10'.encode('utf-8'), _charset='utf-8')

While not all that clearly documented, sendmail needs a byte string, not a Unicode one (that's what the SMTP protocol specifies). Look at what msg.as_string() returns for each of the two ways of building the message: given the "no guessing or encoding", your way still has that euro character in there (and no way for sendmail to turn it into a byte string), while mine doesn't (and utf-8 is clearly specified throughout).
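
To look at the serialized form yourself, here is a rough sketch. It assumes a recent Python 3, where MIMEText accepts a str together with an explicit charset, so it passes the text directly instead of pre-encoded bytes as in the line above; behaviour on 3.1.1 may differ. The variable names are illustrative only.

```python
from email.mime.text import MIMEText

# Build the message with an explicit utf-8 charset.
msg = MIMEText('€10', 'plain', 'utf-8')
msg['Subject'] = 'Hello'

# The serialized message should contain only ASCII, because the
# body is carried in a transfer encoding rather than as raw text.
wire_text = msg.as_string()
wire_bytes = wire_text.encode('ascii')  # raises if non-ASCII remains

# Decoding the payload recovers the original text.
decoded_body = msg.get_payload(decode=True).decode('utf-8')

print(wire_text)
print(decoded_body)
```

If the encode('ascii') call succeeds, the string handed to sendmail no longer contains the raw euro character.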

Alex Martelli
That sends without generating an error message. I sent to Thunderbird and gmail. Thunderbird only showed 10 as the text of the message. Gmail showed the full €10. Python sends as 'content-transfer-encoding: base64', while Thunderbird sends €10 as 'content-transfer-encoding: 8-bit' and gmail sends as 'multipart/alternative; boundary=...'. Any suggestions for generating a message that Thunderbird can interpret?
foosion
I'm no Thunderbird expert, but try other encodings such as `iso-8859-15`. Though any program these days that can't do utf-8 properly IS well worth throwing into the dustbin of history, mind!-)
Alex Martelli
The problem does not seem to be iso-8859-15 or utf-8, it seems to be content-transfer-encoding. Everything else I checked uses 8-bit, while python uses base64. Coercing the header to 8-bit doesn't help. Using quopri.encodestring() might work to get 8-bit encoding, but I haven't been able to figure out how to make it work.
foosion
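
One possible way to ask the email package for a quoted-printable body instead of base64 is sketched below. It assumes a recent Python 3 in which MIMEText accepts an email.charset.Charset instance; the names qp_utf8 and msg are illustrative. Whether Thunderbird then displays the message as expected has not been checked here.

```python
from email import charset
from email.mime.text import MIMEText

# A utf-8 charset whose body encoding is quoted-printable
# rather than the default base64.
qp_utf8 = charset.Charset('utf-8')
qp_utf8.body_encoding = charset.QP

msg = MIMEText('€10', 'plain', qp_utf8)
msg['Subject'] = 'Hello'

cte = msg['Content-Transfer-Encoding']
wire_text = msg.as_string()
decoded_body = msg.get_payload(decode=True).decode('utf-8')

print(cte)
print(wire_text)
```

This lets the package do the quoted-printable encoding itself, so there is no need to call quopri.encodestring() by hand.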
A: 

Gus Mueller had a similar issue: http://bugs.python.org/issue4403

Paul D. Waite