I have a script run by cron that outputs some text, which we pipe to the 'mail' program. The command line looks like this:

./command.sh | mail -s "My Subject" [email protected] -- -F "Sender Name" -f [email protected]

The problem is that the text generated by the script contains special characters (é, ã, ç), since it is not in English. When the e-mail arrives, each of these characters is replaced by ??.

I understand that this is most likely because the encoding is not set correctly. What is the easiest way to fix this?

A: 

This is probably not a command-line issue but a character-set problem. When sending e-mail, the character set is often ISO-8859-1. Most likely the text you are feeding into the pipe is not ISO-8859-1 encoded. Check the encoding of whatever data source the text comes from.

Obligatory "good reading" link: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Re your update: in that case, if you enter the special characters manually, your terminal is probably using UTF-8. You should be able to convert the text's character set with a tool such as iconv. The alternative would be to tell mail to use UTF-8, but IIRC that is not entirely trivial.
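A minimal sketch of the iconv route, assuming the script's output is UTF-8 (as it would be if the characters were typed in a UTF-8 terminal). `to_latin1` is a hypothetical helper name, and the recipient in the usage comment is a placeholder. Characters with no ISO-8859-1 equivalent make iconv stop with an error, and whether the recipient's client then displays the text correctly still depends on how it interprets a message with no declared charset.

```shell
#!/bin/sh
# Hypothetical helper: convert UTF-8 text on stdin to ISO-8859-1 on stdout.
# Assumption: the input really is UTF-8; iconv exits with an error on
# characters that ISO-8859-1 cannot represent.
to_latin1() {
    iconv -f UTF-8 -t ISO-8859-1
}

# Usage (recipient is a placeholder):
# ./command.sh | to_latin1 | mail -s "My Subject" recipient@example.com
```

Each of é, ã and ç shrinks from two bytes in UTF-8 to a single byte in ISO-8859-1.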

Pekka
A: 

You're right in assuming this is a charset issue. You need to set the appropriate environment variables at the beginning of your crontab.

Something like this should work:

LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8

Optionally use LC_ALL in place of LC_CTYPE.
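A sketch of an alternative to editing the crontab, assuming en_US.UTF-8 is installed (`locale -a` lists what is available): a small wrapper script that exports the variables and then runs the whole pipeline, so that both the script and mail inherit them. Exporting inside command.sh alone would not reach mail, which is a separate process in the pipeline. The wrapper and the recipient address are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper for cron to call instead of the raw pipeline.
# Exporting here covers every process started below, much like setting
# the variables at the top of the crontab would.
# Assumption: en_US.UTF-8 exists on this machine (check with `locale -a`).
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
export LANG LC_CTYPE

# The real pipeline would follow here (address is a placeholder):
# ./command.sh | mail -s "My Subject" recipient@example.com
```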

Reference: http://opengroup.org/onlinepubs/007908799/xbd/envvar.html

Edit: It probably displays fine when you run it from your shell because the variables above are set there.

To verify, run 'locale' in your shell, then compare it with the output of a cron job that runs the same command.
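Before setting up a test cron job, the comparison can be approximated from a shell. This sketch prints the relevant variables in the current shell and in an almost empty environment created with `env -i`; that is only an approximation of cron, which also sets a few variables of its own and reads assignments from the crontab. The function names are hypothetical.

```shell
#!/bin/sh
# Print the variables that decide which character set programs assume.
print_locale_vars() {
    printf 'LANG=%s\nLC_CTYPE=%s\nLC_ALL=%s\n' "${LANG-}" "${LC_CTYPE-}" "${LC_ALL-}"
}

# Print the same variables from an almost empty environment.
# Assumption: `env -i` only approximates cron, which also sets HOME, LOGNAME,
# SHELL and PATH, plus anything defined at the top of the crontab.
print_cronlike_locale_vars() {
    env -i /bin/sh -c 'printf "LANG=%s\nLC_CTYPE=%s\nLC_ALL=%s\n" "${LANG-}" "${LC_CTYPE-}" "${LC_ALL-}"'
}
```

For example, `print_locale_vars > /tmp/shell.txt; print_cronlike_locale_vars > /tmp/cron.txt; diff /tmp/shell.txt /tmp/cron.txt` (hypothetical paths) shows which settings a bare environment would be missing.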

Re-Edit: OK, so it's not an environment-variable problem.

I am assuming you're using mailx, as it is the most common nowadays. Its man page says:

The character set for outgoing messages is not necessarily the same as the one used on the terminal. If an outgoing text message contains characters not representable in US-ASCII, the character set being used must be declared within its header. Permissible values can be declared using the sendcharsets variable,

So try adding the following argument when calling mail:

-S sendcharsets=utf-8,iso-8859-1
Casey
This will get my upvote if it works. I'm not entirely sure it will in this case, though, as the offending characters are probably already UTF-8 (having been entered manually), and `mail` will hardly be able to deal with them either way. But maybe I'm overlooking something. We'll see.
Pekka
I checked, and both the LANG and LC_CTYPE environment variables are already set as you suggested.
JohnWithoutArms
Interesting. On my system, /usr/bin/mail is a symlink to /usr/bin/mailx, whose man page says: "The character set for outgoing messages is not necessarily the same as the one used on the terminal. If an outgoing text message contains characters not representable in US-ASCII, the character set being used must be declared within its header. Permissible values can be declared using the sendcharsets variable."
Casey
I added "locale" at the beginning of the script and ran "scriptName.sh" | mail [email protected]. All locale variables show the same values as in the console (pt_BR.UTF-8), but the characters are still replaced by ??.
JohnWithoutArms
Unfortunately, "man mail" on my system says nothing about character encodings or charsets. I'll try setting the charset as a header and get back to you.
JohnWithoutArms
My mail command does not accept the -S argument.
JohnWithoutArms
It seems I AM using mailx, as "man mailx" returns the same page as "man mail" and the command itself behaves the same.
JohnWithoutArms
Well, it seems my version of mailx does not support the -S switch, and it is the "latest" version available for my Ubuntu 8.04 server. I guess I'll have to remove the special characters from my scripts.
JohnWithoutArms
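Since this mailx lacks -S, one possible workaround (a sketch under assumptions, not something tested on Ubuntu 8.04) is to skip mail, write the headers yourself with UTF-8 declared explicitly, and hand the result to sendmail. It assumes an MTA that provides /usr/sbin/sendmail (the original command already passes -F and -f through to sendmail), UTF-8 script output, and an ASCII-only subject, since a non-ASCII subject would need RFC 2047 encoding. `build_utf8_message` and the addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print headers that declare the body as UTF-8, then copy
# stdin through unchanged as the message body.
# $1 = recipient address, $2 = subject (assumed to be plain ASCII)
build_utf8_message() {
    printf 'To: %s\n' "$1"
    printf 'Subject: %s\n' "$2"
    printf 'MIME-Version: 1.0\n'
    printf 'Content-Type: text/plain; charset=UTF-8\n'
    printf 'Content-Transfer-Encoding: 8bit\n'
    printf '\n'   # blank line separates headers from the body
    cat
}

# Usage (addresses are placeholders):
# ./command.sh | build_utf8_message recipient@example.com "My Subject" \
#     | /usr/sbin/sendmail -t -F "Sender Name" -f sender@example.com
```

With -t, sendmail takes the recipient from the To: header, while -F and -f set the sender name and envelope sender as in the original command.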