Basically, I have a database from which I get $lastname, $firstname, $rid, $since, $times and $ip.

Using a Perl script, I format the data to send it via e-mail. Since the $lastname and $firstname can contain special chars (for instance ä, ü, ß, é,...) I first decode the strings.

my $fullname = decode("utf8", $lastname) . ', ' . decode("utf8", $firstname);
my $send = swrite(<<'END', $ip, $fullname, $rid, $since, $times);
@<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<< @<<<<<<<<<<<<<< @>>
END

Without decode, the special chars are garbage (ä becomes À) and the rest is OK.
With decode, everything is fine, except that lines whose names contain special chars have a couple of < too many.

Why is that? And how do I remove them?

Edit: swrite is from perldoc perlform

sub swrite {
  my $format = shift;
  $^A = '';
  formline($format, @_);
  return $^A;
}

Edit2: The problem is not the terminal nor STDOUT. I use:

use Mail::Sender;
use vars qw($sender);
#...
$sender->MailMsg({to => $mailto, 
  cc=> "", 
  bcc => "", 
  subject => "subject", 
  msg => $send});

And the characters are badly shown when receiving the email.

Edit 3:
The data I get is already scrambled. I get 'À' instead of 'ä' and that's why my format fails, because the number of chars decreases when using decode.

+3  A: 

The problem there is that the format engine isn't understanding your UTF-8; it thinks each byte is a character. I don't actually know if you can get formline (the underlying mechanism of swrite) to speak Unicode, but try this:

use open qw( :std :encoding(UTF-8) );

This attempts to apply UTF-8 encoding as broadly as possible.

You will probably need to skip your decode usage with this on.
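
To make the byte-versus-character point concrete, here is a minimal sketch. It assumes the name reaches the script as raw UTF-8 bytes; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "ä" as it arrives from a UTF-8 source: two raw bytes.
my $bytes = "\xc3\xa4";

# After decoding, Perl sees a single character (U+00E4).
my $chars = decode('UTF-8', $bytes);

my $byte_len = length($bytes);   # counts bytes, because nothing was decoded
my $char_len = length($chars);   # counts characters

print "undecoded length: $byte_len\n";
print "decoded length:   $char_len\n";
```

Anything that pads to a fixed width will see the first length for undecoded data and the second for decoded data, which is where the column shift comes from.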

chaos
I get a "Too late for "-CSD" option at ./test.pl line 1." What am I doing wrong?
Researching it a bit, it looks like at some point using `-C` on the shebang line became unsupported. Edited to provide what I hope is a substitute.
chaos
+2  A: 

If you are using the swrite function from perldoc perlform, your problem is either that STDOUT is not set up for UTF-8 or that your terminal cannot handle UTF-8. For the first case, you have a few options. The first is to use binmode to tell STDOUT to expect UTF-8:

#!/usr/bin/perl

use strict;
use warnings;

use Carp;

sub swrite {
    croak "usage: swrite PICTURE ARGS" unless @_;
    my $format = shift;
    $^A = "";
    formline($format, @_);
    return $^A;
}

my $fmt = "@<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<< @<<<<<<<<<<<<<< @>>";

binmode STDOUT, ":utf8";

my ($ip, $rid, $since, $times) = qw/1.1.1.1 5 2009-08-19 20/;
my $firstname = "Ch\x{e4}s";
my $lastname  = "\x{d6}wens";
my $fullname  = "$lastname, $firstname";
my $send      = swrite $fmt, $ip, $fullname, $rid, $since, $times;
print "$send\n";

Another option is to set the PERL_UNICODE environment variable to SDL (this is similar to chaos's -CSD on the command line):

PERL_UNICODE=SDL perl script.pl

or

export PERL_UNICODE=SDL
perl script.pl

There are other ways of telling STDOUT to expect UTF-8, but I can't remember them off the top of my head (I put export PERL_UNICODE=SDL in my .profile a long time ago).
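
One such other way, as a rough sketch: attach the encoding layer when the handle is opened. The in-memory handle below is only a stand-in so the resulting bytes can be inspected; a real script would open a file or pipe instead.

```perl
use strict;
use warnings;

my $name = "Ch\x{e4}s";   # character string: 4 characters

# Open an in-memory handle with a UTF-8 encoding layer attached.
my $buffer = '';
open my $fh, '>:encoding(UTF-8)', \$buffer
    or die "Cannot open in-memory handle: $!";
print {$fh} $name;
close $fh or die "Cannot close in-memory handle: $!";

# The layer turned the one character U+00E4 into two bytes.
printf "%d characters in, %d bytes out\n", length($name), length($buffer);
```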

If the problem is your terminal, well you need to either configure it properly or get a different terminal. The code above works on a properly configured terminal, so you can use it as a test.

Chas. Owens
Actually I'm not using the terminal nor STDOUT. See my updated question.
+4  A: 

My minimal test case seems to think that format handles Unicode just fine:

perl -MEncode -e 'formline("X@<<X", Encode::decode("utf-8","ほげぼげ")); print $^A'

The output is three characters, as expected. But anyway, formats are widely considered outdated. Time to use something else instead.
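
As one possible replacement, a plain sprintf pads by characters as long as the strings are decoded. This is only a sketch: format_row is a hypothetical helper, and the column widths are my approximation of the picture line in the question.

```perl
use strict;
use warnings;

# Hypothetical helper; the widths are assumed, not taken from a spec.
sub format_row {
    my ($ip, $fullname, $rid, $since, $times) = @_;
    return sprintf "%-15s %-35s %-11s %-15s %3s",
        $ip, $fullname, $rid, $since, $times;
}

# Both names are character strings, so the rows end up equally wide.
my $plain   = format_row('1.1.1.1', 'Owens, Chas', 5, '2009-08-19', 20);
my $accents = format_row('1.1.1.1', "\x{d6}wens, Ch\x{e4}s", 5, '2009-08-19', 20);

printf "plain: %d, accents: %d\n", length($plain), length($accents);
```

The rows still have to be encoded to UTF-8 before they go into the mail body.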

jrockway
+2  A: 

I have never had the desire to learn about formats. This is a bad answer because I am unable to offer any insight into your problem and/or potential solutions, but others have already done that. I am going to offer two suggestions for replacements.

The first one, Perl6::Form, ought to be useful as a better format, although I had never used it until I put together this example today. On the other hand, I have used Text::Table, and it is very useful for creating tables in plain text (most of the time, I just generate HTML, but email is still one of those places where plain text is plainly better).

Perl6::Form example:

#!/usr/bin/perl

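use strict;
use warnings;
use utf8;                              # the source contains literal non-ASCII names
use open qw( :std :encoding(UTF-8) );  # print them as UTF-8

use Perl6::Form;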

my @data = (
    ['127.0.0.1', 'Johnny Smithey', 'JLNSJIV', 14, 5],
    ['127.0.0.2', 'Ömer Seyfettin Şınas', 'OSS3', 25, 5],
);

for my $data_ref ( @data ) {
    print format_data($data_ref);
}

sub format_data {
    my ($data) = @_;
    return form
        '{<<<<<<<<<<<<<<<} {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<} ' .
        '{<<<<<<<<<<} {<<<<<<<<<<<<<<} {>>}',
        @$data;
}

Text::Table example:

#!/usr/bin/perl

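use strict;
use warnings;
use utf8;                              # the source contains literal non-ASCII names
use open qw( :std :encoding(UTF-8) );  # print them as UTF-8

use Text::Table;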

my %common_options = (
    align => 'left',
    title_align => 'center',
);

my $sep = \' ';

my $table = Text::Table->new(
    {
        title  => 'IP Address',
        sample => '<' x 15,
        %common_options,
    },
    $sep,
    {
        title => 'Full Name',
        sample => '<' x 34,
        %common_options,
    },
    $sep,
    {
        title => 'RID',
        sample => '<' x 10,
        %common_options,
    },
    $sep,
    {
        title => 'Since',
        sample => '<' x 14,
        %common_options,
    },
    $sep,
    {
        title => 'Times',
        sample => '>' x 2,
        align => 'right',
        title_align => 'center'
    },
);

$table->rule('');

$table->load(
    ['127.0.0.1', 'Johnny Smith-Jones', 'JLNSJIV', '20090814010203', 5],
    ['127.0.0.2', 'Ömer Seyfettin Şınas', 'OSS3', '20071211101112', 3],
    ['192.168.172.144', 'Jane Doe', 'JD156', '20080101010101', 1],
);

print $table->table;
Sinan Ünür
+1. For a bit more about Perl6::Form, also see http://stackoverflow.com/questions/236629/what-other-languages-have-features-and-or-libraries-similar-to-perls-format/237031#237031
draegtun
A: 

I don't know about formats or swrite, but I do know about your email problem.

The characters you see in the received email are UTF-8. However, your mailer is set to display something else by default (like Windows-1252 or Latin-1).

The solution is to add a header in your email which informs the mail program about the character encoding, so that it can display it correctly. The headers you need to add to the email are:

Mime-version: 1.0
Content-type: text/plain; charset="UTF-8"

(or another charset, making sure that it corresponds to the body of the email)

Additionally, you may want to encode the email into a 7bit encoding like "quoted-printable", and add the corresponding header:

Content-transfer-encoding: quoted-printable

That last encoding can be done with the MIME::QuotedPrint module.
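
A minimal sketch of that encoding step, using only Encode and MIME::QuotedPrint. The sample text and the way the headers are joined are just for illustration; how the headers get passed to your mail module is not shown here.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::QuotedPrint qw(encode_qp);

# A decoded (character) string, ending in a newline.
my $text = "\x{d6}wens, Ch\x{e4}s\n";

# encode_qp works on bytes, so encode to UTF-8 first.
my $body = encode_qp(encode('UTF-8', $text));

# The headers that describe the body to the receiving mail program.
my @headers = (
    'MIME-Version: 1.0',
    'Content-Type: text/plain; charset="UTF-8"',
    'Content-Transfer-Encoding: quoted-printable',
);

print join("\n", @headers), "\n\n", $body;
```

The body is now pure 7-bit ASCII, so it survives mail servers that are not 8-bit clean.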

mivk