I have a Perl script that opens the page http://svejo.net/popular/all/new/ and filters out the names of the posts, but everything except the headers looks encrypted; nothing is readable. When I open the same page in a browser, everything looks fine, including the source code. How can a page be encrypted for a script but not for a browser? My Perl script sends the same headers as my browser (Google Chrome).

+2  A: 

The page is encoded with UTF-8. Perhaps your Perl script is using a different encoding?

I found this page that describes Processing UTF-8 Files with Perl.
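To make the difference between raw bytes and decoded characters concrete, here is a minimal sketch that uses only the core Encode module. The byte string is hard-coded to stand in for a fetched page body (an assumption, so the example runs offline); it spells the word "новини" from the page title.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for the Cyrillic word "новини", hard-coded here
# as a stand-in for what a fetch would hand back (assumption).
my $bytes = "\xD0\xBD\xD0\xBE\xD0\xB2\xD0\xB8\xD0\xBD\xD0\xB8";

# decode() turns the byte string into a string of Perl characters.
my $chars = decode('UTF-8', $bytes);

# Each Cyrillic letter takes two bytes in UTF-8.
printf "bytes: %d, characters: %d\n", length $bytes, length $chars;
# prints "bytes: 12, characters: 6"

# Encode the characters again on the way out so the terminal gets UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$chars\n";
```

If the two lengths come out equal for text that is known to be Cyrillic, the string was probably never decoded.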

Stevko
I guess this is the problem, but I can't figure out how to tell Perl to use UTF-8. I read that it uses UTF-8 by default.
Tichomir Mitkov
+4  A: 

The page looks fine to me, although I don't read Bulgarian.

#!perl

use LWP::Simple;

getprint( 'http://svejo.net/popular/all/new/' );

This script returns the plain page without anything that looks odd or encrypted:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="bg" lang="bg">
  <head>

<title>Svejo — Популярните новини </title>

What were you trying, and which versions of perl and the modules are you using? What is the output that you are seeing?

You clarify that you are using ActivePerl on Windows (please update your question with additional details). Remember, not only do you need to do the right Unicode things in your programs, but your terminal has to be set up to display Unicode properly.


What happens when you explicitly binmode your output?

 binmode STDOUT, ':utf8';

Try saving the output to a file and looking at it in an editor that understands UTF-8.
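A minimal sketch of the save-to-a-file check, using only core modules. The sample text and the temporary file are assumptions made so the example runs offline; in the real script the text would be the decoded page body.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample character string: the word "новини" from the page title.
my $text = "\x{43D}\x{43E}\x{432}\x{438}\x{43D}\x{438}";

# Temporary file standing in for the output file (removed at exit).
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Write through an explicit UTF-8 encoding layer.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} $text;
close $out or die "Cannot close $path: $!";

# Read the file back as raw bytes to see what actually landed on disk.
open my $raw, '<:raw', $path or die "Cannot read $path: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read it back again through the decoding layer.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot read $path: $!";
my $roundtrip = do { local $/; <$in> };
close $in;

printf "on disk: %d bytes; read back: %d characters\n",
    length $bytes, length $roundtrip;
```

Looking at the raw bytes takes the terminal out of the picture: if the file holds valid UTF-8, the remaining problem is in how the output is displayed.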


Okay, that didn't work. Let's get even more general and set all handles to use UTF-8 by default:

  use open IO  => ':utf8';
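One caveat worth checking: as far as the `open` pragma is documented, the `IO` form only sets default layers for handles opened later in the same lexical scope, and the already-open STDOUT is left alone unless `:std` is added. The sketch below uses the `:std` variant and then inspects the layers on STDOUT; the sample word is an assumption for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# :std applies the encoding layer to STDIN, STDOUT and STDERR as well as
# to handles opened later in this lexical scope.
use open qw(:std :encoding(UTF-8));

# List the PerlIO layers now active on STDOUT.
my @layers = PerlIO::get_layers(STDOUT);
print "STDOUT layers: @layers\n";

# A character string can now be printed without a "Wide character" warning.
print "\x{43D}\x{43E}\x{432}\x{438}\x{43D}\x{438}\n";
```

If no UTF-8 layer shows up in the list, the pragma did not reach STDOUT and an explicit binmode is still needed.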
brian d foy
@brian - that's Cyrillic, but hardly Russian :) If you look at the source, the language is right there: <meta http-equiv="content-language" content="bg" /> - so it's Bulgarian (http://en.wikipedia.org/wiki/Bulgarian_language). As Stevko says, it's UTF-8, or at least so claims my Firefox :)
DVK
By the way, my ActiveState Perl (v5.10.1), at least in a Windows XP cmd terminal window, does render the Cyrillic as some sort of upper-ASCII gibberish when I run your Perl two-liner above. I'm not well versed enough in UTF-8 to venture a plausible reason :(
DVK
I don't read Bulgarian either. :)
brian d foy
I'm using openSUSE, the Perl version is 5.10.0, and the terminal is set to UTF-8. I even redirected the script output to a file: same thing. Apparently Perl is using a character encoding other than UTF-8. By the way, I tried replacing the connection part of the script with a simple `curl -i http://svejo.net/thestuffhere` (enclosed in backticks), and it works. Now I'm really confused.
Tichomir Mitkov
Okay, this is really strange, and a bit hard to figure out remotely. Can you post the compilation details for your perl with `perl -V`? Also, is this a packaged perl or one you compiled yourself? I'd try compiling another perl to see what happens. That's a drastic measure though.
brian d foy
It came packaged with the distro. I will post the details as a new answer.
Tichomir Mitkov
A: 
Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:                                                            
    osname=linux, osvers=2.6.27, archname=x86_64-linux-thread-multi    
    uname='linux haley 2.6.27 #1 smp 2009-02-09 15:38:31 +0100 x86_64 x86_64 x86_64 gnulinux '
    config_args='-ds -e -Dprefix=/usr -Dvendorprefix=/usr -Dinstallusrbinperl -Dusethreads -Di_db -Di_dbm -Di_ndbm -Di_gdbm -Duseshrplib=true -Doptimize=-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe -Accflags=-DPERL_USE_SAFE_PUTENV'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -Wall -pipe',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DPERL_USE_SAFE_PUTENV -DDEBUGGING -fno-strict-aliasing -pipe'
    ccversion='', gccversion='4.3.2 [gcc-4_3-branch revision 141291]', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib64'
    libpth=/lib64 /usr/lib64 /usr/local/lib64
    libs=-lm -ldl -lcrypt -lpthread
    perllibs=-lm -ldl -lcrypt -lpthread
    libc=/lib64/libc-2.9.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.9'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.10.0/x86_64-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib64'


Characteristics of this binary (from libperl):
  Compile-time options: DEBUGGING MULTIPLICITY PERL_DONT_CREATE_GVSV
                        PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
                        PERL_TRACK_MEMPOOL PERL_USE_SAFE_PUTENV
                        USE_64_BIT_ALL USE_64_BIT_INT USE_ITHREADS
                        USE_LARGE_FILES USE_PERLIO USE_REENTRANT_API
  Built under linux
  Compiled at Jun 10 2009 16:23:14
  @INC:
    /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/5.10.0
    /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/site_perl/5.10.0
    /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.10.0
    /usr/lib/perl5/vendor_perl
    .
Tichomir Mitkov
You can edit your question to add more details. :)
brian d foy
Oh, yes. I'm sorry, I'm still a newbie
Tichomir Mitkov