ansaurus

Question

Answer 1

+3 A:

While I did not verify it (I am not going to install a module from March 2001), the module apparently already decodes to native Perl strings, so you do not have to do much. The straightforward way works just fine; there is no need to overcomplicate things with those substitutions.

use utf8;
my $val = '＜設定B-１コース＞';

# does it match A or B, followed by a dash, followed by a fullwidth １,２ or ３?
$val =~ /(?:A|B)-[１２３]/;  # returns true/1

daxim 2010-09-16 09:41:47

Thanks for replying. But `my $val = '＜設定B-１コース＞';` does not appear anywhere in my Perl code; it is copied and pasted from the sheet opened in Excel. Instead, the values stored in the Perl object are either wide character codes, as given in my comment 1, or those '[-0' dummy values.

awake416 2010-09-16 10:54:28

For now, what I am looking for is some way to convert all the values that are stored as wide character codes but have an ASCII equivalent into the corresponding ASCII characters, so that I can use a regex to match and fetch those rows for further processing in my application.

awake416 2010-09-16 11:03:39

My `$val` dumps to exactly the same representation that you wrote in the question. You call it wide character codes, but it's really just native Perl strings. To replace fullwidth digits with ASCII digits, just do `use utf8; $val =~ tr[０-９][0-9];`.

daxim 2010-09-16 11:10:29
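The same `tr` idea can be stretched to the whole fullwidth ASCII block rather than digits only. What follows is a minimal sketch, not something from the thread: `fullwidth_to_ascii` is a hypothetical helper name, and it assumes the cell values are already Perl character strings. It relies on the fact that U+FF01..U+FF5E mirror ASCII U+0021..U+007E in the same order.

```perl
use strict;
use warnings;
use utf8;    # the source below contains UTF-8 literals

# Hypothetical helper (not part of Spreadsheet::ParseExcel): map the
# fullwidth ASCII variants U+FF01..U+FF5E to U+0021..U+007E, and the
# ideographic space U+3000 to a plain space. Other characters are kept.
sub fullwidth_to_ascii {
    my ($text) = @_;
    $text =~ tr/\x{FF01}-\x{FF5E}\x{3000}/\x{21}-\x{7E}\x{20}/;
    return $text;
}

my $val   = '＜設定B-１コース＞';
my $ascii = fullwidth_to_ascii($val);

# After normalization a plain ASCII pattern is enough.
my $matched = $ascii =~ /[AB]-[123]/ ? 1 : 0;

binmode STDOUT, ':encoding(UTF-8)';    # assumed output encoding
print "$ascii\n";
print "matched: $matched\n";
```

The core module Unicode::Normalize offers `NFKC`, which also folds fullwidth forms to ASCII, but it changes other compatibility characters as well, so the narrower `tr` may be the safer choice when only this block matters.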

In that case, this should print something when there are cells with B-1 in them, but I am not getting anything: `$oWkc->{_Value} =~ tr[0-9][0-9]; print $oWkc->{_Value} . "\n" if ($oWkc->{_Value} =~ /B-1/);`

awake416 2010-09-16 12:12:21

You are a victim of not copying and pasting. I wrote `tr[０-９][0-9];`, not `tr[0-9][0-9];`. They are different, and only the first one works as intended.

daxim 2010-09-16 14:09:27
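Because the two digit ranges look almost identical on screen, printing the code points makes the difference visible. This is a small sketch; `code_points` is a hypothetical helper, and the sample value `B-１` is made up for illustration.

```perl
use strict;
use warnings;
use utf8;    # the source below contains UTF-8 literals

# Hypothetical helper: list the code points of a string as U+XXXX labels.
sub code_points {
    my ($text) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $text;
}

print code_points('1'),  "\n";    # ASCII digit one:     U+0031
print code_points('１'), "\n";    # fullwidth digit one: U+FF11

my $cell = 'B-１';

# ASCII to ASCII: every digit maps to itself, so nothing changes.
(my $noop = $cell) =~ tr[0-9][0-9];

# Fullwidth to ASCII: this is the transliteration that was intended.
(my $fixed = $cell) =~ tr[０-９][0-9];

print code_points($noop),  "\n";
print code_points($fixed), "\n";
```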

Thanks, I will give it a try and trouble you if needed :-) .. BTW thanks for your quick replies ...

awake416 2010-09-16 17:52:02

Answer 2

+2 A:

To deal with multi-byte characters in Spreadsheet::ParseExcel you should update to the latest version and use the FmtJapan formatter. Several bug fixes around Japanese formatting went into recent versions.

Here is an example:

#!/usr/bin/perl


use warnings;
use strict;
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::FmtJapan;

my $filename  = 'Test2000J.xls';
my $parser    = Spreadsheet::ParseExcel->new();
my $formatter = Spreadsheet::ParseExcel::FmtJapan->new();
my $workbook  = $parser->parse($filename, $formatter);

if ( !defined $workbook ) {
    die "Parsing error: ", $parser->error(), ".\n";
}

# Set your output encoding.
binmode STDOUT, ':encoding(cp932)';
# Or maybe this:
#binmode STDOUT, ':utf8';


for my $worksheet ( $workbook->worksheets() ) {

    print "Worksheet name: ", $worksheet->get_name(), "\n\n";

    my ( $row_min, $row_max ) = $worksheet->row_range();
    my ( $col_min, $col_max ) = $worksheet->col_range();

    for my $row ( $row_min .. $row_max ) {
        for my $col ( $col_min .. $col_max ) {

            my $cell = $worksheet->get_cell( $row, $col );
            next unless $cell;

            print "    Row, Col    = ($row, $col)\n";
            print "    Value       = ", $cell->value(),       "\n";
            print "    Unformatted = ", $cell->unformatted(), "\n";
            print "\n";
        }
    }
}

jmcnamara 2010-09-16 09:54:38

Answer 3

A:

This works for me; I am able to get results now. But what are these characters '０-９'? Can I get more details about them, or maybe a doc or link? I would also like to know more about how Perl stores data internally.

awake416 2010-09-17 08:05:00

These characters are [fullwidth digits](http://www.fileformat.info/info/unicode/block/halfwidth_and_fullwidth_forms/list.htm). You are not supposed to care how Perl stores strings internally, just use the exposed APIs, but if you really care: it is a [well-documented open secret](http://p3rl.org/UNI). If any of that information is not clear enough, then [open a new question](http://stackoverflow.com/questions/ask).

daxim 2010-09-20 16:47:26
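As a rough illustration of "use the exposed APIs", the core Encode module converts between raw bytes and character strings without the caller needing to know the internal representation. The byte sequence below is an assumed sample input, not data from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes as they might arrive from a file or socket:
# EF BC 91 is the UTF-8 encoding of the fullwidth digit one (U+FF11).
my $bytes = "\xEF\xBC\x91";

# decode() turns bytes into a character string.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length($bytes);    # counts bytes
printf "chars: length %d, code point U+%04X\n",
    length($chars), ord($chars);                # counts characters

# encode() goes the other way and is what an output layer does.
my $round_trip = encode('UTF-8', $chars);
print "round trip ok\n" if $round_trip eq $bytes;
```

Operators such as `tr`, `length` and regex matching work on characters once the data has been decoded, which is why the fullwidth `tr` in the comments above needs decoded strings on both sides.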

Thanks Daxim .. You are rocking ..!!

awake416 2010-09-21 06:09:09

excel with Japanese(wide) fonts
