ansaurus

Question

In Perl, how can I capture a string of digits from a string containing carriage returns and line feeds?

Answer 1

+4 A:

A capturing match in list context returns the captured strings:

#!/usr/bin/perl

use strict; use warnings;

my $s = join('', map chr(hex), qw(
    0D 0A 20 20 20 20 20 20 20 20 35 30 
    31 34 35 33 39 35 0D 0A 20 20 20 20
));

my ($x) = $s =~ /([A-Za-z0-9]+)/;

print "'$x'\n";

Output:

C:\Temp> uio
'50145395'
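
Since the question asks only for digits, the same list-context capture can be narrowed to `\d+`. This is a minimal sketch, assuming the input is the same string as in the test case above; the names `$input` and `$digits` are only illustrative:

```perl
use strict;
use warnings;

# Same bytes as the test case: CR LF, eight spaces, the digits, CR LF, four spaces.
my $input = "\r\n        50145395\r\n    ";

# In list context the match returns the captured groups,
# or an empty list when nothing matches.
my ($digits) = $input =~ /(\d+)/;

if (defined $digits) {
    print "'$digits'\n";
} else {
    print "no digits found\n";
}
```

Checking `defined` guards against the case where the string contains no digits at all, in which case `$digits` stays undefined.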

Sinan Ünür 2009-11-25 18:47:04

I am getting the string from an XML document and I put up the hex representation to show the hex characters of this string.

Mel 2009-11-25 18:51:33

@Mel: **So?** I used the hex representation of the string to test my code with the exact data you claimed to be using. Anyway, is this part of an attempt to use regular expressions to parse XML?

Sinan Ünür 2009-11-25 18:53:39

+1 For the nice test case

Andomar 2009-11-25 19:18:57

Answer 2

+2 A:

I'm not sure what you need, but here is code that extracts all words from the string:

my @words = ( $sitename =~ m/(\w+)/g );

It can also be done with split, but then you split on whitespace instead:

my @words = split( m/\s+/, $sitename );
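
One caveat with the `split` version, shown here as a small sketch (the sample string is an assumption, taken from the data in the question): when the string starts with whitespace, `split /\s+/` returns an empty first field, while the special pattern `' '` skips leading whitespace.

```perl
use strict;
use warnings;

my $sitename = "\r\n        50145395\r\n    ";

# A leading separator produces an empty leading field.
my @with_regex = split /\s+/, $sitename;

# The special single-space pattern discards leading whitespace first.
my @with_space = split ' ', $sitename;

print scalar(@with_regex), " fields with /\\s+/\n";   # 2 fields: '' and '50145395'
print scalar(@with_space), " field with ' '\n";       # 1 field: '50145395'
```

Trailing empty fields are dropped by `split` in both cases, so only the leading one differs.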

Ivan Nevostruev 2009-11-25 18:48:16

+1 For noticing that he said *characters and numbers*.

Sinan Ünür 2009-11-25 18:57:57

Just to explain (as far as I understand it): `m` matches all contiguous runs of word characters (`\w+`) and, with the `g` flag in list context, stores them in an array. You can combine them into a single string with `join('', @words)`.

Andomar 2009-11-25 19:11:04

Answer 3

+1 A:

The obvious one I didn't see in your post:

$sitename =~ s/\D//g;

This removes all non-digits. To remove anything but word characters, you could:

$sitename =~ s/\W//g;

There's no need for ^ or $ if your intention is to replace every non-digit. Also, you can replace one character at a time if you use the global g option; no need to match more than one digit with \d+.
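
The difference between deleting non-digits and capturing a run of digits shows up when the string holds more than one group of digits. A small sketch, with a made-up input string:

```perl
use strict;
use warnings;

my $text = "\r\n  5014 abc 5395\r\n";   # hypothetical input with two digit groups

# Deleting every non-digit glues the groups together.
(my $stripped = $text) =~ s/\D//g;

# Capturing returns only the first contiguous run.
my ($first) = $text =~ /(\d+)/;

print "stripped: $stripped\n";   # stripped: 50145395
print "first:    $first\n";      # first:    5014
```

Which behaviour is right depends on whether the input can ever contain more than one number.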

Andomar 2009-11-25 18:50:00

Answer 4

A:

Edit: My solution was incorrect; please instead pay attention to Sinan Ünür's solution.

Conrad Meyer 2009-11-25 18:50:19

But the `s` has no effect if you're not using `.` ? hehe

Andomar 2009-11-25 18:53:17

There is no **`.`** character in the pattern so this is completely and utterly irrelevant.

Sinan Ünür 2009-11-25 18:54:40

The point is, the expression is applied to the entire string, instead of a line at a time.

Conrad Meyer 2009-11-25 18:59:57

@Conrad Meyer: `m//` and `s///` are always applied to the entire string. The `s` modifier changes how the **pattern** is interpreted.
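
To illustrate the distinction, here is a short sketch (the sample string is invented): without `/s`, `.` does not match a newline; with `/s` it does. In both cases the match is attempted against the whole string.

```perl
use strict;
use warnings;

my $str = "abc\ndef";   # hypothetical two-line string

# Without /s, "." stops at the newline, so ".+" captures only the first line.
my ($plain) = $str =~ /(.+)/;

# With /s, "." also matches "\n", so ".+" runs to the end of the string.
my ($dotall) = $str =~ /(.+)/s;

# A pattern without "." behaves the same either way.
my ($no_dot)   = $str =~ /(d\w+)/;
my ($no_dot_s) = $str =~ /(d\w+)/s;

print length($plain), " ", length($dotall), "\n";   # 3 7
print "$no_dot $no_dot_s\n";                        # def def
```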

Sinan Ünür 2009-11-25 19:03:14

Aha! Please forgive my lack of Perl knowledge. Thanks for the clarification!

Conrad Meyer 2009-11-25 19:07:01

Answer 5

A:

In the past I have done something like:

my $newline = chr(13) . chr(10);

$data =~ s/$newline/ /g;

You can look up other ASCII character codes at: http://www.asciitable.com./

use strict;

my $newline = chr(13);
my $newline2 = chr(10);

my $words = "\r\n        50145395\r\n    ";

foreach my $char (split //, $words) {
 my $val=ord($char);    
 print "->$char<- ($val)\n";
}

print "$words\n";

$words =~ s/$newline//g;
$words =~ s/$newline2//g;
$words =~ s/[ ]+//g;

foreach my $char (split //, $words) {
 my $val=ord($char);    
 print "->$char<- ($val)\n";
}

print "$words\n";

Courtland 2009-11-25 19:08:12

Answer 6

A:

Do you want to remove only newlines and carriage returns? If so, this is what you want:

$sitename =~ s/[\r\n]//g;

If you want to remove all whitespace, not just line feeds and carriage returns, use this instead:

$sitename =~ s/\s//g;
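
The two substitutions give different results on a string with internal spaces. A brief sketch with an invented value:

```perl
use strict;
use warnings;

my $raw = "\r\n  example site  \r\n";   # hypothetical value

(my $no_newlines = $raw) =~ s/[\r\n]//g;   # keeps spaces
(my $no_space    = $raw) =~ s/\s//g;       # removes spaces too

print "[$no_newlines]\n";   # [  example site  ]
print "[$no_space]\n";      # [examplesite]
```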

markusk 2009-11-25 19:09:51

Answer 7

A:

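my $x = <<END;
this is a multiline 
string. this is a multiline
string.
END

# Remove every carriage return and line feed. The character class
# cannot match the empty string, unlike \r?\n?.
$x =~ s/[\r\n]+//g;
print $x;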

prime_number 2009-11-25 19:12:55
