Is there some trick to do multi-line regular expression matches with <>, and loop over them? This example results in no matches when run on files with \n as the newline separator:

while (<> =~ m/\n./) {
  print($.);
}

I need to know the line of the start of the match inside the while loop, as in the example.

The goal is to find all lines of fewer than 75 characters that are followed by a line starting with a space (the standard vCard way of splitting long lines):

while (<> =~ m/(^|\n).{0,74}\n /)
+6  A: 

What are you trying to do in that regex? It looks like you are trying to find any case where a newline is followed by at least one character, and then that leads you to print the line number ($.) of whatever matches that criterion.

If you don't mind my asking, what's the larger purpose here?

In any case, see this article for a clear discussion of multiline matching: Regexp Power

Edited after the move to SO: If what you really want is to find the lines with fewer than 75 characters and a next line beginning with a space, I wouldn't use one regex. The description points to an easier and clearer (I think) solution: (1) pick out all lines with fewer than 75 characters (the length function is good for that). For the lines that remain, (2) check if the next line starts with a space. That gives you clear logic and an easy regex to write.

In response to the question about getting the "next" line. Think of it the other way around: you want to check every next line, but only if the previous line was less than 75 characters. So how about this:

my $prev = <>; # Initialize $prev with the first line

while (<>) {
    # Add 1 to 75 for newline or chomp it perhaps?
    if (length $prev < 76) {
        print "$.: $_" if $_ =~ m/^\s/;
    }
    $prev = $_;
}

(Note that I don't know anything about vCard format and that \s is broader than literally "a single space." So you may need to adjust that code to fit your problem better.)
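
As a rough sketch of those two adjustments (chomp before measuring, and a literal space instead of \s), something like the following might work. The helper name folded_line_starts and the sample data are made up for illustration, and it reports the number of the short line itself rather than the continuation line, on the assumption that this is the "start of the match" the question asks for:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based numbers of lines that are
# shorter than 75 characters and are followed by a line starting
# with a literal space.
sub folded_line_starts {
    my ($fh) = @_;
    my @starts;
    my $prev;                          # previous line, newline removed
    while (my $line = <$fh>) {
        chomp $line;
        # $. is the number of the current line, so $prev is line $. - 1
        push @starts, $. - 1
            if defined $prev && length($prev) < 75 && $line =~ /^ /;
        $prev = $line;
    }
    return @starts;
}

# Demo on an in-memory file handle (sample data is invented).
my $sample = "DESCRIPTION:short line\n continued\nEND:VCARD\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print join(',', folded_line_starts($in)), "\n";    # prints "1"
```

Passing the file handle in keeps the logic independent of <>, so the same helper could be given \*ARGV or an ordinary file handle.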

Telemachus
Thanks for the link; unfortunately there's no mention of file handles there that I can find. Maybe the direction is wrong for this sort of matching, but I was hoping to be able to use `$.` instead of keeping track of a line count and/or keeping track of the previous line.
l0b0
Sounds good, but how do I get the next line without removing it from <>?
l0b0
@l0b0: I think you're mixing things together. The article explains multi-line matching with a regular expression. That's what you asked about originally. File handles aren't relevant to that problem *per se*. See above for the other part of your comment.
Telemachus
Note that when reading via `<>` the value of `$.` is *not* reset across files. See `perldoc -f eof` for how to do it manually if that's important to you.
Michael Carman
@Michael: It's a good point (and once upon a time, I meant to say that here, but this question moved around a bunch (literally and figuratively)).
Telemachus
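
For what it's worth, the manual reset mentioned in that comment is the `close ARGV if eof` idiom from `perldoc -f eof`. Here is a small sketch of the difference; the helper line_numbers and the two temporary files are only there to make it self-contained and are not part of anyone's answer:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small temporary files so the sketch is self-contained.
my @files;
for my $text ("a\nb\n", "c\nd\ne\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "Cannot close $name: $!";
    push @files, $name;
}

# Hypothetical helper: reads the given files through <> and returns
# the value of $. seen for each line.
sub line_numbers {
    my ($reset, @names) = @_;
    local @ARGV = @names;
    my @seen;
    while (<>) {
        push @seen, $.;
        # eof without parentheses tests the end of the current file;
        # an explicit close is what resets $.
        close ARGV if $reset && eof;
    }
    return @seen;
}

print "with reset:    @{[ line_numbers(1, @files) ]}\n";   # 1 2 1 2 3
print "without reset: @{[ line_numbers(0, @files) ]}\n";   # 1 2 3 4 5
```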
+2  A: 

Do you have a file with arbitrary text mixed with vCards?

If all you have is a bunch of vCards in a file and you want to parse them, there are some vCard parsing modules on CPAN.

See, for example, Text::vCard, specifically Text::vCard::Addressbook.

Regarding,

while (<> =~ m/\n./) {
  print($.);
}

This would indeed not match anything, simply because input is read line by line: each string returned by <> ends at its newline, so there can never be anything after the newline for the . to match.
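
To see the effect in isolation, here is a small sketch that applies the same pattern to an in-memory file, once with the default record separator and once with $/ undefined. The helper count_matching_reads and the sample text are invented for the demonstration:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Hypothetical helper: counts how many reads from an in-memory copy of
# $data match /\n./ under the given input record separator.
sub count_matching_reads {
    my ($data, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    my $count = 0;
    while (defined(my $chunk = <$fh>)) {
        $count++ if $chunk =~ /\n./;
    }
    return $count;
}

print count_matching_reads($text, "\n"), "\n";    # 0: each read ends at its newline
print count_matching_reads($text, undef), "\n";   # 1: the whole file is one string
```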

If there is never more than a single continuation line following each line shorter than 76 characters, the following might fulfill the requirements:

#!/usr/bin/perl

use strict; use warnings;

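my $this;

while (defined(my $next = <>)) {
    # $. counts $next, so $this sits on line $. - 1
    printf "%s : %d\n", $ARGV, $. - 1
        if defined $this and 76 > length $this and $next =~ /^ /;
    $this = $next;
    # At the end of each file, reset $. and forget the previous line
    if (eof) { close ARGV; undef $this }
}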
Sinan Ünür
+4  A: 

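Did you remember to change the input record separator $/? Setting it to the empty string puts the handle in paragraph mode (records are separated by blank lines), and setting it to the undefined value reads the whole file as a single string.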

The following program does what you want:

#! /usr/bin/perl

use warnings;
use strict;

$/ = "";

*ARGV = *DATA;

while (<>) {
  while (/^(.{0,75}\n(^[ \t].{1,75}\n)*)/mg) {
    my $vcard = $1;

    $vcard =~ s/\r?\n[ \t]//g;

    print $vcard;
  }
}

__DATA__
DESCRIPTION:This is a long description that exists on a long line.
DESCRIPTION:This is a long description
  that exists on a long line.
DESCRIPTION:This is a long descrip
 tion that exists o
 n a long line.

Output:

$ ./try
DESCRIPTION:This is a long description that exists on a long line.
DESCRIPTION:This is a long description that exists on a long line.
DESCRIPTION:This is a long description that exists on a long line.
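
Once several lines are read as one string, $. no longer says where a match starts. One possible workaround, sketched here under the assumption that the input is already in a single string, is to count the newlines ahead of the match offset in $-[0]. The helper continuation_starts and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based line number on which each
# short line followed by a continuation line starts.
sub continuation_starts {
    my ($text) = @_;
    my @starts;
    # /m lets ^ match at the start of every line; the lookahead leaves the
    # continuation line unconsumed so it can itself start another match.
    while ($text =~ /^.{0,74}\n(?= )/mg) {
        my $before = substr($text, 0, $-[0]);      # everything ahead of the match
        push @starts, 1 + ($before =~ tr/\n//);    # newlines before it, plus one
    }
    return @starts;
}

my $vcard = "DESCRIPTION:This is a long descrip\n tion that exists o\n n a long line.\nEND:VCARD\n";
print join(',', continuation_starts($vcard)), "\n";    # prints "1,2"
```

Line 2 is reported as well because it is itself shorter than 75 characters and followed by another continuation line.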
Greg Bacon