I currently have this: tr/[.]+(?=<)//d, which I expected to remove all characters (represented by [.]+) up to the first "<", because I'm using a positive lookahead. But for some reason, it's removing every "." and "<" from the string instead.

For the record, I am not processing HTML or XML with regular expressions.
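
For anyone reproducing this, here is a minimal sketch of why the tr version behaves that way. It relies only on documented behaviour: tr/// takes a list of literal characters, not a regex, so [ . ] + ( ? = < ) are each deleted wherever they occur. The sample log line and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $line = "2009-10-01 12:00:00.123 <INFO> done (ok?)";

# tr/// does not take a regex: every character between the first pair of
# delimiters is a literal member of the search list, and /d deletes them all.
(my $with_tr = $line) =~ tr/[.]+(?=<)//d;

# s/// takes a real pattern, so [^<]+ means "one or more characters
# that are not <", anchored here at the start of the string.
(my $with_s = $line) =~ s/^[^<]+//;

print "tr: $with_tr\n";
print "s:  $with_s\n";
```

The tr line strips the ".", the "<" and the "(", "?", ")" from everywhere in the string, while the s/// line removes only the prefix in front of the first "<".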

+1  A: 

Edit as it was clarified:

if ($line =~ /^.+?<(.+)/) {
  push @matched, $1;
}
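
The + versus * question raised in the comments on this answer only matters when the line starts with "<". A small sketch, assuming two made-up inputs and a hypothetical helper after_first_lt that is not part of the answer:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text captured after the "<" that the
# pattern settles on, or undef when the pattern does not match.
sub after_first_lt {
    my ($line, $re) = @_;
    return $line =~ $re ? $1 : undef;
}

my $plus = qr/^.+?<(.+)/;   # requires at least one character before the "<"
my $star = qr/^.*?<(.+)/;   # allows the "<" to be the very first character

my $middle  = "a < b < c";  # "<" appears after some text
my $leading = "<x<y";       # "<" is the first character

my $middle_plus  = after_first_lt($middle,  $plus);
my $middle_star  = after_first_lt($middle,  $star);
my $leading_plus = after_first_lt($leading, $plus);
my $leading_star = after_first_lt($leading, $star);

print "middle,  +: '$middle_plus'\n";
print "middle,  *: '$middle_star'\n";
print "leading, +: '$leading_plus'\n";
print "leading, *: '$leading_star'\n";
```

Both patterns agree when something precedes the first "<". With a leading "<", the + form is forced to skip past it and captures from the second "<" onward, while the * form captures everything after the first one.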
Oesor
@Oesor Have you tried this code with lines that have more than one `<` in them?
Sinan Ünür
Sinan: Good point. I just relooked at this, and if I have a log message with a < in it (for some reason), then I might get unexpected results.
Thomas Owens
I'm not thinking today -- .*? would be correct and only match to the first <, yes?
Oesor
You might want to edit your answer then.
glenn jackman
@Oesor: Yes. Also, anchoring the expression and using `+` when you mean `+` are better practices.
Sinan Ünür
The * was intentional -- nowhere in the initial post was it stated that the < couldn't be the initial character in the string, just that he wanted the bit after the initial <
Oesor
+6  A: 

The meaning of a character changes when it is used inside [] as a character class. A wildcard . would have no useful meaning inside a class, so the . is interpreted literally in that context.

I think this should work just fine:

$text =~ s/^.*?</</s;
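
To see what the /s modifier changes here, a short sketch assuming a made-up two-line string in which the first "<" sits on the second line (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical two-line input where the first "<" is on the second line.
my $text = "first line\nsecond line <tag> rest\n";

# Without /s, "." refuses to match "\n", and "^" (no /m) only matches at the
# very start of the string, so the pattern cannot reach the "<" at all.
(my $without_s = $text) =~ s/^.*?</</;

# With /s, "." also matches "\n", so everything before the first "<" goes.
(my $with_s = $text) =~ s/^.*?</</s;

print "without /s: $without_s";
print "with /s:    $with_s";
```

Without /s the string comes back unchanged; with /s the result starts at "<tag>".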
Devin Ceartas
Works like a charm. Thanks.
Thomas Owens
@Devin Why do you need `sm`?
Sinan Ünür
Just /s is probably all that is needed (see http://perldoc.perl.org/perlre.html#Modifiers). If the first < is not on the first line, we want the . to match newlines.
Devin Ceartas
It's not a big deal, but with .* at the beginning of a regex, you don't need a beginning of string anchor.
brian d foy
Note that you don't need the non-greedy specifier. s/^[^<]*</</ will work fine too, without getting beyond the "simple subset" of regexes that everyone knows. I guess the choice is a matter of style.
Andy Ross
+6  A: 

You do not want tr.

#!/usr/bin/perl

use strict;
use warnings;

while ( <DATA> ) {
    last unless /\S/;
    s/^.+?</</;
    print;
}

__DATA__
a < b < c
a < b < c
Sinan Ünür
Then what do I want?
Thomas Owens
The substitution operator, `s///`.
Sinan Ünür
+4  A: 
^[^<]+

. (dot) within the character class is a literal dot, not a wildcard.

SilentGhost
@SilentGhost No need to do anything if there are no characters before `<`.
Sinan Ünür
it won't hurt :)
SilentGhost
@SilentGhost in general, it is not a good habit to use `*` when you mean `+` due to issues with backtracking and unexpected matches.
Sinan Ünür
+3  A: 

The '.' in a character class is not a meta-character. Also, you want s///, not tr, which translates individual characters rather than matching a pattern. So s/^.+(?=<)// should work, although personally I would write s{^.*<}{<}, to avoid the lookahead thingie.
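
One caveat worth checking when a line can contain more than one "<": the quantifiers in s/^.+(?=<)// and s{^.*<}{<} are greedy, so they run up to the last "<" rather than the first. A quick comparison, assuming the sample line "a < b < c" used elsewhere in this thread (the variable names are made up):

```perl
use strict;
use warnings;

my $line = "a < b < c";

# Greedy quantifiers: the match extends to the LAST "<" on the line.
(my $greedy_lookahead = $line) =~ s/^.+(?=<)//;
(my $greedy_braces    = $line) =~ s{^.*<}{<};

# Non-greedy quantifier and negated class: both stop at the FIRST "<".
(my $lazy          = $line) =~ s/^.+?</</;
(my $negated_class = $line) =~ s/^[^<]+//;

for my $pair (
    [ 'greedy lookahead', $greedy_lookahead ],
    [ 'greedy braces',    $greedy_braces ],
    [ 'non-greedy',       $lazy ],
    [ 'negated class',    $negated_class ],
) {
    printf "%-17s => '%s'\n", @$pair;
}
```

The two greedy forms leave "< c", while the non-greedy and negated-class forms leave "< b < c". Which one is wanted depends on whether "up to the first <" is a hard requirement.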

mirod
@mirod No need to do any replacements if there are no characters before `<`.
Sinan Ünür
There is no need to; I just find s/^.+(?=<)// harder to read. I have to pause and remember that ?= is a positive lookahead. My brain can parse s{^.*<}{<} much faster.
mirod
There is also no need for lookahead. Either `s/^.+?</</` or `s/^[^<]+//` is cleaner.
Sinan Ünür
Indeed, s/^[^<]+// is quite nice
mirod