According to the Perl documentation on regexes:

By default, the "^" character is guaranteed to match only the beginning of the string ... Embedded newlines will not be matched by "^" ... You may, however, wish to treat a string as a multi-line buffer, such that the "^" will match after any newline within the string ... you can do this by using the /m modifier on the pattern match operator.

The "after any newline" part means that it will only match at the beginning of the 2nd and subsequent lines. What if I want to match at the beginning of any line (1st, 2nd, etc.)?

EDIT: OK, it seems that the file starts with a byte order mark (BOM, 3 bytes), and that's what's messing me up. Is there any way to get ^ to match anyway?

EDIT: So in the end it works (as long as there's no BOM), but now it seems that the Perl documentation is wrong, since it says "after any newline".

+3  A: 

The ^ does match the 1st line with the /m flag:

~:1932$ perl -e '$a="12\n23\n34";$a=~s/^/:/gm;print $a'
:12
:23
:34

To match when the string starts with a BOM, you need to include the BOM in the pattern:

~:1939$ perl -e '$a="\xEF\xBB\xBF12\n23\n34";$a=~s/^(\d)/<$1>:/mg;print $a'
12
<2>:3
<3>:4
~:1940$ perl -e '$a="\xEF\xBB\xBF12\n23\n34";$a=~s/^(?:\xEF\xBB\xBF)?(\d)/<$1>:/mg;print $a'
<1>:2
<2>:3
<3>:4
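If you don't need to keep the BOM in the string you are matching against, a different technique from the one above is to strip it once, up front, and then use plain `/m` patterns. This is a minimal sketch assuming the text is still raw bytes (read without an `:encoding` layer); `strip_bom` is a hypothetical helper name, not a library function.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a UTF-8 byte order mark (bytes EF BB BF)
# only if it is the very first thing in the string.
sub strip_bom {
    my ($str) = @_;
    $str =~ s/\A\xEF\xBB\xBF//;    # \A anchors at the true start of the string
    return $str;
}

my $text = "\xEF\xBB\xBF12\n23\n34";

# After stripping, /^/m sees "1" as the first character of line 1.
(my $marked = strip_bom($text)) =~ s/^/:/mg;
print $marked, "\n";    # prints :12, :23 and :34, one per line
```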
KennyTM
It's not working when I'm reading from a file. Could there be something hidden in my file?
JoelFan
@Joel: I think so. BOM maybe?
KennyTM
It's a Visual Studio file. I just reduced it to 1 character, and "dir" says the size is 4, so something is going on! Also, when I do "type", I see 3 strange characters at the beginning of the file.
JoelFan
@Joel: They are probably the UTF-8 byte order mark (bytes EF BB BF). Try saving the file as UTF-8 **without** a byte order mark.
KennyTM
@KennyTM What if I can't change the files (there are many, and they need to be read back into Visual Studio, so I don't want to mess with that)?
JoelFan
There's a `File::BOM` module on CPAN that will detect and ignore the BOM for you, which should make life easier. Also, the docs aren't wrong; the prose is just a little twisty. There's an implicit "also" in there :)
hobbs
@hobbs, thanks... that sounds better than using the regex. I'll check it out.
JoelFan
I guess somebody set me up the BOM
JoelFan
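For the case in the comments above (reading files that must stay untouched for Visual Studio), here is a minimal core-Perl sketch that drops the BOM from the first line only, in memory, while the file itself is never rewritten. It is not `File::BOM`'s API; `read_lines_without_bom` is a hypothetical helper, and an in-memory filehandle stands in for the real file so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from an open filehandle, removing a
# UTF-8 BOM (raw bytes EF BB BF) from the first line only.
sub read_lines_without_bom {
    my ($fh) = @_;
    my @lines;
    my $first = 1;
    while (my $line = <$fh>) {
        if ($first) {
            $line =~ s/\A\xEF\xBB\xBF//;
            $first = 0;
        }
        push @lines, $line;
    }
    return @lines;
}

# In-memory file standing in for a BOM-prefixed file on disk.
my $data = "\xEF\xBB\xBF12\n23\n34\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @lines = read_lines_without_bom($fh);
close $fh;

# With the BOM gone, a plain ^ test now succeeds on the first line too.
for my $line (@lines) {
    print "starts with a digit: $line" if $line =~ /^\d/;
}
```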
+1  A: 

Conceptually, there's assumed to be a newline before the beginning of the string. Consequently, /^a/ will find a letter 'a' at the beginning of a string.
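One way to see this is to collect every offset where `/^/m` matches: offset 0 is included even though no newline precedes it. A minimal sketch; `line_start_offsets` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Return the offsets at which /^/m matches in $str.
sub line_start_offsets {
    my ($str) = @_;
    my @offsets;
    # /^/ is zero-width, so pos() does not move past the match; Perl refuses
    # to repeat a zero-length match at the same position, so the loop ends.
    while ($str =~ /^/mg) {
        push @offsets, pos($str);
    }
    return @offsets;
}

my @offsets = line_start_offsets("12\n23\n34");
print "@offsets\n";    # prints "0 3 6"
```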

Jonathan Leffler
+2  A: 

If you want to keep the BOM, you can use the `/^(?:\xEF\xBB\xBF)?/mg` regex to match at the beginning of every line anyway.
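When that pattern is used in a substitution, the BOM is consumed by the match, so it has to be put back. A minimal sketch of one way to do that, assuming the string holds raw bytes: wrap the optional BOM in a capture group that always participates, so `$1` is the BOM on the first line and an empty string elsewhere.

```perl
use strict;
use warnings;

my $text = "\xEF\xBB\xBF12\n23\n34";

# ((?:\xEF\xBB\xBF)?) always captures (possibly ""), so $1 is never undef.
(my $marked = $text) =~ s/^((?:\xEF\xBB\xBF)?)/$1:/mg;

# The BOM is still the first thing in the string, and ':' follows it.
print $marked eq "\xEF\xBB\xBF:12\n:23\n:34" ? "ok\n" : "unexpected\n";
```

If the file is instead read through an `:encoding(UTF-8)` layer, the BOM arrives as the single character U+FEFF, so the pattern would need `\x{FEFF}` rather than the three bytes.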

eugene y
That worked!
JoelFan