views:

97

answers:

4

I have this regex: ^\/\* to check whether a file begins with those two characters. I'm iterating over many C++ source files to see which of them do. The problem is that if a file contains this:

#include <source.h>

/* this is a comment */

this also matches the regex. I don't understand why, as the regex doesn't have the multiline flag on.

Here's the code for the regex:

multi = /^\/\*/

Why isn't this matching only at the beginning of the text? Here's basically everything I'm doing:

data = File.read(filename)
if data =~ multi
   puts "file starts with multiline header"
end
+4  A: 

In Ruby ^ matches after every newline. Use \A to match only at the start of the entire string:

multi = /\A\/\*/
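
A minimal sketch of the difference, using the sample text from the question as an in-memory string instead of a file. The variable names are only for illustration.

```ruby
# Sample text from the question: the comment sits on the third line,
# not at the start of the string.
data = "#include <source.h>\n\n/* this is a comment */\n"

line_anchor   = /^\/\*/   # ^ matches at the start of any line in Ruby
string_anchor = /\A\/\*/  # \A matches only at the start of the whole string

caret_match  = data.match?(line_anchor)    # true: the third line starts with /*
string_match = data.match?(string_anchor)  # false: the string starts with #include

puts "^  matches: #{caret_match}"
puts "\\A matches: #{string_match}"
```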
Andomar
This kind of sucks. Is this Ruby-only behaviour, or do all languages do this?
Geo
Most languages have a switch called "multiline mode" that causes them to behave like this, but Ruby is the only one I know that does it by default. Perl's multiline mode looks like `/^test/m`
Andomar
+2  A: 

Use \A (beginning of string) instead of ^ (beginning of line).

The interpretation of ^ is not completely consistent between flavors. Sometimes you need to set a mode modifier for multi-line strings, but not always. \A is consistent, although it is not available in every flavor: exceptions include XML, POSIX ERE/BRE, and a few others.
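
As a rough illustration of how the modifier differs in Ruby specifically: the /m flag there does not switch line anchoring on or off, it only lets . match a newline. The sample string and variable names below are made up for this sketch.

```ruby
text = "first line\nsecond line"

# ^ anchors at every line start in Ruby, with or without /m.
caret_plain = text.match?(/^second/)   # true
caret_m     = text.match?(/^second/m)  # true

# In Ruby, /m only changes whether . can match a newline.
dot_plain = text.match?(/first.+second/)   # false: . stops at the newline
dot_m     = text.match?(/first.+second/m)  # true: . crosses the newline

# \A keeps meaning "start of the string" regardless of /m.
start_m = text.match?(/\Asecond/m)  # false
```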

Tim Pietzcker
Is this the same in Perl/Python?
Geo
In Perl/Python, behaviour of `^` depends on the mode modifiers used (`/m` in Perl or `re.MULTILINE` in Python).
Tim Pietzcker
A: 

I don't know Ruby internals, but try this:

/^[^a-zA-Z#<>]\/\*/

The first part ensures that no valid code character appears before your multiline comment. Note that [^a-zA-Z#<>] is just an example; you should extend it to cover every character you want to exclude.

clinisbut
+1  A: 

Why use a regular expression?

multi = "/*"
data = File.read(filename)
if data[0, 2] == multi
   puts "file starts with multiline header"
end
Svante
Because additional spaces may be present before the comment start. The regex was just to find out why Ruby behaved like that.
Geo
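
For the case mentioned in the last comment, where whitespace may precede the comment, one possible sketch combines \A with \s*. The helper name starts_with_block_comment? is hypothetical, and the sample strings stand in for file contents.

```ruby
# Hypothetical helper: true if the text begins with /* after optional
# leading whitespace (spaces, tabs, or blank lines).
def starts_with_block_comment?(text)
  text.match?(/\A\s*\/\*/)
end

plain    = starts_with_block_comment?("/* header */\nint x;")            # true
indented = starts_with_block_comment?("  \n\t/* header */\nint x;")      # true
later    = starts_with_block_comment?("#include <source.h>\n\n/* c */")  # false
```

With a real file, the argument would be the result of File.read(filename), as in the question.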