I am working on a C++ code base that was recently moved from X/Motif to Qt. I am trying to write a Perl script that will replace all occurrences of Boolean (from X) with bool. The script just does a simple replacement.

s/\bBoolean\b/bool/g

There are a few conditions.

1) We have CORBA in our code and \b matches CORBA::Boolean which should not be changed.
2) It should not match if it was found as a string (i.e. "Boolean")

Updated:

For #1, I used lookbehind

s/(?<!:)\bBoolean\b/bool/g;

For #2, I used lookahead.

s/(?<!:)\bBoolean\b(?!")/bool/g;
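
A quick way to sanity-check the two assertions is to run the substitution over a few sample lines. This is only a sketch: the sample declarations and the helper name convert are invented for illustration and are not taken from the real code base.

```perl
use strict;
use warnings;

# Hypothetical sample lines, not taken from the real code base.
my @samples = (
    'Boolean flag = False;',
    'CORBA::Boolean remote_flag;',
    'const char *name = "Boolean";',
    'puts("Boolean value expected");',
);

# Apply the lookbehind/lookahead substitution to a copy of one line.
sub convert {
    my ($line) = @_;
    $line =~ s/(?<!:)\bBoolean\b(?!")/bool/g;
    return $line;
}

print convert($_), "\n" for @samples;
```

The first line is converted and the next two are left alone, but the last one is still rewritten inside the string literal.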

This will most likely work for my situation but how about the following improvements?

3) Do not match if in the middle of a string (thanks nohat).
4) Do not match if in a comment. (// or /**/)

A: 

To fix condition 1 try:

s/[^:]\bBoolean\b(?!")/bool/g

The [^:] says to match any character other than ":".

John Meagher
+1  A: 
s/[^:]\bBoolean\b[^"]/bool/g

Edit: Rats, beaten again. +1 for beating me, good sir.

Daniel Jennings
+3  A: 

s/[^:]\bBoolean\b(?!")/bool/g

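This does not match when Boolean is at the beginning of the line, because [^:] means "match one character that is not :", so there has to be a character in front of Boolean. That character is also part of the match, so it gets replaced along with Boolean.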
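
The difference between the character class and the lookbehind can be seen with a small comparison. This is a minimal sketch; the helper names with_char_class and with_lookbehind and the sample inputs are made up for the demonstration.

```perl
use strict;
use warnings;

# Character-class version: [^:] must consume one real character.
sub with_char_class {
    my ($line) = @_;
    $line =~ s/[^:]\bBoolean\b(?!")/bool/g;
    return $line;
}

# Lookbehind version: (?<!:) only inspects the previous character.
sub with_lookbehind {
    my ($line) = @_;
    $line =~ s/(?<!:)\bBoolean\b(?!")/bool/g;
    return $line;
}

print with_char_class('Boolean ok;'), "\n";    # unchanged: nothing precedes Boolean
print with_char_class('  Boolean ok;'), "\n";  # one leading space is swallowed
print with_lookbehind('  Boolean ok;'), "\n";  # indentation is preserved
```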

KannoN
+2  A: 

Watch out with that quote-matching lookahead assertion. It only excludes Boolean when it is the last part of a string, not when it sits in the middle of one. You'll need to match an even number of quote marks preceding the match if you want to be sure you're not in a string (assuming no multi-line strings and no escaped embedded quote marks).
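
One way to sketch the quote-counting idea is with a lookahead that counts the quotes after the match rather than before it. The two are equivalent as long as the quotes on the line are balanced, and it sidesteps the restrictions Perl places on variable-length lookbehind. The helper name convert_outside_strings is hypothetical, and the same assumptions apply: one line per call, no escaped quotes, no multi-line strings, and no '"' character literals.

```perl
use strict;
use warnings;

# Replace Boolean only when an even number of double quotes remains
# between the match and the end of the line, i.e. outside a string.
# Assumes one line per call, no escaped quotes, no multi-line strings.
sub convert_outside_strings {
    my ($line) = @_;
    $line =~ s/(?<!:)\bBoolean\b(?=(?:[^"\n]*"[^"\n]*")*[^"\n]*$)/bool/g;
    return $line;
}

print convert_outside_strings('puts("Boolean value expected");'), "\n";
print convert_outside_strings('Boolean same = strcmp(s, "x") == 0;'), "\n";
```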

nohat
A: 

3) Do not match if in the middle of a string (thanks nohat).

You could perhaps write a regex to check for ".*Boolean.*". But what if there is an escaped quote (\") inside the string? Then you have more work to do so that \" is not mistaken for the end of the string.

4) Do not match if in a comment. (// or /* */)

For '//', you could use a regex that excludes //.* but a better approach might be to first match the whole line against (.*?)(//.*) to split off the // comment, and then apply the replacement only to $1 (the first captured group).
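
The split-then-replace idea for // comments might look roughly like this. It is a sketch under a loud assumption: "//" never appears inside a string literal (a URL in a string would break it). The helper name convert_code_part is invented here.

```perl
use strict;
use warnings;

# Split a line at the first "//" and substitute only in the code part.
# Assumes "//" never appears inside a string literal (e.g. a URL).
sub convert_code_part {
    my ($line) = @_;
    # \z instead of $ so a trailing newline stays in the captured text.
    my ($code, $comment) = $line =~ m{^(.*?)(//.*)?\z}s;
    $code =~ s/(?<!:)\bBoolean\b(?!")/bool/g;
    return $code . ($comment // '');
}

print convert_code_part('Boolean a; // Boolean flag'), "\n";
```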

For /* */, it is more complex because this is a multi-line pattern. One approach is to first run a match over all of your code to find the multi-line comments and then work only on the parts outside them, something like (.*)(/\*.*\*/)(.*). But the actual regex would be even more complex, as you would have not one but many multi-line comments.

Now, what if you have /* or */ inside a // comment? (I don't know why you would, but Murphy's law says you can.) There is obviously some way around it, but my point is to emphasize how ugly the regex will become.

My suggestion would be to use a lexical tool for C++ and replace the token Boolean with bool. Your thoughts?

Jagmal
A: 

In order to avoid writing a full C parser in Perl, you're trying to strike a balance. Depending on how much needs changing, I would be inclined to do a very restrictive s/// and then write anything that still matches /Boolean/ to an exception file for a human to decide. That way you're not trying to parse strings, multi-line comments, conditionally compiled-out text, and whatever else could be present.
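
As a rough sketch of that workflow: convert only lines that look obviously safe, and collect every line that still mentions Boolean for manual review. The definition of "safe" used here (no quotes, no comment markers, no "::" on the line) and the name convert_conservatively are assumptions made for the example. Lines in the interior of a multi-line comment carry no marker and would still be converted.

```perl
use strict;
use warnings;

# Convert only lines with no quotes, comment markers, or "::";
# report every line that still contains Boolean afterwards.
sub convert_conservatively {
    my (@lines) = @_;
    my (@out, @exceptions);
    for my $i (0 .. $#lines) {
        my $line = $lines[$i];
        if ($line !~ m{"|//|/\*|\*/|::}) {
            $line =~ s/\bBoolean\b/bool/g;
        }
        # Anything still matching goes to the human-review list.
        if ($line =~ /Boolean/) {
            push @exceptions, sprintf('%d: %s', $i + 1, $line);
        }
        push @out, $line;
    }
    return (\@out, \@exceptions);
}

my ($out, $exceptions) = convert_conservatively(
    'Boolean a;', 'CORBA::Boolean b;', 'puts("Boolean");', 'int x;',
);
print "$_\n" for @$exceptions;
```

The caller can then write the collected entries to whatever exception file suits the project.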

piCookie
A: 
3) Do not match if in the middle of a string (thanks nohat).
4) Do not match if in a comment. (// or /**/)

No can do with a simple regex. For that, you need to actually look at every single character left-to-right and decide what kind of thing it is, at least well enough to tell apart comments from multi-line comments from strings from other stuff, and then you need to see if the “other stuff” part contains things you want to change.

Now, I don’t know the exact syntactical rules for comments and strings in C++ so the following is going to be imprecise and completely undebugged, but it’ll give you an idea of the complexity you’re up against.

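# Each token type is an atomic group so the regex engine never backtracks into it.
# qr{} is used as the delimiter because the lookbehind (?<!:) contains a "!".
my $line_comment      = qr{ (?> // .* \n? ) }x;
my $multiline_comment = qr{ (?> /\* [^*]* \*+ (?: [^/*] [^*]* \*+ )* / ) }x;
my $string            = qr{ (?> " [^"\\]* (?: \\ . [^"\\]* )* " ) }x;
my $boolean_type      = qr{ (?<!:) \b Boolean \b }x;

# Walk the code token by token; /s lets the final "." also consume newlines,
# otherwise \G would stop the scan at the first unmatched line break.
$code =~ s{ \G (
      $line_comment
    | $multiline_comment
    | $string
    | ( $boolean_type )
    | .
) }{
    defined $2 ? 'bool' : $1
}gexs;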

Please don’t ask me to explain this in all its intricacies, it would take me a day and another. Just buy and read Jeff Friedl’s Mastering Regular Expressions if you want to understand exactly what is going on here.

Aristotle Pagaltzis
A: 

The "'Boolean' in the middle of a string" part sounds a bit unlikely, I'd check first if there is any occurrence of it in the code with something like

m/"[^"]*Boolean[^"]*"/

And if there is none or a few, just ignore that case.
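
A small survey helper along those lines could report which lines deserve a look. This is a sketch: the name lines_with_boolean_in_string is made up, and the pattern is deliberately loose, so it may also flag a line where Boolean merely sits between two separate string literals.

```perl
use strict;
use warnings;

# Return the 1-based numbers of lines where Boolean appears between
# two double quotes. Over-reports rather than under-reports.
sub lines_with_boolean_in_string {
    my (@lines) = @_;
    return grep { $lines[$_ - 1] =~ /"[^"]*Boolean[^"]*"/ } 1 .. @lines;
}

my @hits = lines_with_boolean_in_string(
    'Boolean a;',
    'puts("a Boolean b");',
    'x = "Boolean";',
);
print "line $_\n" for @hits;
```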

Victor
+1  A: 
#define Boolean bool

Let the preprocessor take care of this. Every time you see a Boolean you can either manually fix it or hope a regex doesn't make a mistake. Depending on how many macros you use, you could dump the output of cpp.

nt