I don't think this is possible with just regular expressions, but I'm not an expert so I thought it was worth asking.

I'm trying to do a massive search and replace of C# code, using .NET regex. What I want to do is find a line of code where a specific function is called on a variable that is of type DateTime. e.g:

axRecord.set_Field("CreatedDate", m_createdDate);

and I would know that it's a DateTime variable because earlier in that code file would be the line:

DateTime m_createdDate;

but it seems that I can't use a backreference to a named group inside a lookbehind like:

(?<=DateTime \k<1>.+?)axRecord.set_[^ ]+ (?<1>[^ )]+)

and if I try to match all the text between the variable declaration and the function call like this:

DateTime (?<1>[^;]+).+?axRecord.set.+?\k<1>

it will find the first match - first based on first variable declared - but then it can't find any other matches, because the code is laid out like this:

DateTime m_First;
DateTime m_Second;
...
axRecord.set_Field("something", m_First);
axRecord.set_Field("somethingElse", m_Second);

and the first match encompasses the second variable declaration.
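
A minimal sketch of that behaviour, assuming the pattern is run with RegexOptions.Singleline so that `.` can cross line breaks. The sample text and the class name `OverlapDemo` are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Hypothetical sample mirroring the layout described above.
    public const string Source =
        "DateTime m_First;\n" +
        "DateTime m_Second;\n" +
        "axRecord.set_Field(\"something\", m_First);\n" +
        "axRecord.set_Field(\"somethingElse\", m_Second);\n";

    public static MatchCollection FindAll()
    {
        // Singleline is an assumption: without it, '.' stops at each newline
        // and the declaration can never be joined to the call.
        return Regex.Matches(
            Source,
            @"DateTime (?<1>[^;]+).+?axRecord.set.+?\k<1>",
            RegexOptions.Singleline);
    }

    public static void Main()
    {
        foreach (Match m in FindAll())
        {
            // Each match consumes everything from the declaration to the call,
            // so the search resumes after the first call.
            Console.WriteLine("variable: " + m.Groups[1].Value + ", match length: " + m.Length);
        }
    }
}
```

The single match starts at the declaration of m_First and ends inside the first call, so the declaration of m_Second has already been consumed by the time the engine resumes searching.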

Is there a good way to do this with just regular expressions, or do I have to resort to scripting in my logic?

+1  A: 

This will be difficult to do with a single regular expression. However, it is possible if you process the lines one at a time while keeping a bit of state.

Note: I can't tell exactly what you're trying to match on the axRecord line so you'll likely need to adjust that regex appropriately.

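// Requires: using System; using System.Collections.Generic;
//           using System.Text.RegularExpressions;
void Process(List<string> lines) {
  var comp = StringComparer.Ordinal;
  // Names of the m_ variables whose most recent declaration was a DateTime
  var map = new HashSet<string>(comp);
  // Matches a declaration such as: DateTime m_createdDate;
  var declRegex = new Regex(@"^\s*(?<type>\w+)\s+(?<name>m_\w+)\s*;");
  // Matches a call such as: axRecord.set_Field("CreatedDate", m_createdDate);
  var toReplaceRegex = new Regex(@"^\s*axRecord\.set_(?<toReplace>.*(?<name>m_\w+).*)");

  for (var i = 0; i < lines.Count; i++) {
    var line = lines[i];
    var match = declRegex.Match(line);
    if (match.Success) {
      if (comp.Equals(match.Groups["type"].Value, "DateTime")) {
        map.Add(match.Groups["name"].Value);
      } else {
        // The name was redeclared with another type, so stop tracking it
        map.Remove(match.Groups["name"].Value);
      }
      continue;
    }

    match = toReplaceRegex.Match(line);
    if (match.Success && map.Contains(match.Groups["name"].Value)) {
      // Add your replace logic here
    }
  }
}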
JaredPar
This is a good solution, but for the next day or so, unless Jan Goyvaerts comes in and says it's not possible, I'm going to assume it is :)
LoveMeSomeCode
@LoveMeSomeCode, it's a borderline impossible problem. Consider this: if you want it to work 100% of the time on any C/C# code, it's impossible with a regex. Regexes aren't as powerful as a parser. On the other hand, if you want a solution for the code in your particular project, it may be possible.
JaredPar
A: 

This cannot be done using regular expressions. For one thing, C#'s grammar is not regular; but more importantly, you're talking about analyzing expressions that are lexically unrelated. For this sort of thing, you're going to need full semantic analysis. That means lexer, parser, name binding and finally type checker. Once you have the annotated AST, you can look for the field you want and just read off the type.

I'm guessing this is a lot more work than you want to do though, seeing as it's about half of a full-blown C# compiler.

Daniel Spiewak
See my answer: you can piggyback on VS and get all this for free in the EnvDTE object.
Binary Worrier
+5  A: 

Have a look at my answer to this question: "Get a methods contents from a C# file"

It gives links to pages that show how to use the built-in .NET language parser to do this simply and reliably (i.e. not by asking "what looks like the usage I'm searching for", but by properly parsing the code with the VS code parsing tools).

I know it's not a RegEx answer, but I don't think RegEx is the answer.

Binary Worrier
A: 

This is weird. I managed to build a regex that does find it, but it only matches the first one.

(?<=private datetime (?<1>\b\w+\b).+?)set_field[^;]+?\k<1>

so it seems that even if I can't refer back to a named group inside a lookbehind, I can at least establish a named group in the lookbehind and then use it in the match. But then it looks like when it matches just the function call (which is what I wanted) the caret position is moved to that line, and so it can't find any new matches because it's past their declarations. or maybe I don't understand how the engine is really working.

I guess what I'm looking for is a regex option that tells it to look inside matches for more matches. which, come to think of it, seems like it would be needed for basic html regex parsing too. you find a tag and then its closing tag, and the whole page is enclosed in that match, so you won't find any other tags unless you recursively apply the pattern to each match.

anyone know anything about this or am i talking crazy?

LoveMeSomeCode
actually this pattern will match the one function for the variable that was declared last. i.e. it backtracks up to the declarations and the first one it finds makes the lookbehind true. removing the ? lazy operator from the lookbehind switches it to the first declared variable.
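
A small sketch for reproducing the lazy versus greedy behaviour described in this comment. It assumes the pattern was run with RegexOptions.IgnoreCase and RegexOptions.Singleline (the lowercase pattern and the multi-line span suggest both); the sample text and the names `LookbehindCaptureDemo` and `MatchedVariables` are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindCaptureDemo
{
    // Hypothetical sample: two DateTime fields, each passed to one call.
    public const string Source =
        "private DateTime m_First;\n" +
        "private DateTime m_Second;\n" +
        "axRecord.set_Field(\"something\", m_First);\n" +
        "axRecord.set_Field(\"somethingElse\", m_Second);\n";

    // Returns the variable captured by group 1 for every match of the pattern.
    public static string[] MatchedVariables(string pattern)
    {
        return Regex.Matches(Source, pattern,
                             RegexOptions.IgnoreCase | RegexOptions.Singleline)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
    }

    public static void Main()
    {
        // Lazy quantifier inside the lookbehind: the nearest declaration wins.
        var lazy = MatchedVariables(
            @"(?<=private datetime (?<1>\b\w+\b).+?)set_field[^;]+?\k<1>");
        // Greedy quantifier inside the lookbehind: the earliest declaration wins.
        var greedy = MatchedVariables(
            @"(?<=private datetime (?<1>\b\w+\b).+)set_field[^;]+?\k<1>");

        Console.WriteLine("lazy:   " + string.Join(", ", lazy));
        Console.WriteLine("greedy: " + string.Join(", ", greedy));
    }
}
```

Either way the lookbehind settles on a single declaration per call site and is not retried with a different one, which would explain why only one of the two calls is ever reported.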
LoveMeSomeCode
A: 

Try this:

@"(?s)set_Field\(""[^""]*"",\s*(?<vname>\w+)(?<=\bDateTime\s+\k<vname>\b.+)"

By doing the lookbehind first, you're forcing the regex to search for the method calls in a particular order: the order in which the variables are declared. What you want to do is match a likely-looking method call first, then use the lookbehind to verify the type of the variable.

I just made a rough guess at the part that matches the method call. Like the others have said, whatever regex you use is going to have to be tailored to your code; there's no general solution.
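
As a rough check of the idea, here is a small sketch that runs the pattern above over a made-up sample. The sample text, the `string m_Name` declaration, and the names `TypeCheckDemo` and `DateTimeArguments` are assumptions for illustration, not part of the original code:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TypeCheckDemo
{
    // Hypothetical sample: two DateTime fields and one string field.
    public const string Source =
        "DateTime m_First;\n" +
        "string m_Name;\n" +
        "DateTime m_Second;\n" +
        "axRecord.set_Field(\"something\", m_First);\n" +
        "axRecord.set_Field(\"name\", m_Name);\n" +
        "axRecord.set_Field(\"somethingElse\", m_Second);\n";

    // Match the call first, then let the trailing lookbehind confirm that
    // the captured variable was declared as a DateTime earlier in the text.
    private static readonly Regex CallRegex = new Regex(
        @"(?s)set_Field\(""[^""]*"",\s*(?<vname>\w+)(?<=\bDateTime\s+\k<vname>\b.+)");

    // Returns the variable name from every call that passed the type check.
    public static string[] DateTimeArguments()
    {
        return CallRegex.Matches(Source)
            .Cast<Match>()
            .Select(m => m.Groups["vname"].Value)
            .ToArray();
    }

    public static void Main()
    {
        foreach (var name in DateTimeArguments())
        {
            Console.WriteLine(name);
        }
    }
}
```

Because each match covers only the call itself, the declarations are never consumed and every call gets checked against all of the text before it. The call with m_Name is skipped since no `DateTime m_Name` declaration precedes it.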

Alan Moore
wow, thanks. I didn't even realize you could put the lookbehind after the expression. just when I thought I was starting to understand regex. I guess I need to read up on this more. thanks again.
LoveMeSomeCode