Hello, I need some help trying to match a C include file with full path like so:

#include <stdio.h>  -> stdio.h
#include "monkey/chicken.h" -> monkey/chicken.h

So far I have (adapted from another expression I found):

^\s*\#include\s+(["'<])([^"'<>/\|\b]+)*([">])

But I'm kind of stuck at this point: it doesn't match in the second case, and I'm not sure how to get the result of the match (e.g. the file path) back from regcomp() and friends.

BTW I've looked at regexplib.com, but can't find anything suitable.

Edit: Yes I am a total regexp newbie, using POSIX regex with regmatch_t and friends...

+1  A: 

You can try this regex:

(^\s*\#\s*include\s*<([^<>]+)>)|(^\s*\#\s*include\s*"([^"]+)")

I prefer to have separate regexes for
#include <>
and
#include ""

Nick D
+4  A: 

This would give better results:

^\s*\#include\s+["<]([^">]+)*[">]

You then want to look at the first capture group when you get a match.

You don't say what language you're using, but the fact that you mention regcomp() leads me to believe that you're using the POSIX regex library in C. If that's right, then you want to use the regexec function and use the nmatch and pmatch parameters to get the first capture group.
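
A minimal sketch of that approach in C, assuming the POSIX <regex.h> API. The helper name include_path is hypothetical, and the pattern spells whitespace as [[:space:]] because \s is not part of POSIX extended regex syntax (some libraries accept it as an extension, so this is the conservative choice):

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the path from an #include line into out.
   Returns 1 on a match, 0 otherwise. */
static int include_path(const char *line, char *out, size_t outsz)
{
    /* POSIX ERE has no \s, so whitespace is written as [[:space:]]. */
    const char *pattern =
        "^[[:space:]]*#[[:space:]]*include[[:space:]]*[\"<]([^\">]+)[\">]";
    regex_t re;
    regmatch_t pm[2];   /* pm[0] = whole match, pm[1] = first capture group */
    int found = 0;

    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    /* nmatch = 2 asks regexec to fill in pm[0] and pm[1]. */
    if (regexec(&re, line, 2, pm, 0) == 0) {
        size_t len = (size_t)(pm[1].rm_eo - pm[1].rm_so);
        if (len >= outsz)
            len = outsz - 1;            /* truncate to fit the buffer */
        memcpy(out, line + pm[1].rm_so, len);
        out[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

The offsets rm_so and rm_eo are byte positions into the subject string, so the captured text is the half-open range from rm_so up to rm_eo.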

Laurence Gonsalves
Yep POSIX regex it is, I will update question.
Justicle
In theory, you could have `#include <name"this>` or `#include "name>this"` -- once upon a long time ago, the second might have appeared in C for PRIMOS. In practice, neither is likely.
Jonathan Leffler
Jonathan: yes, it occurred to me that this isn't based strictly on the standard, but I figured that filenames that contain either double quotes or greater than signs rarely, if ever show up in the wild (and C source/header files seem to fall in the "less weird crap" end of the file naming spectrum).
Laurence Gonsalves
+4  A: 

Here's what I wrote :

#include ((<[^>]+>)|("[^"]+"))

Does it fit ?

Clement Herreman
Yep, nice and simple as well. I'll probably tweak to be more robust with spaces. Thanks!
Justicle
Suggest '(<[^>]+>)' to better identify the <notation> and similar for the other term. Otherwise `#include <stdio.h> // a > b` gets the wrong information. Can you use non-capturing parentheses too? That depends on the regex library.
Jonathan Leffler
@Jonathan of course I could use non-capturing parentheses... What is this ? btw, I edited the regex, ty =)
Clement Herreman
+1  A: 

Not particularly well tested, but it matches your two cases:

^\s*#include\s+(<([^"'<>|\b]+)>|"([^"'<>|\b]+)")

The only problem is that, because of the < and > alternation, the result could be in capture group 2 or 3, so you should check whether 2 is empty and, if so, use 3. The advantage over some of the other answers is that it won't match something like this: #include "bad.h> or this: #include <bad<<h>
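
The group 2 versus group 3 check can be sketched in plain C as follows, assuming the POSIX <regex.h> API. The helper name include_kind is hypothetical. The pattern is a simplified variant of the one above: it uses [[:space:]] instead of \s, and it drops \b from the bracket expressions, because inside a POSIX bracket expression a backslash is an ordinary character and \b would exclude the letter b from file names.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: returns '<' or '"' for the include style and copies
   the path into out, or returns 0 if the line is not a well-formed #include. */
static char include_kind(const char *line, char *out, size_t outsz)
{
    const char *pattern =
        "^[[:space:]]*#[[:space:]]*include[[:space:]]*"
        "(<([^\"<>]+)>|\"([^\"<>]+)\")";
    regex_t re;
    regmatch_t pm[4];   /* 0 = whole match, 1 = either form, 2 = <...>, 3 = "..." */
    char kind = 0;

    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, line, 4, pm, 0) == 0) {
        /* A group that did not take part in the match has rm_so == -1. */
        int g = (pm[2].rm_so != -1) ? 2 : 3;
        size_t len = (size_t)(pm[g].rm_eo - pm[g].rm_so);
        if (len >= outsz)
            len = outsz - 1;            /* truncate to fit the buffer */
        memcpy(out, line + pm[g].rm_so, len);
        out[len] = '\0';
        kind = (g == 2) ? '<' : '"';
    }
    regfree(&re);
    return kind;
}
```

Testing rm_so against -1 is more reliable than testing for an empty string, since it distinguishes a group that never matched from one that matched zero characters.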

And here's an example how to use (wrap) regcomp & friends:

 #include <regex.h>
 #include <string>
 #include <vector>

 static bool regexMatch(const std::string& sRegEx, const std::string& sSubject, std::vector<std::string> *vCaptureGroups)
 {
  regex_t re;
  int flags = REG_EXTENDED | REG_ICASE;
  int status;

  if(!vCaptureGroups) flags |= REG_NOSUB;

  if(regcomp(&re, sRegEx.c_str(), flags) != 0)
  {
   return false;
  }

  if(vCaptureGroups)
  {
   int mlen = re.re_nsub + 1;
   regmatch_t *rawMatches = new regmatch_t[mlen];

   status = regexec(&re, sSubject.c_str(), mlen, rawMatches, 0);

   vCaptureGroups->clear();
   vCaptureGroups->reserve(mlen);

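   if(status == 0)
   {
    for(int i = 0; i < mlen; i++)
    {
     // a group that did not take part in the match has rm_so == -1
     if(rawMatches[i].rm_so < 0) { vCaptureGroups->push_back(std::string()); continue; }
     // rm_eo is one past the last matched character, so the length is rm_eo - rm_so
     vCaptureGroups->push_back(sSubject.substr(rawMatches[i].rm_so, rawMatches[i].rm_eo - rawMatches[i].rm_so));
    }
   }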

   delete[] rawMatches;
  }
  else
  {
   status = regexec(&re, sSubject.c_str(), 0, NULL, 0);
  }

  regfree(&re);

  return (status == 0);
 }
KiNgMaR
Hey ! I don't match `#include "bad.h>` ! :(
Clement Herreman
a) Sorry, didn't see your answer. b) Yours will e.g. match #include <test<<h>. c) Also just noticed mine will also match 'file.h'. D'oh. Sorry about that. Stupid PHP. So a good idea would be to combine Clement's for the general idea and mine for validation.
KiNgMaR
Haha thanks I don't need to match just plain silly includes.
Justicle
You're right, but I'm usually against using regex for overly precise validation. For something like that, I would just get the string of the .h, then use any standard function to check whether it is a valid file path.
Clement Herreman
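
One way to do that check after extracting the path, sketched in C under the assumption of a POSIX system with access() from <unistd.h>. The helper name include_exists and the idea of probing a single candidate directory are illustrative assumptions; a real compiler searches a list of include directories.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if dir/path names something readable,
   0 otherwise. dir stands in for one include search directory. */
static int include_exists(const char *dir, const char *path)
{
    char full[4096];
    int n = snprintf(full, sizeof full, "%s/%s", dir, path);

    if (n < 0 || (size_t)n >= sizeof full)
        return 0;                       /* joined path did not fit */
    return access(full, R_OK) == 0;
}
```

This leaves the regex responsible only for pulling out the text between the delimiters, and lets the file system decide whether the name is usable.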