I want to extract a substring from a string using the POSIX regex.h library in C. Here is the code:

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
   regex_t    preg;   
   char       *string = "Random_ddName:cateof:Name_Random";

   char       *pattern = ".*Name:\\(.*\\):Name.*";
   int        rc;     
   size_t     nmatch = 1;
   regmatch_t pmatch[1];

   if (0 != (rc = regcomp(&preg, pattern, 0))) {
      printf("regcomp() failed, returning nonzero (%d)\n", rc);
      exit(EXIT_FAILURE);
   }

   if (0 != (rc = regexec(&preg, string, nmatch, pmatch, 0))) {
      printf("Failed to match '%s' with '%s',returning %d.\n",
      string, pattern, rc);
   }
   else {  
      printf("With the whole expression, "
             "a matched substring \"%.*s\" is found at position %d to %d.\n",
             pmatch[0].rm_eo - pmatch[0].rm_so, &string[pmatch[0].rm_so],
             pmatch[0].rm_so, pmatch[0].rm_eo - 1);
   }
   regfree(&preg);

    return 0;
}

I want to extract the string "cateof", but only when it sits between the strings Name: and :Name. The "cateof" part is arbitrary, it changes dynamically, and it is the only part I need. How can I get it in one step? Is it possible to use backreferences to store the value I need?

A: 

You must specify nmatch = 2, so that pmatch[0] contains the whole match and pmatch[1] the submatch you want.

Needed code changes:

size_t     nmatch = 2;
regmatch_t pmatch[2];

and

...
    pmatch[1].rm_eo - pmatch[1].rm_so, &string[pmatch[1].rm_so],
    pmatch[1].rm_so, pmatch[1].rm_eo - 1);
...
Vanni Totaro
how can I define the subexpression?
cateof
@cateof: you already defined the subexpression with `\\(.*\\)`. Try to apply my suggested changes and you will get as program output: `With the whole expression, a matched substring "cateof" is found at position 14 to 19.`
Vanni Totaro
@cateof: as you can read (under Linux) in `man 7 regex`: `The parentheses for nested subexpressions are "\(" and "\)"`. In C you must escape each \ with another \ (i.e. \\).
Vanni Totaro
@cateof: to explain it better: `Obsolete ("basic") regular expressions differ in several respects. [...] The parentheses for nested subexpressions are "\(" and "\)"`. You are using POSIX Basic Regular Expression syntax because you are not passing `REG_EXTENDED` in the third parameter (`cflags`) of `regcomp`.
Vanni Totaro
What I don't really understand is why the first match is not adequate to get the substring. Why do we need nmatch=2? I don't want to match the Name:xxxx:Name. I want to match xxxx.
cateof
@cateof: `pmatch[0]` always contains the part of your string that matches the whole pattern. You started and ended your pattern with `.*`, so your entire string is reported as matching the pattern. `pmatch[1]`, `pmatch[2]`, etc. contain the parts matched by the marked subexpressions, in order. See Wikipedia about [POSIX Basic Regular Expressions](http://en.wikipedia.org/wiki/Regular_expression#POSIX_Basic_Regular_Expressions). For your needs, simply ignore `pmatch[0]` and use `pmatch[1]`.
Vanni Totaro
@cateof: if you liked my answer please upvote it! :)
Vanni Totaro