
Hello,

I am using PCRE for some regex parsing, and I need to search a string for words matching a specific pattern (say, all the words in a comma-separated list) and put them into a vector of strings.

How would I go about doing that?

A:

Sorry for the rough code, but I am in a hurry...

  #include <stdio.h>
  #include <string.h>
  #include <pcre.h>

  /* Hypothetical wrapper: txt is the string to search. */
  void print_words(const char *txt)
  {
    pcre       *re;
    const char *error;
    int         erroffset;
    const char *subject = txt;
    int         subject_length = (int)strlen(subject);
    int         ovector[3];   /* must be a multiple of 3; 3 = whole match only */
    int         start = 0;    /* offset where the next search begins */
    int         rc;

    re = pcre_compile(
      "\\w+",       /* the pattern: one or more word characters */
      0,            /* default options (CASELESS/MULTILINE don't affect \w+) */
      &error,       /* for error message */
      &erroffset,   /* for error offset */
      NULL);        /* use default character tables */
    if (re == NULL) {
      printf("PCRE compilation failed at offset %d: %s\n", erroffset, error);
      return;
    }

    for (;;) {
      rc = pcre_exec(
        re,              /* the compiled pattern */
        NULL,            /* no extra data - we didn't study the pattern */
        subject,         /* the subject string */
        subject_length,  /* the length of the subject */
        start,           /* resume searching at this offset */
        0,               /* default options */
        ovector,         /* output vector for substring information */
        3);              /* number of elements in the output vector */

      if (rc < 0) {
        if (rc != PCRE_ERROR_NOMATCH)
          printf("Matching error %d\n", rc);
        break;           /* no more words, or a real error */
      }

      /* Match succeeded: the word is subject[ovector[0] .. ovector[1]) */
      const char *substring_start = subject + ovector[0];
      int substring_length = ovector[1] - ovector[0];

      // do something with the substring, e.g. push it onto your vector
      printf("%.*s\n", substring_length, substring_start);

      start = ovector[1];  /* continue right after this match */
    }

    pcre_free(re);       /* release memory used for the compiled pattern */
  }
Nick D
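The answer above leaves "do something with the substring" open, while the question asks for a vector of strings. Below is a minimal sketch of that last step in C, under some stated assumptions: it uses POSIX `<regex.h>` (`regcomp`/`regexec`) instead of PCRE so that it builds with only the C library, and `[[:alnum:]_]+` stands in for `\w+` (roughly equivalent in the C locale). `str_vec`, `str_vec_push`, `str_vec_free` and `collect_words` are hypothetical names. Like the answer, it advances a pointer past each match; with PCRE you would instead pass `ovector[1]` as the next start offset to `pcre_exec`.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable "vector" of C strings. */
typedef struct {
  char  **items;
  size_t  count;
  size_t  cap;
} str_vec;

/* Append a copy of s[0..len); returns 0 on success, -1 on allocation failure. */
static int str_vec_push(str_vec *v, const char *s, size_t len)
{
  if (v->count == v->cap) {
    size_t ncap = v->cap ? v->cap * 2 : 8;
    char **p = realloc(v->items, ncap * sizeof *p);
    if (p == NULL) return -1;
    v->items = p;
    v->cap = ncap;
  }
  char *copy = malloc(len + 1);
  if (copy == NULL) return -1;
  memcpy(copy, s, len);
  copy[len] = '\0';
  v->items[v->count++] = copy;
  return 0;
}

/* Free every stored string and reset the vector to empty. */
static void str_vec_free(str_vec *v)
{
  for (size_t i = 0; i < v->count; i++) free(v->items[i]);
  free(v->items);
  v->items = NULL;
  v->count = v->cap = 0;
}

/* Collect every run of word characters in txt into out; returns 0 on success. */
static int collect_words(const char *txt, str_vec *out)
{
  assert(txt != NULL && out != NULL);
  regex_t re;
  regmatch_t m;
  /* POSIX ERE stand-in for PCRE's \w+ (assumption: C locale) */
  if (regcomp(&re, "[[:alnum:]_]+", REG_EXTENDED) != 0) return -1;

  const char *p = txt;
  int flags = 0;
  while (regexec(&re, p, 1, &m, flags) == 0) {
    if (str_vec_push(out, p + m.rm_so, (size_t)(m.rm_eo - m.rm_so)) != 0) {
      regfree(&re);
      return -1;
    }
    p += m.rm_eo;        /* resume after this match (never empty, so p advances) */
    flags = REG_NOTBOL;  /* p is no longer the start of the original string */
  }
  regfree(&re);
  return 0;
}
```

For example, calling `collect_words("alpha, beta,gamma", &words)` on a zero-initialized `str_vec words = {0};` should leave `"alpha"`, `"beta"` and `"gamma"` in `words.items`, to be released with `str_vec_free(&words)`. In C++ the same loop would simply `push_back(std::string(substring_start, substring_length))` onto a `std::vector<std::string>`.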
A:
sc
Have you tested this? I would expect that to match the commas as if they were part of the words: ["w1," "w2," "w3"]
Alan Moore