I have a string that looks something like this:

long_str = "returns between paragraphs 20102/34.23\" - 9203 1232 \"test\" \"basic HTML\"";

Note: Quotes are part of the string.

int match(char *long_str){
    char * str;
    if ((str = strchr(long_str, '"')) != NULL) str++; // last " ?
    else return 1;
    return 0;
}

Using strstr I'm trying to get the whole substring between the last two quotes: "basic HTML". I'm just not quite sure what would be a good and efficient way of getting that match. I'm open to any other ideas on how to approach this. Thanks

A: 

Assuming that you can't modify the source string, then I think you should look at a combination of strchr() and strrchr() - at least, if you want to use library functions for everything.

  • First use strrchr() to find the last quote.
  • Then repeatedly use strchr() to find quotes from the front, up to the time it returns the last quote found by strrchr().
  • The combination of the previous result and the result of strrchr() gives you the quotes around the string you are interested in.
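
The three steps above can be sketched roughly as follows. This is only an illustration, not a definitive implementation: the helper name `last_quoted_forward` and its out-parameters `open_q` and `close_q` are made up for this example, and it assumes the input is a valid NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: on success, *open_q and *close_q point at the last
   two quotes in s and 0 is returned; returns 1 if s has fewer than two. */
static int last_quoted_forward(const char *s,
                               const char **open_q, const char **close_q)
{
    const char *last;
    const char *prev = NULL;
    const char *p;

    assert(s != NULL);                 /* precondition assumed by this sketch */
    last = strrchr(s, '"');            /* step 1: the last quote, or NULL */
    if (last == NULL)
        return 1;
    /* Step 2: walk the quotes from the front until strchr() returns the
       quote that strrchr() found, remembering the one seen just before. */
    for (p = strchr(s, '"'); p != last; p = strchr(p + 1, '"'))
        prev = p;
    if (prev == NULL)
        return 1;                      /* malformed: only one quote */
    /* Step 3: prev and last delimit the substring of interest. */
    *open_q = prev;
    *close_q = last;
    return 0;
}
```

With the result in hand, `printf("%.*s\n", (int)(close_q - open_q + 1), open_q);` would print the substring with its quotes, without modifying the source string.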

Alternatively:

  • First use strrchr() to find the last quote.
  • Write a loop to search backwards from there to find the previous quote, remembering not to search before the start of the string in the case that it is malformed and contains only one quote.
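
A minimal sketch of this backward-searching alternative, under the same assumptions (a valid NUL-terminated input string); the helper name `last_quoted_backward` and the out-parameters `open_q` and `close_q` are invented for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: on success, *open_q and *close_q point at the last
   two quotes in s and 0 is returned; returns 1 if s has fewer than two. */
static int last_quoted_backward(const char *s,
                                const char **open_q, const char **close_q)
{
    const char *last;
    const char *p;

    assert(s != NULL);                 /* precondition assumed by this sketch */
    last = strrchr(s, '"');            /* the last quote, or NULL */
    if (last == NULL)
        return 1;
    /* Search backwards from just before the last quote. The test p > s is
       made before the decrement, so p never moves before the start of s. */
    for (p = last; p > s; ) {
        --p;
        if (*p == '"') {
            *open_q = p;
            *close_q = last;
            return 0;
        }
    }
    return 1;                          /* malformed: only one quote */
}
```

The backward loop only touches the characters between the last two quotes, which is where it may win over the front-to-back strchr() approach when the quoted substring sits near the end of a long string.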

Both will work. Depending on the balance of probabilities, it is quite possible that the loop will outperform even a highly optimized strchr() wrapped in a loop.

Jonathan Leffler
Thanks for the info. I think I'll end up using the alternative as you mentioned!
David78
Uber-hack suggestion: do the first step as before, set that quote to a NUL, then use strrchr() again. Don't forget to restore the quote you broke!
TokenMacGuy
@TokenMacGuy: one hopes that the data string is declared as `const char *` to discourage such abuses :D
Jonathan Leffler
+1  A: 
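#include <stdio.h>

/* The string literal must not be modified, so point at it through const. */
const char * long_str = 
"returns between paragraphs 20102/34.23\" - 9203 1232 \"test\" \"basic HTML\"";

int main ()
{
    const char * probe;
    const char * first_quote = 0;   /* second-to-last quote seen so far */
    const char * second_quote = 0;  /* last quote seen so far */
    for (probe = long_str; * probe; probe++) {
        if (*probe == '"') {
            if (first_quote) {
                if (second_quote) {
                    first_quote = second_quote;
                    second_quote = probe;
                } else {
                    second_quote = probe;
                }
            } else {
                first_quote = probe;
            }
        }
    }
    /* Fewer than two quotes: there is no quoted substring to print. */
    if (! second_quote) {
        return 1;
    }
    /* Print from the first quote through the second, quotes included. */
    printf ("%.*s\n", (int) (second_quote - first_quote + 1), first_quote);
    /* %td is the C99 conversion for ptrdiff_t, the type of a pointer difference. */
    printf ("%td-%td\n", first_quote - long_str, second_quote - long_str);
    return 0;
}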
Kinopiko
You can avoid some nesting by inverting the conditions: `if (first_quote == 0) first_quote = probe; else if (second_quote == 0) second_quote = probe; else { first_quote = second_quote; second_quote = probe; }`. You could even use `if (*probe != '"') continue; else if ...`, but I wouldn't normally bother.
Jonathan Leffler
I think it's OK as it is.
Kinopiko
+1 `probe` is a good name for the variable, considering its use *probing* through the string
TokenMacGuy
You shouldn't need the nested conditions at all: `if (*probe == '"') { first_quote = second_quote; second_quote = probe; }`, then make sure `first_quote` is non-null once you've iterated over the whole string (with only one quote in the input, `second_quote` is set but `first_quote` is still null).
TokenMacGuy
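
A sketch of that simplified single-pass loop, for illustration only. The function name `last_two_quotes` and the out-parameters `open_q` and `close_q` are made up here. The final check tests `first_quote` against null: with exactly one quote in the input, `second_quote` is set while `first_quote` is still null.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: on success, *open_q and *close_q point at the last
   two quotes in s and 0 is returned; returns 1 if s has fewer than two. */
static int last_two_quotes(const char *s,
                           const char **open_q, const char **close_q)
{
    const char *first_quote = NULL;    /* second-to-last quote seen so far */
    const char *second_quote = NULL;   /* last quote seen so far */
    const char *probe;

    assert(s != NULL);                 /* precondition assumed by this sketch */
    for (probe = s; *probe; probe++) {
        if (*probe == '"') {
            /* Shift unconditionally: the old "last" becomes "previous". */
            first_quote = second_quote;
            second_quote = probe;
        }
    }
    if (first_quote == NULL)
        return 1;                      /* zero or one quote in the input */
    *open_q = first_quote;
    *close_q = second_quote;
    return 0;
}
```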