I was wondering if somebody could explain to me how pointers and string parsing work. I know that I can do something like the following in a loop, but I still don't quite follow how it works.

  for (a = str;  * a;  a++) ...

For instance, I'm trying to get the last integer from a string. Suppose I have a string such as const char *str = "some string here 100 2000";

Using the method above, how could I parse it and get the last integer of the string (2000), knowing that the last integer (2000) may vary.

Thanks

A: 

The loop you've presented just goes through all the characters (a string is a pointer to an array of 1-byte chars that ends with 0). For parsing you should use sscanf or, better, C++'s string and string stream.
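
As a rough illustration of the sscanf route, the sketch below reads the string one whitespace-separated token at a time and remembers the last token that parses as an integer. The function name `last_int_sscanf`, the 63-character token limit, and the return convention are assumptions made for this example, not something from the question.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 and stores the value in *out if any
   whitespace-separated token of str parses as an integer; the value stored
   is that of the last such token. Returns 0 if no token parses. */
int last_int_sscanf(const char *str, long *out)
{
    char token[64];   /* assumed limit: longer tokens are read in pieces */
    int consumed;
    int found = 0;
    long value;

    assert(str != NULL && out != NULL);
    /* %63s reads one token, %n records how many chars were consumed. */
    while (sscanf(str, "%63s%n", token, &consumed) == 1) {
        if (sscanf(token, "%ld", &value) == 1) {
            *out = value;
            found = 1;
        }
        str += consumed;          /* continue after the token just read */
    }
    return found;
}
```

Called with "some string here 100 2000", this should store 2000 in `*out`. Note that it scans every token, so it does more work than searching from the end of the string.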

mbq
+6  A: 

for (a = str; * a; a++) ...

This works by starting a pointer a at the beginning of the string and incrementing a at each step until dereferencing a yields a value that implicitly converts to false.

Basically, you'll walk the array until you get to the NUL terminator that's at the end of your string (\0) because the NUL terminator implicitly converts to false - other characters do not.
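
To make the traversal concrete, here is a minimal sketch that uses the same loop to count the characters before the terminator. The function name `count_chars` is made up for illustration; its result should match what `strlen` returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walks the string with the loop from the question
   and counts how many characters precede the NUL terminator. */
size_t count_chars(const char *str)
{
    const char *a;
    size_t n = 0;

    assert(str != NULL);      /* the loop dereferences str right away */
    for (a = str; *a; a++)    /* *a is 0 (false) only at the terminator */
        n++;
    return n;                 /* same as a - str at this point */
}
```

When the loop finishes, a points at the terminator itself, which is why the answers here can use it as "the end of the string".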

Using the method above, how could I parse it and get the last integer of the string (2000), knowing that the last integer (2000) may vary.

You're going to want to look for the last space before the \0, then you're going to want to call a function to convert the remaining characters to an integer. See strtol.

Consider this approach:

  • find the end of the string (using that loop)
  • search backwards for a space.
  • use that to call strtol.


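const char *a;
for (a = str; *a; a++);  // Find the end.
while (*a != ' ') a--;   // Move back to the last space (assumes the string contains one).
a++;  // Move one past the space.
long result = strtol(a, NULL, 10);  // strtol (declared in <stdlib.h>) returns a long.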

Or alternatively, just keep track of the start of the last token:

const char *a;
const char *start = str;
for (a = str; *a; a++) {     // Until you hit the end of the string.
  if (*a == ' ') start = a;  // New token: reassign start (strtol skips the leading space).
}
long result = strtol(start, NULL, 10);

This version has the benefit of not requiring a space in the string.
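
A self-contained sketch of the same idea, wrapped in a function that also reports whether the last token contained any digits at all. The name `last_int` and its return convention are assumptions for this example; the digit check relies on the end pointer that strtol fills in.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out if the last
   space-separated token of str starts with an integer, otherwise returns 0. */
int last_int(const char *str, long *out)
{
    const char *a;
    const char *start = str;
    char *end;
    long value;

    assert(str != NULL && out != NULL);
    for (a = str; *a; a++) {
        if (*a == ' ')
            start = a + 1;    /* a token may begin right after a space */
    }
    value = strtol(start, &end, 10);
    if (end == start)         /* strtol consumed nothing: no digits found */
        return 0;
    *out = value;
    return 1;
}
```

For "some string here 100 2000" this should yield 2000, and for a string with no spaces such as "42" it parses the whole string. A string ending in a space leaves an empty last token, so the function returns 0 in that case.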

Stephen
This code is broken if the string contains no spaces. It will loop past the beginning of the string, possibly into invalid addresses, in which case it will crash.
R..
If str is correctly null-terminated, `a = str + strlen(str)` points to the byte just past the last character of the string (the null byte). It's almost the same as the `for` loop but more readable, I think. Moreover, instead of `*a != ' '` you can use `isspace`.
ShinTakezou
@R.. : That's true, but given how the question was phrased, I think it's a safe assumption.
Stephen
@ShinTakezou : Also true :) I considered using `strlen`, but the OP said "using the above method", so I did... OTOH, `isspace` probably would be more clear.
Stephen
@R.. : Thanks for the idea, I've added a version that doesn't require a space.
Stephen
@Stephen: Take a look at http://stackoverflow.com/questions/3127722/pointers-and-string-parsing-in-c/3129035#3129035 which uses `strrchr()` rather than rolling your own loop
SiegeX
+3  A: 
  for (a = str;  * a;  a++)...

is equivalent to

  a=str;
  while(*a!='\0') //'\0' is NUL, don't confuse it with NULL which is a macro
  {
      ....
      a++;
  }
Prasoon Saurav
+3  A: 

You just need to implement a simple state machine with two states, e.g.:

#include <ctype.h>
#include <string.h>

int num = 0; // the final int value will be contained here
int state = 0; // state == 0 == not parsing int, state == 1 == parsing int
size_t i;

for (i = 0; i < strlen(s); ++i) // s is the input string (const char *)
{
    if (state == 0) // if currently in state 0, i.e. not parsing int
    {
        if (isdigit((unsigned char)s[i])) // if we just found the first digit character of an int
        {
            num = s[i] - '0'; // discard any old int value and start accumulating new value
            state = 1; // we are now in state 1
        }
        // otherwise do nothing and remain in state 0
    }
    else // currently in state 1, i.e. parsing int
    {
        if (isdigit((unsigned char)s[i])) // if this is another digit character
        {
            num = num * 10 + s[i] - '0'; // continue accumulating int
            // remain in state 1...
        }
        else // no longer parsing int
        {
            state = 0; // return to state 0
        }
    }
}
Paul R
Yuck :) This deserves 3 lines of code, with one parse instead of analyzing each character.
Stephen
This is an inefficient method; it parses all strings and tosses away all but the last. And you should call strlen() only once and save that in a temp var, instead of calling it on every iteration, as this code does (the compiler *may* optimize this for you if the string is a `const char *`).
Tim Schaeffer
@Tim/@Stephen: are you familiar with the term *premature optimisation* ? The above code was written for clarity and to illustrate the concept of states in a parser (even though in this case there are only two states) - the OP is a noob and needs to understand the basic concepts, not worry about micro-optimisation or writing the tersest possible code.
Paul R
non-C90 style comments. :(
BobbyShaftoe
@BobbyShaftoe: who said anything about requiring C89/C90 compatibility? C99 has been around for over 10 years now, and most C compilers have supported C++/C99-style comments for much longer than that. What is the problem? And does it really warrant a down-vote?
Paul R
@Paul R : I'm quite familiar with the term. :) My "yuck" wasn't really referring to the inefficiencies, it was about the complexity of your solution. Look how many branches there are - that's where bugs usually lie, look how many comments you have to explain it! It's a clever solution, but wayyyyy overkill for this problem.
Stephen
@Paul R : By the way, putting `strlen` in the for-loop turns an O(n) problem into O(n^2). That's hardly premature optimization. (Of course, this depends on how cleverly the compiler optimizes...) I didn't vote you down - clearly it works... I'd just prefer something simpler.
Stephen
@Stephen: no need to second-guess the compiler - gcc is smart enough to cache the result of strlen, as should any halfway-decent compiler.
Paul R
@Paul R, I did not downvote. I never downvote unless it is really bad. It just made me sad to see C99 being further promoted. :) I don't care for C99 and it is hardly fully supported, even if the comment style is. I prefer strict ANSI C, although I have to give up on that in certain situations (kernel programming). Also, I'm not sure gcc caches the result of strlen, any documentation on this?
BobbyShaftoe
@BobbyShaftoe: 3 downvotes now - all from anonymous curmudgeons apparently - never mind. I'm not sure why you're so opposed to C99, but the // comments predate it anyway - even people who write C89 with gcc tend to use // comments (i.e. not strict ANSI). As for caching the result of strlen - I checked this by looking at the output from gcc -S.
Paul R
Well, there's an upvote. I think it's an OK answer. Interesting on the caching.
BobbyShaftoe
+2  A: 

I know this has been answered already, but all the answers thus far recreate code that is available in the Standard C Library. Here is what I would use, taking advantage of strrchr():

#include <string.h>
#include <stdio.h>
#include <stdlib.h>  /* strtol */

int main(void)
{

    const char* input = "some string here 100 2000";
    const char* p;
    long l = 0;

    if ((p = strrchr(input, ' ')) != NULL)
        l = strtol(p+1, NULL, 10);

    printf("%ld\n", l);

    return 0;
}

Output

2000
SiegeX