I ran into some code and am wondering what the original developer was up to. As usual, I have simplified it down to the basic case before asking for your assistance. The man page for scanf has the relevant information, but I am having some trouble reading it.

    #include <stdio.h>

    int main(void)
    {
        char title[80] = "mytitle";
        char title2[80] = "mayataiatale";
        char mystring[80];

        /* huh? */
        sscanf(title, "%[^a]", mystring);
        printf("%s\n", mystring); /* Output is "mytitle" */

        /* huh? */
        sscanf(title2, "%[^a]", mystring);
        printf("%s\n", mystring); /* Output is "m" */

        return 0;
    }

I am hoping for an anecdotal usage example and the reasons code like this might be used. The code is part of a larger, code-generated application. I appreciate any feedback.

+2  A: 

It's like character sets from regular expressions; [0-9] matches a string of digits, [^aeiou] matches anything that isn't a lowercase vowel, etc.

There are all sorts of uses, such as pulling out numbers, identifiers, chunks of whitespace, etc.
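To make the two classes mentioned above concrete, here is a minimal sketch. The helper names `leading_digits` and `leading_non_vowels` and the 32-byte buffer size are made up for illustration. Note that the meaning of a range such as `0-9` inside a scanset is formally implementation-defined, although the common C libraries treat it as a range.

```c
#include <stdio.h>

/* Copies the leading run of decimal digits from src into out.
 * out must have room for 32 bytes (31 characters plus the NUL).
 * Returns 1 if at least one digit was matched, 0 otherwise. */
int leading_digits(const char *src, char out[32])
{
    out[0] = '\0';
    return sscanf(src, "%31[0-9]", out) == 1;
}

/* Copies the leading run of characters that are NOT lowercase vowels.
 * Returns 1 if at least one such character was matched, 0 otherwise. */
int leading_non_vowels(const char *src, char out[32])
{
    out[0] = '\0';
    return sscanf(src, "%31[^aeiou]", out) == 1;
}
```

With these, `leading_digits("2024-05", buf)` leaves "2024" in `buf`, and `leading_non_vowels("mayataiatale", buf)` leaves "m", which is the same effect the question sees with `%[^a]`.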

MarkusQ
Except it seems to quit after the first one... a really poor man's regexp, in my opinion.
ojblass
"of course" it quit after the first one. Each pattern matches one thing, and stops when it can't match. Otherwise, "%d%d" might give one integer and an error instead of two...
RBerteig
+3  A: 

The constructs like %[a] and %[^a] exist so that scanf() can be used as a kind of lexical analyzer. These are sort of like %s, but instead of collecting a span of as many "stringy" characters as possible, they collect just a span of characters as described by the character class. There might be cases where writing %[a-zA-Z0-9] might make sense, but I'm not sure I see a compelling use case for complementary classes with scanf().
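As a rough sketch of the lexical-analyzer idea, the function below alternates a complementary class (to skip separators) with a positive class (to collect a word), using `%n` to find out how far each call got. The name `split_words` and the 32-byte word size are assumptions for the example, and the `a-z` style ranges rely on the usual (formally implementation-defined) range behaviour.

```c
#include <stdio.h>

/* Splits src into alphanumeric words, storing up to max_words of them.
 * Words longer than 31 characters are split into 31-character pieces.
 * Returns the number of words stored. */
int split_words(const char *src, char words[][32], int max_words)
{
    int count = 0;
    int used;

    while (count < max_words) {
        /* Skip a run of separator characters. If nothing matches,
         * %n is never reached and used stays 0. */
        used = 0;
        sscanf(src, "%*[^a-zA-Z0-9]%n", &used);
        src += used;

        /* Collect one word; %n does not count toward the return value. */
        used = 0;
        if (sscanf(src, "%31[a-zA-Z0-9]%n", words[count], &used) != 1)
            break;
        src += used;
        count++;
    }
    return count;
}
```

Called on "foo, bar42;;baz" this yields the three words "foo", "bar42" and "baz". Each conversion still matches only one span; the repetition comes from the loop, not from the format string.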

IMHO, scanf() is simply not the right tool for this job. Every time I've set out to use one of its more powerful features, I've eventually ended up ripping it out and implementing the capability in a different way. In some cases that meant using lex to write a real lexical analyzer, but usually doing line-at-a-time I/O and breaking the line coarsely into tokens with strtok() before doing value conversion was sufficient.

Edit: I typically ended up ripping out scanf() because, when faced with users insisting on providing incorrect input, it just isn't good at helping the program give good feedback about the problem, and having an assembler print "Error, terminated." as its sole helpful error message was not going over well with my user. (Me, in that case.)
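A minimal sketch of that alternative, assuming the input is a line of whitespace-separated integers: tokenize with strtok(), convert with strtol(), and hand back the offending token so the caller can print a useful message. The function name `parse_ints` and its interface are invented for this illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parses whitespace-separated integers from line (strtok() modifies it).
 * Stores up to max values in out. Returns the number parsed, or -1 on a
 * bad token, in which case *bad points at the offending token. */
int parse_ints(char *line, long *out, int max, const char **bad)
{
    static const char delims[] = " \t\r\n";
    int n = 0;

    *bad = NULL;
    for (char *tok = strtok(line, delims);
         tok != NULL && n < max;
         tok = strtok(NULL, delims)) {
        char *end;
        errno = 0;
        long v = strtol(tok, &end, 10);
        /* Reject empty conversions, trailing junk, and out-of-range values. */
        if (end == tok || *end != '\0' || errno == ERANGE) {
            *bad = tok;
            return -1;
        }
        out[n++] = v;
    }
    return n;
}
```

The caller would read a line with fgets() and pass it in; on a return of -1 it can report exactly which token was rejected, which is the kind of feedback that is awkward to get out of a single scanf() call.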

RBerteig
+1  A: 

The main reason for the character classes is that the %s notation stops at the first white space character, even if you specify field lengths, and you quite often don't want it to. In that case, the character class notation can be extremely helpful.

Consider this code to read a line of up to 10 characters, discarding any excess, but keeping spaces:

#include <stdio.h>

int main(void)
{
    char buffer[10+1] = "";
    int rc;
    /* scanf() returns EOF (a negative value) at end of input */
    while ((rc = scanf("%10[^\n]%*[^\n]", buffer)) >= 0)
    {
        (void)getchar();    /* consume the newline that stopped the scan */
        printf("rc = %d\n", rc);
        printf("buffer = <<%s>>\n", buffer);
        buffer[0] = '\0';   /* an empty line assigns nothing, so reset */
    }
    printf("rc = %d\n", rc);
    return(0);
}

This was actually example code for a discussion on comp.lang.c.moderated (circa June 2004) related to getline() variants.


At least some confusion reigns. The first format specifier, %10[^\n], reads up to 10 non-newline characters and assigns them to buffer, along with a trailing null. The second format specifier, %*[^\n], contains the assignment suppression character (*) and reads any remaining non-newline characters from the input without storing them. When the scanf() function completes, the input is pointing at the next newline character. The body of the loop reads that character with getchar(), so that when the loop restarts, the input is looking at the start of the next line. The process then repeats. If the line has 10 characters or fewer, then those characters are copied to buffer, and the second format has no non-newlines left to process.
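The same steps can be sketched against an in-memory string, which makes them easier to trace. The helper `next_line_prefix` is hypothetical, not part of the original example. It uses two sscanf() calls instead of one combined format because a trailing `%n` is never reached when the `%*[^\n]` directive finds nothing to match.

```c
#include <stdio.h>

/* Copies at most 10 characters of the line at *cursor into buffer
 * (which must hold 11 bytes), skips the rest of that line and its
 * newline, and advances *cursor to the start of the next line.
 * Returns 0 at end of input, 1 otherwise. */
int next_line_prefix(const char **cursor, char buffer[11])
{
    const char *s = *cursor;
    int used = 0;

    if (*s == '\0')
        return 0;

    buffer[0] = '\0';
    /* %10[^\n] stores up to 10 non-newline characters. */
    if (sscanf(s, "%10[^\n]%n", buffer, &used) == 1)
        s += used;

    /* %*[^\n] discards whatever is left on the line, storing nothing. */
    used = 0;
    sscanf(s, "%*[^\n]%n", &used);
    s += used;

    /* Step over the newline itself, the job getchar() does in the loop. */
    if (*s == '\n')
        s++;

    *cursor = s;
    return 1;
}
```

Walking this over "hello world, this is long\n\nshort\nlast" gives "hello worl", an empty string for the blank line, "short", and "last", and then 0 at the end of the input.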

Jonathan Leffler
I am a little stuck at while ((rc = scanf("%10[^\n]%*[^\n]", buffer)) >= 0)
ojblass
So you want to stop at newlines... what is going on in the second %?
ojblass