I have been following this convention thus far:

std::string line;
while(std::getline(in,line)) 
{
    if(line.size() && line[0] =='#')
      continue;
    /* parse text*/
}

The obvious drawback is that a comment may not begin at the first character of the line when there is leading whitespace.

What is a good way to deal with this sort of thing?

+2  A: 

You should make sure to check the string length before testing character zero:

if (line.length() > 0 && line[0] == '#')
James McNellis
Adam Wright
isn't line.empty() better? Meyers recommends empty() as opposed to length(). BTW, I do this check in the actual code
vehomzzz
If you do this check, why wouldn't you put it in your original post?
GMan
empty() is better if you are only testing the first character, yes.
James McNellis
+10  A: 

Simple enhancement: you may want to use line.find_first_not_of(" ") to get the first non-whitespace character and then check whether it is a '#'. That also covers the zero-length case. Something like this fragment:

  // Index of the first character that is neither a space nor a tab.
  std::string::size_type found = line.find_first_not_of(" \t");

  if( found != std::string::npos)
  {
    if( line[found] == '#')
      continue;
  }

More info
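
For reference, the fragment above can be wrapped in a small function so it is easy to test on its own. This is only a sketch: the name is_comment_line is made up for illustration, and it assumes that spaces and tabs are the only leading whitespace you care about.

```cpp
#include <string>

// Hypothetical helper (the name is not from the answer above): returns true
// when the first character after any leading spaces or tabs is '#'.
// An empty or all-blank line is not treated as a comment here.
bool is_comment_line(const std::string& line)
{
    const std::string::size_type found = line.find_first_not_of(" \t");
    return found != std::string::npos && line[found] == '#';
}
```

Inside the read loop this becomes if (is_comment_line(line)) continue;. Blank lines are passed through, so the parser still has to cope with them.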

Stuart
What about tabs?
vehomzzz
added \t for tabs
Stuart
Add those too. `" \t"`
GMan
This seems to be a portable solution. Someone suggested using in >> std::ws to skip whitespace. While it is visually appealing, it didn't work in my environment.
vehomzzz
Those are not the only white space characters. There are a couple more in ASCII and if your input is UTF-8 there are a whole bunch more. You should use the standard white space skipping mechanism to make sure it works in all environments.
Martin York
A: 

You might like the Boost String Library, specifically trim_left and starts_with.

Adam Wright
good idea, but I can NOT use boost!
vehomzzz
Why on earth not? C++ without boost is C++ how it should be (and in many areas, will be shortly with 0x).
Adam Wright
Er, make that "is" and "isn't" :)
Adam Wright
Introducing Boost can be an immense challenge, verging on impossible in a corporate environment. It's no different to using any third party library whether it will end up in 0x or not. It needs installing and maintaining across team(s) and could take months to get approval... I love and use Boost but I wouldn't install Boost just to solve this.
Stuart
@Adam Wright you must not have worked with legacy code or in a corporation that mandates certain rules and regulations. I wish I could use boost, though :)
vehomzzz
@Stuart alas, exactly how I feel!
vehomzzz
@Stuart: No, boost is unlike many other libraries. With very few exceptions, boost is all headers. This means that you can usually just check it in beside your project and be done.
sbi
A: 

Parsing is tricky and difficult.

In general, I would not recommend trying to parse without a state machine. For example, what if the '#' is part of a multiline string ("""...""" in Python)?

There are libraries that may simplify parsing (they are supposed to, at least, though understanding them can be challenging if you have no prior experience). In C++, I can only recommend Spirit.

Some pointers have already been suggested that use string methods, though they only detect whether the first meaningful character is a '#'.

If you do not 'fear' multiline strings (that is, if what you are trying to parse has no such feature), you will still have to handle 'simple' lines, which can be done by counting quotes and taking escapes into account:

print "My \"#\" is: ", phoneNumber # This is my phone number

If you parse this line badly, you'll end up with an error... (for example)

If you cannot use a library, a state machine is the way to go. Writing a parser is quite fun in general, and it gives you insight into why the notation was designed the way it was.
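
To make the state machine idea concrete, here is a minimal sketch for the single-line case. It assumes a format that nobody in this thread has confirmed: strings are delimited by double quotes, a backslash escapes the next character inside a string, and there are no multiline strings. The function name strip_comment_outside_quotes is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: returns the line with any trailing '#' comment removed.
// Assumes double-quoted strings with backslash escapes and no multiline strings.
std::string strip_comment_outside_quotes(const std::string& line)
{
    enum State { Normal, InString, Escape };
    State state = Normal;

    for (std::string::size_type i = 0; i < line.size(); ++i)
    {
        const char c = line[i];
        switch (state)
        {
        case Normal:
            if (c == '#') return line.substr(0, i);  // comment starts here
            if (c == '"') state = InString;          // opening quote
            break;
        case InString:
            if (c == '\\') state = Escape;           // next character is escaped
            else if (c == '"') state = Normal;       // closing quote
            break;
        case Escape:
            state = InString;                        // escaped character consumed
            break;
        }
    }
    return line;  // no comment found outside a string
}
```

On the print line above, this keeps both quoted '#' characters and cuts only at the '#' that follows phoneNumber.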

Matthieu M.
No need to go crazy, my requirements are much simpler!
vehomzzz
Most languages do not support multi-line strings.
Martin York
@Martin: true, though they might have streaming comments in exchange. I was just trying to attract the attention of Andrei so that he would provide more context on the file format.
Matthieu M.
+1  A: 

From the sound of things, your file format specifies that everything from '#' to the end of a line is a comment. If that's the case, you can find the beginning of the comment with:

// Warning: untested code.
std::string::size_type pos = line.find('#');

Then, you presumably want to ignore the rest of the line, most easily managed by deleting it:

if (pos != std::string::npos)
    line.erase(pos);  // erase from pos to the end of the line

This should deal quite easily with things like:

tax = rate * price    # figure tax on item

Of course, this assumes that a '#' always signals the beginning of a comment -- if you allow '#' inside character strings, or for whatever other purpose, you'll need to take that into account (but it's hard to guess what that would be since you've told us very little about the file format).
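
Putting the two steps together, and also trimming the whitespace left in front of the comment, might look like the sketch below. The name strip_comment is made up for illustration, the trimming step is an addition to what is described above, and the same assumption applies: '#' never appears in anything but a comment.

```cpp
#include <string>

// Hypothetical helper: removes a trailing '#' comment in place, then trims
// the spaces and tabs that preceded it.
// Assumes '#' never appears inside string literals or other non-comment text.
void strip_comment(std::string& line)
{
    const std::string::size_type pos = line.find('#');
    if (pos != std::string::npos)
        line.erase(pos);  // erase from pos to the end of the line

    // find_last_not_of returns npos when the line is all blanks; npos + 1
    // wraps to 0, so the whole line is erased in that case.
    line.erase(line.find_last_not_of(" \t") + 1);
}
```

A line that held only a comment ends up empty, so the caller can skip it with line.empty().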

Jerry Coffin
+1  A: 

Use the stream's facility to skip whitespace, std::ws:

inline std::istream& get_line(std::istream& in, std::string& line)
{
    in >> std::ws;
    std::getline(in,line);
    return in;
}

std::string line;
while(get_line(in,line)) 
{
    if(!line.empty() && line[0] =='#')
        continue;
    /* parse text*/
}
sbi
:((( doesn't work
vehomzzz
This seems to be GCC-specific, though it would be nice for other compilers/library vendors as well.
MP24
Could you elaborate on what "doesn't work" means and why you think it's "GCC-specific"? I pasted the above into a VS test project, compiled it, and checked it with a few test lines. It seems to work just fine.
sbi
+2  A: 

Use the operator >>. It ignores whitespace.

std::string line;
while(std::getline(in,line))
{
    std::stringstream linestr(line);
    char              firstNoWhiteSpaceChar;

    linestr >> firstNoWhiteSpaceChar;
    if ((!linestr) || (firstNoWhiteSpaceChar == '#')) 
    {
        // If the line contains only whitespace, the extraction fails and
        // linestr becomes invalid (the equivalent of EOF is set). Treat such
        // a line the same way as a comment and ignore it.
        continue;
    }

    // Do Stuff with line.
}
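
The same test can be pulled out into a function, which makes it easy to check how blank lines and unusual whitespace are handled. This is a sketch only: the name should_skip is made up for illustration, and it relies on the stream's default locale to decide what counts as whitespace.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: true for lines that are blank or whose first
// non-whitespace character is '#'.
bool should_skip(const std::string& line)
{
    std::istringstream linestr(line);
    char first = '\0';
    linestr >> first;  // operator>> skips leading whitespace by default
    return !linestr || first == '#';
}
```

Unlike a hard-coded " \t" set, this also skips a leading vertical tab or form feed, because the stream asks the locale what whitespace is.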
Martin York
interesting solution
vehomzzz
It's the only one that works in all situations.
Martin York