views:

206

answers:

3

I have a string to tokenize. It's form is HHmmssff where H, m, s, f are digits.

It's supposed to be tokenized into four 2-digit numbers, but I need it to also accept short-hand forms, like sff so it interprets it as 00000sff. I wanted to use boost::tokenizer's offset_separator but it seems to work only with positive offsets and I'd like to have it work sort of backwards.

Ok, one idea is to pad the string with zeroes from the left, but maybe the community comes up with something uber-smart. ;)

Edit: Additional requirements have just come into play.

The basic need for a smarter solution was to handle all cases, like f, ssff, mssff, etc. but also accept a more complete time notation, like HH:mm:ss:ff with its short-hand forms, e.g. s:ff or even s: (this one's supposed to be interpreted as s:00).

In the case where the string ends with : I can obviously pad it with two zeroes as well, then strip out all separators leaving just the digits and parse the resulting string with spirit.

But it seems like it would be a bit simpler if there was a way to make the offset tokenizer going back from the end of string (offsets -2, -4, -6, -8) and lexically cast the numbers to ints.

+1  A: 

I keep preaching BNF notation. If you can write down the grammar that defines your problem, you can easily convert it into a Boost.Spirit parser, which will do it for you.

TimeString := LongNotation | ShortNotation

LongNotation := Hours Minutes Seconds Fractions

Hours := digit digit
Minutes := digit digit
Seconds := digit digit
Fraction := digit digit

ShortNotation := ShortSeconds Fraction
ShortSeconds := digit

Edit: additional constraint

VerboseNotation = [ [ [ Hours ':' ] Minutes ':' ] Seconds ':' ]  Fraction
xtofl
But that handles just this one `sff` case, not `ssff` or `mssff` or even 'f'. For now I'm padding the string and parsing it with spirit actually.
macbirdie
You can require from the product requirements that they fully specify their needs in BNF form. If they can't, you can assist them, if they can, you're done.
xtofl
A: 

Regular Expressions come to mind. Something like "^0*?(\\d?\\d?)(\\d?\\d?)(\\d?\\d?)(\\d?\\d?)$" with boost::regex. Submatches will provide you with the digit values. Shouldn't be difficult to adopt to your other format with colons between numbers (see sep61.myopenid.com's answer). boost::regex is among the fastest regex parsers out there.

Johannes Schaub - litb
A: 

In response to the comment "Don't mean to be a performance freak, but this solution involves some string copying (input is a const & std::string)".

If you really care about performance so much that you can't use a big old library like regex, won't risk a BNF parser, don't want to assume that std::string::substr will avoid a copy with allocation (and hence can't use STL string functions), and can't even copy the string chars into a buffer and left-pad with '0' characters:

void parse(const string &s) {
    string::const_iterator current = s.begin();
    int HH = 0;
    int mm = 0;
    int ss = 0;
    int ff = 0;
    switch(s.size()) {
        case 8:
            HH = (*(current++) - '0') * 10;
        case 7:
            HH += (*(current++) - '0');
        case 6:
            mm = (*(current++) - '0') * 10;
        // ... you get the idea.
        case 1:
            ff += (*current - '0');
        case 0: break;
        default: throw logic_error("invalid date");
        // except that this code goes so badly wrong if the input isn't
        // valid that there's not much point objecting to the length...
   }
}

But fundamentally, just 0-initialising those int variables is almost as much work as copying the string into a char buffer with padding, so I wouldn't expect to see any significant performance difference. I therefore don't actually recommend this solution in real life, just as an exercise in premature optimisation.

Steve Jessop
I actually am using a BNF parser and std::string so performance is not THAT HUGE of an issue (spirit is mostly a template library after all) and I'm actually creating a replacement for a code similar to your example to make it more generic and, um, nice. ;)
macbirdie
I simply wanted to see if there's a really clean solution possible.
macbirdie
Fair enough - the question implied by that one comment is a whole other matter from the question you really asked, about simple code rather than horrible byte-fiddling. I'm just fond of that kind of switch hack, because it freaks out the squares who are scared of falling through ;-)
Steve Jessop