ansaurus

Question

Answer 1

+1 A:

I keep preaching BNF notation. If you can write down the grammar that defines your problem, you can easily convert it into a Boost.Spirit parser, which will do it for you.

TimeString := LongNotation | ShortNotation

LongNotation := Hours Minutes Seconds Fraction

Hours := digit digit
Minutes := digit digit
Seconds := digit digit
Fraction := digit digit

ShortNotation := ShortSeconds Fraction
ShortSeconds := digit

Edit: additional constraint

VerboseNotation := [ [ [ Hours ':' ] Minutes ':' ] Seconds ':' ] Fraction
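To make the VerboseNotation rule concrete, here is a minimal hand-written sketch in plain C++ (not Boost.Spirit, so it stays self-contained). The names `TimeFields` and `parseVerbose` are made up for illustration, and the sketch assumes every field that is present has one or two digits, which the grammar above does not state.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type: the four fields of a time string.
struct TimeFields {
    int hours;
    int minutes;
    int seconds;
    int fraction;
};

// Parses [ [ [ Hours ':' ] Minutes ':' ] Seconds ':' ] Fraction.
// Assumption: each field that is present has one or two digits.
// Returns false (and leaves `out` untouched) on malformed input.
bool parseVerbose(const std::string& text, TimeFields& out) {
    int values[4] = {0, 0, 0, 0};  // fields in the order they were read
    std::size_t count = 0;         // completed fields so far
    int current = 0;               // value of the field being read
    int digits = 0;                // digits seen in the field being read

    for (std::size_t i = 0; i <= text.size(); ++i) {
        const bool atEnd = (i == text.size());
        if (!atEnd && std::isdigit(static_cast<unsigned char>(text[i]))) {
            if (++digits > 2) return false;  // field too long
            current = current * 10 + (text[i] - '0');
        } else if (atEnd || text[i] == ':') {
            // Reject empty fields and a fifth field.
            if (digits == 0 || count == 4) return false;
            values[count++] = current;
            current = 0;
            digits = 0;
        } else {
            return false;  // unexpected character
        }
    }

    // The last field read is always Fraction, so align to the right.
    int aligned[4] = {0, 0, 0, 0};
    for (std::size_t i = 0; i < count; ++i) {
        aligned[4 - count + i] = values[i];
    }
    out.hours = aligned[0];
    out.minutes = aligned[1];
    out.seconds = aligned[2];
    out.fraction = aligned[3];
    return true;
}
```

With this sketch, "7:15" is read as 7 seconds and a fraction of 15, while "1:02:03:04" fills all four fields.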

xtofl 2008-11-13 13:56:25

But that handles just this one `sff` case, not `ssff` or `mssff` or even `f`. For now I'm actually padding the string and parsing it with Spirit.

macbirdie 2008-11-13 14:07:57

You can ask whoever writes the product requirements to specify their needs fully in BNF form. If they can't, you can assist them; if they can, you're done.

xtofl 2008-11-13 16:08:32

Answer 2

A:

Regular expressions come to mind. Something like "^0*?(\\d?\\d?)(\\d?\\d?)(\\d?\\d?)(\\d?\\d?)$" with boost::regex. The submatches will give you the digit values. It shouldn't be difficult to adapt to your other format with colons between the numbers (see sep61.myopenid.com's answer). boost::regex is among the fastest regex parsers out there.
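A minimal sketch of the regex idea, using std::regex instead of boost::regex purely so that it is self-contained. Note that it uses a variation of the pattern quoted above: with greedy groups, an input such as "123" fills the leftmost groups first, so the three leading groups are made lazy here to push the digits to the right. `RegexTime` and `parseWithRegex` are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct RegexTime {
    int hours;
    int minutes;
    int seconds;
    int fraction;
};

// Matches 1 to 8 digits. The three leading groups are lazy, so digits are
// assigned to the rightmost fields first (ff, then ss, then mm, then HH).
bool parseWithRegex(const std::string& text, RegexTime& out) {
    static const std::regex pattern(
        R"(^(\d{0,2}?)(\d{0,2}?)(\d{0,2}?)(\d{1,2})$)");
    std::smatch match;
    if (!std::regex_match(text, match, pattern)) return false;

    // An empty submatch means the field was omitted; treat it as zero.
    auto toInt = [](const std::ssub_match& part) {
        return part.length() == 0 ? 0 : std::stoi(part.str());
    };
    out.hours = toInt(match[1]);
    out.minutes = toInt(match[2]);
    out.seconds = toInt(match[3]);
    out.fraction = toInt(match[4]);
    return true;
}
```

For "123" this yields 1 second and a fraction of 23; the empty string and anything longer than eight digits are rejected.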

Johannes Schaub - litb 2008-11-13 13:56:56

Answer 3

A:

In response to the comment "Don't mean to be a performance freak, but this solution involves some string copying (input is a const & std::string)".

If you really care about performance so much that you can't use a big old library like regex, won't risk a BNF parser, don't want to assume that std::string::substr will avoid a copy with allocation (and hence can't use STL string functions), and can't even copy the string chars into a buffer and left-pad with '0' characters:

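#include <stdexcept>
#include <string>
using namespace std;

void parse(const string &s) {
    string::const_iterator current = s.begin();
    int HH = 0;
    int mm = 0;
    int ss = 0;
    int ff = 0;
    // Every case deliberately falls through to the one below it, so a
    // shorter string simply enters the chain further down.
    switch(s.size()) {
        case 8:
            HH = (*(current++) - '0') * 10;
        case 7:
            HH += (*(current++) - '0');
        case 6:
            mm = (*(current++) - '0') * 10;
        case 5:
            mm += (*(current++) - '0');
        case 4:
            ss = (*(current++) - '0') * 10;
        case 3:
            ss += (*(current++) - '0');
        case 2:
            ff = (*(current++) - '0') * 10;
        case 1:
            ff += (*current - '0');
        case 0: break;
        default: throw logic_error("invalid date");
        // except that this code goes so badly wrong if the input isn't
        // valid that there's not much point objecting to the length...
    }
}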

But fundamentally, just 0-initialising those int variables is almost as much work as copying the string into a char buffer with padding, so I wouldn't expect to see any significant performance difference. I therefore don't actually recommend this solution in real life, just as an exercise in premature optimisation.
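For comparison, the padding alternative mentioned above could look something like the following minimal sketch. `PaddedTime` and `parsePadded` are made-up names, and rejecting non-digits and over-long input is an assumption rather than something the question specified.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct PaddedTime {
    int hours;
    int minutes;
    int seconds;
    int fraction;
};

// Left-pads the input with '0' up to the full HHmmssff width, then reads
// each two-digit field at its fixed offset.
bool parsePadded(const std::string& text, PaddedTime& out) {
    const std::size_t width = 8;  // length of "HHmmssff"
    if (text.size() > width) return false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(text[i]))) return false;
    }

    // This is the one copy (with allocation) that the switch version avoids.
    const std::string padded = std::string(width - text.size(), '0') + text;

    auto field = [&padded](std::size_t offset) {
        return (padded[offset] - '0') * 10 + (padded[offset + 1] - '0');
    };
    out.hours = field(0);
    out.minutes = field(2);
    out.seconds = field(4);
    out.fraction = field(6);
    return true;
}
```

Like the switch version, this treats an empty string as all zeros.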

Steve Jessop 2008-11-13 14:20:26

I'm actually using a BNF parser and std::string, so performance is not THAT huge an issue (Spirit is mostly a template library, after all). I'm writing a replacement for code similar to your example, to make it more generic and, um, nice. ;)

macbirdie 2008-11-13 14:38:20

I simply wanted to see if there's a really clean solution possible.

macbirdie 2008-11-13 14:38:56

Fair enough - the question implied by that one comment is a whole other matter from the question you really asked, about simple code rather than horrible byte-fiddling. I'm just fond of that kind of switch hack, because it freaks out the squares who are scared of falling through ;-)

Steve Jessop 2008-11-13 14:50:09


Reversed offset tokenizer
