I'm kinda stuck with parsing date/time strings. Help would be greatly appreciated.
Input: strings with a date and an optional time. Support for different representations would be nice but is not necessary. The strings are user-supplied and can be malformed. Examples:


 - "2004-03-21 12:45:33" (I consider this the default layout)
 - "2004/03/21 12:45:33" (optional layout)
 - "23.09.2004 04:12:21" (German format, optional)
 - "2003-02-11" (time may be missing)

Needed Output: Seconds since Epoch (1970/01/01 00:00:00) or some other fixed point.

Bonus: Also, reading the UTC-offset of the local system time would be great.

The input is assumed to be a local time on the machine in question. The output needs to be UTC. System is Linux only (Debian Lenny and Ubuntu needed).

I have tried to use boost/date_time, but must admit I can't wrap my head around the documentation. The following works without the needed conversion from system local time to UTC:

std::string date = "2000-01-01";
boost::posix_time::ptime ptimedate = boost::posix_time::time_from_string(date);
ptimedate += boost::posix_time::hours(Hardcoded_UTC_Offset); // where to get this from?
struct tm mTmTime = boost::posix_time::to_tm(ptimedate);
int64_t ticks = mktime(&mTmTime);

I think boost::date_time can provide the needed UTC offset, but I wouldn't know how.

A: 

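The simplest portable solution is to use the scanf family (sscanf here, so that each layout can be tried again on the same string):

#include <stdio.h>

// input: the user-supplied string as a const char*
int year = 0, month = 0, day = 0, hour = 0, minute = 0, second = 0;
int r = 0;

r = sscanf (input, "%d-%d-%d %d:%d:%d", &year, &month, &day,
            &hour, &minute, &second);
if (r == 6)
{
  printf ("%d-%d-%d %d:%d:%d\n", year, month, day, hour, minute,
          second);
}
else
{
    r = sscanf (input, "%d/%d/%d %d:%d:%d", &year, &month, &day,
                &hour, &minute, &second);
    // and so on ...
}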

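Initialize a struct tm with the int values and pass it to mktime to get a calendar time as time_t. mktime interprets the struct as local time, so the resulting time_t already counts seconds since the epoch. For timezone conversions, please see information on gmtime.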

Vijay Mathew
@Vijay - C runtime `scanf/printf` introduce buffer management and type safety problems that can be avoided by use of appropriate C++ libraries.
Steve Townsend
This doesn't solve the local to utc problem. Also, the string is user-supplied and can be invalid/malformed. I think this can be a problem with scanf?
Gabriel Schreiber
@Gabriel If the string is malformed, scanf will not return 6. About UTC, I have added some more information to the answer.
Vijay Mathew
@Steve could you please point out buffer management/type safety problems that can arise in this particular situation and their c++ alternatives? that will be informative.
Vijay Mathew
(x)printf family all behave unpredictably if format string mismatches inputs in count or type. Why would anybody use printf/scanf in C++? If the q was tagged C this is no issue.
Steve Townsend
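For reference, one standard-library C++ alternative to the scanf family is std::get_time (C++11) on a string stream. The sketch below is an illustration only, not something either commenter proposed; the helper name parse_default_layout is hypothetical. The stream's fail state reports a mismatch, and there are no varargs whose types could disagree with the format string.

```cpp
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (name is an assumption): parses the default layout
// "YYYY-MM-DD HH:MM:SS" into a std::tm. Returns false on any mismatch,
// including trailing characters after the seconds.
bool parse_default_layout(const std::string& text, std::tm& out)
{
    std::tm parsed = {};
    std::istringstream in(text);
    in.imbue(std::locale::classic());  // do not depend on the user's locale

    in >> std::get_time(&parsed, "%Y-%m-%d %H:%M:%S");
    if (in.fail())
        return false;                  // layout did not match

    // Anything left in the stream means the string had trailing junk.
    if (in.peek() != std::char_traits<char>::eof())
        return false;

    out = parsed;
    return true;
}
```

The resulting std::tm can then be handed to mktime in the same way as the ints from the scanf version.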
A: 

boost::gregorian has some of the stuff you need without you doing any more work:

using namespace boost::gregorian;
{
  // The following date is in ISO 8601 extended format (CCYY-MM-DD)
  std::string s("2000-01-01");
  date d(from_simple_string(s));
  std::cout << to_simple_string(d) << std::endl;
}

There is an example on how to use UTC offsets with boost::posix_time here.

You can provide generation of date and time from custom input string formats using date_input_facet and time_input_facet. There is an I/O tutorial on this page that should help you get going.

Steve Townsend
Thx for the facet/tutorial linx. Using boost::gregorian does not solve the problem because it doesn't provide time parsing/representation.
Gabriel Schreiber
@Gabriel - right, you would have to build your own parser and formatter using these tools to handle all your required cases. Unless you have unbounded possible input formats, that should be possible using a parser for each format, and a wrapper that identifies the format type and feeds to the appropriate parser.
Steve Townsend
@Gabriel - note that when I say parser, this is really nothing complex given your input `string` options. Just detect each and build the appropriate Boost constructs to parse appropriately into a date_time.
Steve Townsend
+1  A: 

If C-style is acceptable: strptime() is the way to go, because you can specify the format and it can take the locale into account:

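#include <time.h>

tm brokenTime = {};  // zero-initialize: strptime only sets the fields it parses
const char* end = strptime(str.c_str(), "%Y-%m-%d %T", &brokenTime);  // NULL if the string does not match
time_t sinceEpoch = timegm(&brokenTime);  // interprets brokenTime as UTC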

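Different layouts will have to be tried one after another, checking the return value each time (strptime returns NULL if the string does not match). Since timegm treats the fields as UTC, the timezone offset will have to be added by checking the system clock (localtime_r() with time(), then tm_gmtoff and tm_zone).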
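A rough sketch of trying several layouts in turn and checking strptime's return value follows. The helper name parse_local_to_epoch and the layout table are assumptions for illustration. Note that it swaps timegm for mktime with tm_isdst = -1, so the input is read as local time and no separate offset has to be added; that is a variation on the snippet above, not the answer's exact method.

```cpp
#include <time.h>
#include <string>

// Hypothetical helper (name and layout list are assumptions): tries each
// layout, accepts the first one that consumes the whole string, and
// converts the result from LOCAL time to seconds since the epoch.
bool parse_local_to_epoch(const std::string& text, time_t& out)
{
    static const char* const layouts[] = {
        "%Y-%m-%d %H:%M:%S",  // default layout
        "%Y/%m/%d %H:%M:%S",  // optional layout
        "%d.%m.%Y %H:%M:%S",  // German layout
        "%Y-%m-%d",           // date only, time defaults to 00:00:00
    };

    for (const char* layout : layouts) {
        tm parsed = {};  // strptime only fills in the fields it parses
        const char* end = strptime(text.c_str(), layout, &parsed);
        if (end == nullptr || *end != '\0')
            continue;    // no match, or trailing characters: next layout

        parsed.tm_isdst = -1;  // let mktime decide DST for that date
        out = mktime(&parsed); // fields are interpreted as local time
        return true;
    }
    return false;  // no layout matched: treat the input as malformed
}
```

strptime and the range-based for loop need a POSIX system and C++11 respectively, which fits the Linux-only requirement in the question.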

stefaanv
strptime has been tried. It is not acceptable because it will happily crash if the string is not well-formed.
Gabriel Schreiber
I use it, it doesn't crash here, but experience may differ. I will have to investigate (to google) to be sure...
stefaanv
@Gabriel: Except for Mac OS X Leopard, where strptime seems to be broken, nothing special was found (getdate crashes, Qalculate removed strptime (2004)). Could you give some information about the system on which it crashes?
stefaanv
+3  A: 

I don't know how to make Boost parse single-digit months, but with the two-digit months from the edited question it can be done like this:

#include <ctime>
#include <iostream>
#include <locale>
#include <sstream>
#include <string>
#include <boost/date_time.hpp>
namespace bt = boost::posix_time;
// One locale per accepted layout; each locale takes ownership of its facet.
const std::locale formats[] = {
    std::locale(std::locale::classic(), new bt::time_input_facet("%Y-%m-%d %H:%M:%S")),
    std::locale(std::locale::classic(), new bt::time_input_facet("%Y/%m/%d %H:%M:%S")),
    std::locale(std::locale::classic(), new bt::time_input_facet("%d.%m.%Y %H:%M:%S")),
    std::locale(std::locale::classic(), new bt::time_input_facet("%Y-%m-%d"))};
const size_t formats_n = sizeof(formats)/sizeof(formats[0]);

std::time_t pt_to_time_t(const bt::ptime& pt)
{
    bt::ptime timet_start(boost::gregorian::date(1970,1,1));
    bt::time_duration diff = pt - timet_start;
    return diff.ticks()/bt::time_duration::rep_type::ticks_per_second;

}
void seconds_from_epoch(const std::string& s)
{
    bt::ptime pt;
    for(size_t i=0; i<formats_n; ++i)
    {
        std::istringstream is(s);
        is.imbue(formats[i]);
        is >> pt;
        if(pt != bt::ptime()) break;
    }
    std::cout << " ptime is " << pt << '\n';
    std::cout << " seconds from epoch are " << pt_to_time_t(pt) << '\n';
}
int main()
{
    seconds_from_epoch("2004-03-21 12:45:33");
    seconds_from_epoch("2004/03/21 12:45:33");
    seconds_from_epoch("23.09.2004 04:12:21");
    seconds_from_epoch("2003-02-11");
}

Note that the seconds-from-epoch output assumes the input date was in UTC:

~ $ ./test | head -2
ptime is 2004-Mar-21 12:45:33
seconds from epoch are 1079873133
~ $ date -d @1079873133
Sun Mar 21 07:45:33 EST 2004

You could probably use boost::posix_time::c_time::localtime() from #include <boost/date_time/c_time.hpp> to get this conversion done assuming the input is in the current time zone, but it is rather inconsistent: for me, for example, the result will be different between today and next month, when daylight saving ends.

Cubbi
Showing how to work with facets is much appreciated. Using localtime is not an option if I understand it right since it would give me the DST-offset of today rather than of the given date.
Gabriel Schreiber
@Gabriel Schreiber: You could probably do the DST-offset on the given date, by doing the opposite to what `utc_to_local()` does in `/usr/include/boost/date_time/c_local_time_adjustor.hpp`, which would still use the current computer's zone. A better way is probably something closer to http://www.boost.org/doc/libs/1_44_0/doc/html/date_time/examples.html#date_time.examples.seconds_since_epoch
Cubbi