I'm writing a screen-scraper for StackOverflow. The bit I'm writing now takes the HTML and puts all the information into a model object. I've run into a bit of bother while parsing the information from an answer.

The problem is the date format that StackOverflow uses to describe absolute times. DateTime.Parse doesn't work on them. I've tried fooling around with DateTime.ParseExact, but I've had no success. Both throw a FormatException.

Here's some background:

If you look at the source HTML for an answer, you get this:

<div id="answer-{id}" class="answer">
    <!-- ... -->
            answered <span title="2009-06-18 13:21:16Z UTC" class="relativetime">Jun 18 at 13:21</span>
    <!-- ... -->
</div>

Notice that the absolute time is stored in the span's title attribute. I've used the HTML Agility Pack from CodePlex to access the elements, and have extracted the value of the attribute.

Now I'm wondering how to get the "2009-06-18 13:21:16Z UTC" into a .NET DateTime object.

I'd like to be able to do this without Regexes, etc., but as the whole project is hackish and unstable, I don't really mind!

Finally, I can't use the data dump for these reasons:

  1. I can't use BitTorrent. Ever.
  2. If I could, the files are too big for my net connection.
  3. It's a bit out of date.
  4. It's not as fun!

Thanks.

+2  A: 

Well, you'd never use regex for this, but I think that format is just "u" described here: http://msdn.microsoft.com/en-us/library/az4se3k1.aspx

So ParseExact should accept that (with some minor work).
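As a rough sketch of what that "minor work" might look like: strip the trailing " UTC" and hand the rest to ParseExact with the "u" format. The class and method names here (StackOverflowDates, ParseTitle) are made up for illustration, and it assumes every title value ends in "Z UTC" as in the question.

```csharp
using System;
using System.Globalization;

static class StackOverflowDates
{
    // Hypothetical helper: parses title values such as "2009-06-18 13:21:16Z UTC".
    // Assumes the timestamp is always UTC, as the "Z" suggests.
    public static DateTime ParseTitle(string title)
    {
        const string suffix = " UTC";
        string trimmed = title.Trim();

        // Drop the redundant " UTC" so the rest matches the "u" pattern.
        if (trimmed.EndsWith(suffix, StringComparison.Ordinal))
            trimmed = trimmed.Substring(0, trimmed.Length - suffix.Length);

        // AssumeUniversal + AdjustToUniversal keeps the value in UTC (Kind = Utc)
        // instead of converting it to the machine's local time.
        return DateTime.ParseExact(
            trimmed, "u", CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal);
    }
}
```

Calling StackOverflowDates.ParseTitle("2009-06-18 13:21:16Z UTC") should give a DateTime of 2009-06-18 13:21:16 with Kind set to Utc. ParseExact throws a FormatException on input that doesn't match, so TryParseExact may be a better fit for scraped data.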

Noon Silk
Why is the UTC even there, though? Doesn't Z imply Zulu or UTC? Seems like it ought to be omitted.
tvanfosson
Just remove it. I don't know enough about timezones to comment on the difference between 'Zulu' and 'UTC' :)
Noon Silk
Yes, "Z" and "UTC" refer to the same thing.
Scott Dorman
+3  A: 

Having both "Z" and "UTC" in the same DateTime string seems redundant.

If you remove "UTC" from the string, Parse works:

System.DateTime.Parse("2009-06-18 13:21:16Z")
{18.06.2009 15:21:16}
    Date: {18.06.2009 00:00:00}
    Day: 18
    DayOfWeek: Thursday
    DayOfYear: 169
    Hour: 15
    Kind: Local
    Millisecond: 0
    Minute: 21
    Month: 6
    Second: 16
    Ticks: 633809352760000000
    TimeOfDay: {15:21:16}
    Year: 2009
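
Note that the output above is in local time (Hour: 15, Kind: Local) because Parse converts a "Z" timestamp to the machine's time zone by default. If you'd rather keep the value in UTC, something along these lines might do. It's only a sketch: the names UtcTitleParser and Parse are invented for the example, and it assumes the " UTC" suffix is always present or harmless to remove.

```csharp
using System;
using System.Globalization;

static class UtcTitleParser
{
    // Hypothetical helper: removes " UTC" and parses the remainder as a UTC time.
    public static DateTime Parse(string title)
    {
        // "2009-06-18 13:21:16Z UTC" -> "2009-06-18 13:21:16Z"
        string withoutUtc = title.Replace(" UTC", "");

        // AdjustToUniversal returns the value in UTC (Kind = Utc)
        // rather than converting it to local time.
        return DateTime.Parse(
            withoutUtc, CultureInfo.InvariantCulture,
            DateTimeStyles.AdjustToUniversal);
    }
}
```

With this, UtcTitleParser.Parse("2009-06-18 13:21:16Z UTC") should come back with Hour 13 and Kind Utc regardless of the machine's time zone.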
dtb
Thanks for that - it works perfectly. I'm kicking myself though ;)
Lucas Jones
A: 

I haven't found the magic to match the timezone (Z UTC) here, but assuming they're all UTC, this should get you started:

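   // Requires: using System; using System.Globalization;
   string d = "2009-06-18 13:21:16Z UTC";
   CultureInfo provider = CultureInfo.InvariantCulture;
   // Quote the trailing "Z UTC" so it is matched as literal text.
   string format = "yyyy-MM-dd HH:mm:ss'Z UTC'";
   DateTime dt;
   // AssumeUniversal treats the parsed value as UTC; the result is converted to local time.
   if (DateTime.TryParseExact(d, format, provider, DateTimeStyles.AssumeUniversal, out dt)) {
       // use dt
   } else {
       // bail out, error
   }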
nos