I'm writing a screen-scraper for StackOverflow. The bit I'm writing now takes the HTML and puts all the information into a model object. I've run into a bit of bother while parsing the information from an answer.
The problem is the date format that StackOverflow uses to describe absolute times. DateTime.Parse doesn't work on it. I've tried fooling around with DateTime.ParseExact, but I've had no success. Both throw a FormatException.
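To make the failure concrete, here is a minimal sketch of a reproduction. The class and helper names are made up for illustration, and the ParseExact format string is only an assumed example of a pattern that leaves the trailing " UTC" unmatched, not necessarily one of the patterns I tried.

```csharp
using System;
using System.Globalization;

public static class DateParseRepro
{
    // The exact text found in the span's title attribute.
    public const string Raw = "2009-06-18 13:21:16Z UTC";

    // Hypothetical format string (an assumption for illustration):
    // it covers the date, the time and the "Z", but not the trailing " UTC".
    public const string GuessedFormat = "yyyy-MM-dd HH:mm:ssZ";

    // Runs a parse attempt and reports whether it threw a FormatException.
    public static bool ThrowsFormatException(Func<DateTime> parse)
    {
        try
        {
            parse();
            return false;
        }
        catch (FormatException)
        {
            return true;
        }
    }
}
```

Calling `DateParseRepro.ThrowsFormatException(() => DateTime.Parse(DateParseRepro.Raw))` exercises the first attempt, and swapping in `DateTime.ParseExact(DateParseRepro.Raw, DateParseRepro.GuessedFormat, CultureInfo.InvariantCulture)` exercises the second.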
Here's some background:
If you look at the source HTML for an answer, you get this:
<div id="answer-{id}" class="answer">
<!-- ... -->
answered <span title="2009-06-18 13:21:16Z UTC" class="relativetime">Jun 18 at 13:21</span>
<!-- ... -->
</div>
Notice that the absolute time is stored in the span's title attribute. I've used the HTML Agility Pack from CodePlex to access the elements, and have extracted the value of the attribute.
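The extraction step looks roughly like the sketch below. It assumes the HtmlAgilityPack assembly is referenced; the class name, method name and XPath expression are illustrative choices, not the exact code from my project.

```csharp
using HtmlAgilityPack;

public static class AnswerTimeExtractor
{
    // Returns the title attribute of the first "relativetime" span inside an
    // answer div, or null if no such span exists in the given HTML.
    public static string ExtractAbsoluteTime(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumed XPath: matches the markup shown above, where the div has
        // class "answer" and the span has class "relativetime".
        HtmlNode span = doc.DocumentNode.SelectSingleNode(
            "//div[@class='answer']//span[@class='relativetime']");

        if (span == null)
        {
            return null;
        }

        // The second argument is the default returned when the attribute is missing.
        return span.GetAttributeValue("title", null);
    }
}
```

For the snippet above, `AnswerTimeExtractor.ExtractAbsoluteTime(html)` gives back the raw title text, which is the string I then need to parse.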
Now I'm wondering how to get the "2009-06-18 13:21:16Z UTC" string into a .NET DateTime object.
I'd like to be able to do this without Regexes, etc., but as the whole project is hackish and unstable, I don't really mind!
Finally, I can't use the data dump for these reasons:
- I can't use BitTorrent. Ever.
- If I could, the files are too big for my net connection.
- It's a bit out of date.
- It's not as fun!
Thanks.