I'm trying to figure out how to get the most recent latitude and longitude of a Twitter user (from the new Geo API data, i.e. the <georss:point> tag inside <geo>; you can see what they look like in my Twitter user timeline XML feed). I also need to retrieve how old that data is (in seconds) from the <created_at> tag.

I'm trying to write this in C to use with an mbed microcontroller so I can't use any big libraries (ideally I wouldn't use any libraries, but that might be a bad idea). The mbed site suggests a few light libraries - YAJL and FastXML seem useful - but my C knowledge is very basic and I'm unsure as to how to proceed.

Assuming I have the code for retrieving a Twitter user timeline into memory as a string and/or to disk (as either JSON or XML), how should I proceed?

At the moment I'm doing this scraping on my webserver via PHP, but I'd rather have it done in C as I hope to release the code when I'm done (and I don't want my poor server being rammed!) The PHP looks like this:

<?php
date_default_timezone_set('UTC');
try {
  $tweets = json_decode(file_get_contents("http://twitter.com/statuses/user_timeline.json?screen_name=".urlencode($_GET['screenname'])));
  foreach($tweets as $tweet) {
    if (is_array($tweet->geo->coordinates)) {
      echo date("U") - strtotime($tweet->created_at);
      echo ",{$tweet->geo->coordinates[0]},{$tweet->geo->coordinates[1]}";
      break;
    }
  }
} catch (Exception $e) {
  exit();
}

This works fairly well, but I have no idea how to turn this into C! Any ideas?

Here's a snippet of the XML I'm expecting to deal with:

<statuses type="array">
 <status>
  <created_at>Sat Dec 12 22:25:17 +0000 2009</created_at>
  <id>6611101548</id>
  <text>Hello stackoverflow! This tweet is geotagged.</text>
  <other tags/>
  <geo>
   <georss:point>52.946972 -1.182846</georss:point>
  </geo>
 </status>
 <status ...>
</statuses>

(btw, the mbed is awesome, I'm having an amazing time with it despite my lack of advanced knowledge in C or electronics, they're in stock at Farnell for £32 and definitely worth the money!)

A:

Assuming you have all of the feed in memory, I would write a very crude, and simple, parser.

First, I'd write a high level tokenizer. This tokenizer would return two types of tokens: XML Tags and Other.

So, if you had this as an XML source:

<tag arg="stuff">
    <tag2>data</tag2>
</tag>

That would return "<tag arg="stuff">" as the first token, the newline plus indentation as the second token, "<tag2>" as the third, and "data" as the fourth.

Something like this:

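#include <stdlib.h>
#include <string.h>

/* Returns the next token (a whole tag such as "<tag2>", or the text
   between two tags) as a malloc'd, NUL-terminated string, or NULL at
   the end of the buffer. Advances *bufPtr past the token; the caller
   frees the result. */
char *next_token(const char **bufPtr)
{
    const char *start = *bufPtr;
    const char *p = start;
    char target;

    if (*p == '\0')
        return NULL;                     /* nothing left to tokenize */

    if (*p == '<') {
        target = '>';                    /* found the start of a tag, look for the end */
    } else {
        target = '<';                    /* not in a tag, so search for the next one */
    }
    p++;
    while (*p != '\0' && *p != target)   /* don't run off the buffer's end */
        p++;
    if (target == '>' && *p == '>')
        p++;                             /* keep the closing '>' in the tag token */

    size_t length = (size_t)(p - start);
    char *token = malloc(length + 1);
    if (token == NULL)
        return NULL;
    memcpy(token, start, length);
    token[length] = '\0';                /* terminate the token string */
    *bufPtr = p;                         /* advance for the next token */
    return token;
}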

(Caveat: my C is rusty, there may well be some off-by-one errors in here, but the gist is good.)

Now that I'm getting these meta chunks of the XML, it's straightforward.

I just scan tokens until I see the <georss:point> tag (not <geo>: the token right after <geo> is only whitespace). Once I see it, I "know" the next token is the lat/long data. Grab that and parse it (perhaps with sscanf) to get your values.
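A minimal sketch of that scan, assuming the tokens have already been collected into an array of strings. `find_geo_point` and the array layout are hypothetical, not part of any library, and the tag is matched exactly as it appears in the snippet (no attributes):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: walk a flat list of tokens and parse the first
   <georss:point> value. Returns 1 and fills *lat / *lon on success,
   0 if no parsable point was found. */
int find_geo_point(const char *const *tokens, size_t count,
                   double *lat, double *lon)
{
    for (size_t i = 0; i + 1 < count; i++) {
        /* Match the opening tag itself; the token after it is the text. */
        if (strcmp(tokens[i], "<georss:point>") == 0) {
            /* Value looks like "52.946972 -1.182846": latitude, then longitude. */
            if (sscanf(tokens[i + 1], "%lf %lf", lat, lon) == 2)
                return 1;
        }
    }
    return 0;
}
```

Whitespace tokens between tags are harmless here, because only the token immediately after <georss:point> is parsed. It's worth checking that your C library's sscanf handles %lf, since some embedded builds trim floating-point scanf support.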

What this does is effectively flatten your XML space. You don't really care how deep the tag is, and you don't check that it's well formed, or anything. You're pretty much assuming it's well formed and conforming.

Off the top of my head: XML doesn't allow a literal < inside a quoted attribute value (it has to be escaped as &lt;), but it does allow >, and a > there would end the tag token early. Odds are good that this SPECIFIC XML never does that, so it'll work. Otherwise you'll need to parse quoted stuff (not that much harder, but...).
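If the feed ever did put a > inside a quoted attribute, the tag branch of the tokenizer could use a quote-aware search instead of stopping at the first >. A minimal sketch; `find_tag_end` is a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper: given p pointing at '<', return a pointer to the
   '>' that closes the tag, skipping any '>' inside a single- or
   double-quoted attribute value. Returns NULL if the buffer ends first. */
const char *find_tag_end(const char *p)
{
    char quote = '\0';               /* active quote char, '\0' when outside quotes */

    for (p++; *p != '\0'; p++) {
        if (quote != '\0') {
            if (*p == quote)
                quote = '\0';        /* closing quote: back outside the value */
        } else if (*p == '"' || *p == '\'') {
            quote = *p;              /* entering a quoted attribute value */
        } else if (*p == '>') {
            return p;                /* end of the tag */
        }
    }
    return NULL;                     /* ran off the end of the buffer */
}
```

Returning NULL on an unterminated tag doubles as the "don't run off the buffer's end" check.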

Is this robust? Hell no. Very GIGO sensitive. But a simple check to make sure you don't run off the buffer's end should save you there.

Will Hartung
Thanks! This will be excellent at finding the first geo tag, but how do I then scan the current `status` for the `created_at` tag? (And how do I then parse that string into a number of seconds until now?)
JP
How many "created_at" tags do you expect in the payload? Look for the status tag, set a flag, and look for the created_at tag. Standard C has no function for reading a time string (POSIX has strptime, but your toolchain may not provide it), so you can probably use sscanf to read it, populate a struct tm and use the C library time/date functions. Note that mktime treats the struct tm as local time, not UTC.
Will Hartung
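A minimal sketch of that comment in code, assuming the tokens are already in an array of strings and the tags appear exactly as "<status>", "<created_at>" and "<georss:point>" (no attributes). All function names are hypothetical. Instead of mktime (local time), it converts the date to UTC seconds with Howard Hinnant's days-from-civil algorithm, and the caller supplies `now` (for example from time(NULL), if the board's clock has been set). Since the timeline lists newest tweets first, the first hit is the most recent, like the `break` in the PHP.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Days since 1970-01-01 for a Gregorian date (m = 1..12). */
static long long days_from_civil(int y, int m, int d)
{
    y -= m <= 2;
    long long era = (y >= 0 ? y : y - 399) / 400;
    long long yoe = y - era * 400;                                  /* [0, 399] */
    long long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1; /* [0, 365] */
    long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          /* [0, 146096] */
    return era * 146097 + doe - 719468;
}

/* Parse "Sat Dec 12 22:25:17 +0000 2009" into Unix seconds (UTC).
   Returns 1 on success, 0 on a malformed string. */
int parse_created_at(const char *s, long long *epoch)
{
    static const char *const months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    char wday[4], mon[4];
    int day, hh, mm, ss, offset, year, m;

    if (sscanf(s, "%3s %3s %d %d:%d:%d %d %d",
               wday, mon, &day, &hh, &mm, &ss, &offset, &year) != 8)
        return 0;
    for (m = 0; m < 12 && strcmp(mon, months[m]) != 0; m++)
        ;
    if (m == 12)
        return 0;                                  /* unknown month name */

    /* offset is +hhmm or -hhmm; subtract it to get UTC */
    int off_abs = abs(offset);
    long long off_secs = (off_abs / 100) * 3600LL + (off_abs % 100) * 60LL;
    if (offset < 0)
        off_secs = -off_secs;

    *epoch = days_from_civil(year, m + 1, day) * 86400LL
           + hh * 3600LL + mm * 60LL + ss - off_secs;
    return 1;
}

/* Remember the created_at text of the current <status>; when a
   <georss:point> follows, report its age in seconds and its position. */
int latest_geo(const char *const *tokens, size_t count, long long now,
               long long *age, double *lat, double *lon)
{
    const char *created = NULL;                    /* created_at of current status */
    for (size_t i = 0; i + 1 < count; i++) {
        if (strcmp(tokens[i], "<status>") == 0) {
            created = NULL;                        /* new status: drop old timestamp */
        } else if (strcmp(tokens[i], "<created_at>") == 0) {
            created = tokens[i + 1];
        } else if (strcmp(tokens[i], "<georss:point>") == 0 && created != NULL) {
            long long t;
            if (parse_created_at(created, &t) &&
                sscanf(tokens[i + 1], "%lf %lf", lat, lon) == 2) {
                *age = now - t;
                return 1;
            }
        }
    }
    return 0;
}
```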