So the USDA has a somewhat odd database of general nutrition facts about food, and naturally we're going to grab it for use in our app. The lines in the data files look like the following:

~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01002~^~0100~^~Butter, whipped, with salt~^~BUTTER,WHIPPED,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01003~^~0100~^~Butter oil, anhydrous~^~BUTTER OIL,ANHYDROUS~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87
~01004~^~0100~^~Cheese, blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87

Those odd ~ and ^ characters separate the values. It also lacks a header row, but that's OK, I can figure that out from the other stuff on their site: http://www.ars.usda.gov/Services/docs.htm?docid=8964

Any help would be great! If it matters, we're making an open/free API with Ruby to query this data.

Additionally, I'm having a tough time posing this question, so I've made it a community wiki so we can all pitch in!

+1  A: 

^ appears to be a field delimiter and ~ a string delimiter. Normally I'd expect to see , and " in those roles, but the choice of the very uncommon characters means that a string like

Cheese, Bleu

won't trip up the parser: the comma inside it is just an ordinary character, not a delimiter.
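To make the two roles concrete, here is a minimal hand-rolled sketch in Ruby (the asker's language). The helper name `parse_usda_line` is hypothetical, and the code assumes that ^ never appears inside a string and that any field not wrapped in ~ is a number, which holds for the sample lines but is not confirmed for the whole dataset.

```ruby
# Sample line copied from the question.
LINE = "~01001~^~0100~^~Butter, salted~^~BUTTER,WITH SALT~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87"

# Hypothetical helper: split on the field delimiter (^), then treat
# ~...~ as a string and any other non-empty field as a number.
# Assumption: ^ never occurs inside a string value.
def parse_usda_line(line)
  line.chomp.split("^", -1).map do |field|
    if field.length >= 2 && field.start_with?("~") && field.end_with?("~")
      field[1..-2]   # string field: drop the surrounding tildes
    elsif field.empty?
      nil            # bare empty field: no value
    else
      Float(field)   # bare field: numeric
    end
  end
end

fields = parse_usda_line(LINE)
puts fields[2]   # Butter, salted
puts fields[10]  # 6.38
```

The embedded comma in "Butter, salted" survives untouched because only ^ splits fields.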

Bob Kaufman
That's what I would guess too. Strings are surrounded by `~` on each end, but numbers aren't.
Greg Hewgill
+2  A: 

This looks like a very standard CSV (comma-separated values) file, except that the field separator character was changed from , to ^ and the quote character from " to ~.

Unfortunately, I'm not familiar enough with Ruby to recommend a specific library, but Perl has a boatload of CPAN modules, the best of which let you configure both the field separator and the quote character of a CSV reader. I would expect Ruby to have something similar; if so, you're in luck!
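For what it's worth, Ruby's `csv` library does accept both settings, through its `col_sep` and `quote_char` options. A minimal sketch, using one of the sample lines from the question (the variable names are just for illustration):

```ruby
require "csv"

# Sample line copied from the question.
line = "~01004~^~0100~^~Cheese, blue~^~CHEESE,BLUE~^~~^~~^~Y~^~~^0^~~^6.38^4.27^8.79^3.87"

# Tell the CSV parser that ^ separates fields and ~ quotes strings.
row = CSV.parse_line(line, col_sep: "^", quote_char: "~")

puts row[2]      # Cheese, blue
puts row.length  # 14
```

Every field comes back as a string here, so numeric columns such as `row[10]` would still need a `Float(...)` call. For a whole file, `CSV.foreach(path, col_sep: "^", quote_char: "~")` takes the same options; the path and the file's encoding are left as assumptions to check against the USDA documentation.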

DVK