Scenario: I'm working on a Rails app that will take data entry in the form of uploaded text-based files, and I need to parse these files before importing the data. I get to choose which file type is uploaded: the software used by the uploaders (Microsoft Access) offers several export formats.

While it may be insignificant, I was wondering if there is a specific file type that is most efficiently parsed. This question can be viewed as language-independent, I believe.

(While XML is commonly parsed, it is not a feasible file type for this project.)

+2  A: 

You might want to take a look at JSON. It's a lightweight format, and in contrast to XML it's really easy and clean to parse without requiring a huge library on the backend.

It can represent types like strings, numbers, associative arrays (objects), and lists of such values.
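
As a rough illustration, here is a minimal Ruby sketch that parses such a document with the `json` library that ships with Ruby. The sample payload and the `parse_record` helper name are made up for this example.

```ruby
require "json"

# Hypothetical helper: turn one JSON document into plain Ruby objects.
def parse_record(text)
  JSON.parse(text)
end

# Made-up sample data covering each of the types mentioned above.
sample = '{"name": "Widget", "price": 9.5, "tags": ["a", "b"], "size": {"w": 2, "h": 3}}'
record = parse_record(sample)

puts record["name"]         # a string
puts record["price"]        # a number (parsed as a Float)
puts record["tags"].length  # a list (parsed as an Array)
puts record["size"]["w"]    # an associative array (parsed as a Hash)
```

The parsed result uses ordinary hashes and arrays, so no format-specific tree walking is needed afterwards.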

LukeN
If I'm not mistaken, JSON isn't a format that Microsoft Access can export. I apologize for not mentioning that the files to be uploaded to my app are Access exports.
anxiety
Not your fault, I should have read ALL the tags :)
LukeN
A: 

I would suggest n-SV (where n is some character) for data that does not include n. That will make lexing the files a matter of a split.
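
A minimal Ruby sketch of the "just split" approach, assuming tab as the separator and data that never contains a tab inside a field. The `split_fields` helper and the sample line are hypothetical.

```ruby
# Hypothetical helper: split one line of delimiter-separated text into fields.
# Only safe when the delimiter never appears inside a field.
def split_fields(line, delimiter = "\t")
  # The -1 limit keeps trailing empty fields instead of silently dropping them.
  line.chomp.split(delimiter, -1)
end

sample = "42\tWidget\t9.50\t\n"
p split_fields(sample)
```

The `-1` limit matters for exports with optional trailing columns: without it, Ruby's `split` discards empty fields at the end of the line, so rows would come back with differing lengths.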

If you have more flexible data, I would suggest JSON.

Paul Nathan
CSV (or n-SV) is very hard to parse yourself, since you have to account for fields that contain the delimiter itself.
JoelFan
I assume CSV would then be the best format to use given the conditions: 1. The files uploaded to my app are MS Access exports. 2. I will be parsing in Ruby.
anxiety
@anxiety: you should review the condition that JoelFan brought up. If you have CSV and it has a string in it like `..., "blah, foo",...`, you will have all sorts of *fun* parsing it. If you are accepting European-formatted numbers, commas will show up as decimal separators from time to time. Plus there is the human-readable 1,000,000 number format. My point is, "get a CSV engine if the data is complicated".
Paul Nathan
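
To make the quoted-comma problem concrete, here is a small Ruby sketch comparing a naive split with Ruby's `csv` library on a line shaped like the one in the comment. The helper names `naive_fields` and `csv_fields` are hypothetical.

```ruby
require "csv"

# Hypothetical helper: the naive approach, which ignores quoting entirely.
def naive_fields(line)
  line.split(",")
end

# Hypothetical helper: let a CSV engine handle quoted fields.
def csv_fields(line)
  CSV.parse_line(line)
end

line = '1,"blah, foo",3'

p naive_fields(line)  # four pieces: the quoted field is cut in two
p csv_fields(line)    # three fields: the embedded comma is preserved
```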
A: 

If you HAVE to roll your own parser, I would suggest CSV or some other delimiter-separated format.

If you are able to use other libraries, there are plenty of options. JSON looks quite fascinating.

Robb
CSV (or n-SV) is very hard to parse yourself, since you have to account for fields that contain the delimiter itself.
JoelFan
Hard, but doable. Here are Java-based examples: [parseCsv](http://stackoverflow.com/questions/2241915/regarding-java-string-manipulation/2241950#2241950) and [writeCsv](http://stackoverflow.com/questions/477886/jsp-generating-excel-spreadsheet-xls-to-download/2154226#2154226).
BalusC
Really? I would think something pretty simple could be written up that probably wouldn't be too flexible but at least would solve his problems.
Robb
+2  A: 

If it is something exported by Access, the easiest would be CSV, particularly since Ruby has a CSV parser in its standard library. You will have to do some work determining the dialect of CSV (what it uses as a delimiter, how it handles quotes). I don't know how robust the Ruby parser is with those issues, but you should also have some control over them from Microsoft Access.
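
A minimal sketch of passing the dialect to Ruby's `csv` library through its `col_sep` and `quote_char` options. The semicolon delimiter and the sample rows are assumptions for illustration, not what any particular Access export actually produces, and `parse_export` is a hypothetical helper.

```ruby
require "csv"

# Hypothetical helper: parse an export whose dialect is known in advance.
# Returns one Hash per data row, keyed by the header names.
def parse_export(text, col_sep: ",", quote_char: '"')
  CSV.parse(text, col_sep: col_sep, quote_char: quote_char, headers: true).map(&:to_h)
end

# Made-up sample: semicolon-delimited, with a quoted field that contains
# the delimiter and prices written with decimal commas.
sample = <<~DATA
  id;name;price
  1;"Widget; large";9,50
  2;Gadget;12,00
DATA

rows = parse_export(sample, col_sep: ";")
rows.each { |row| puts row.inspect }
```

All values come back as strings, so something like `9,50` would still need converting to a number in a separate step once the dialect is settled.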

Kathy Van Stone