views:

2332

answers:

6

Is anyone aware of a standalone command line program that can be used to parse CSV files?

To do things like:

csvparse -c 2,5,6 filename

to extract the fields in columns 2, 5, and 6 from every row.

It should be able to handle the CSV file format described in RFC 4180 (http://tools.ietf.org/html/rfc4180), which means quoting fields and escaping inner quotes as appropriate. So for an example row with 3 fields:

field1,"field, number ""2"", has inner quotes and a comma",field3

so that if I request field 2 for the row above I get:

field, number "2", has inner quotes and a comma

I appreciate that there are numerous solutions to this problem (Perl, awk, etc.), but I would like a native bash command line tool that does not require me to invoke some other scripting environment or write any additional code (!).

+1  A: 

This sounds like a job for awk.

You will most likely need to write your own script for your specific needs, but this site has some dialogue about how to go about doing this.

You could also use the cut utility to strip the fields out.

Something like:

cut -f 2,5,6 -d , filename

where the -f argument lists the fields you want and -d is the delimiter. You could then sort these results, find the unique ones, or use any other bash utility. There is a cool video here about working with CSV files from the command line. It's only about a minute long, so I'd take a look.

However, I guess you might group the cut utility with awk as something you don't want to use. I don't really know exactly what you mean by a native bash command, though, so I'll still suggest it.

samoz
I don't want to use Perl or awk. Apologies for not being specific enough in the question - I'll update it to reflect this.
Joel
This will become exponentially harder using just bash commands. You might want to check if you can use awk, because if you have bash, you most likely have awk already.
samoz
> "cut -f 2,5,6 -d , filename" - this will not work when the CSV fields contain commas and quotes, as per http://tools.ietf.org/html/rfc4180, which describes much more than simply splitting on commas. Hence the -1.
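To make the objection concrete, here is a small sketch (in Python, purely for illustration; it is not one of the tools discussed in this thread) comparing a naive split on commas, which is effectively what cut -d , does, with a quote-aware parse of the example row from the question. The variable names are made up for this example.

```python
import csv
import io

# The example row from the question.
row = 'field1,"field, number ""2"", has inner quotes and a comma",field3'

# Naive approach: split on every comma, ignoring quoting entirely.
naive = row.split(",")

# Quote-aware approach: let the standard library csv module parse the row.
parsed = next(csv.reader(io.StringIO(row)))

print(len(naive), repr(naive[1]))    # the quoted field is cut apart at its inner commas
print(len(parsed), repr(parsed[1]))  # the quoted field comes back whole, with quotes unescaped
```

The naive split breaks the second field at each inner comma and leaves the quote characters in place, while the quote-aware parse returns three fields.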
Joel
A: 

A quick Google search reveals an awk script that seems to handle CSV files.

RobS
+6  A: 

My FOSS CSV stream editor CSVfix does exactly what you want. There is a binary installer for Windows, and a compilable version (via a makefile) for UNIX/Linux.

anon
This looks like exactly what I want. Will download it and try to get it running.
Joel
BTW - thanks for not answering a) "why not write one yourself?" b) "use awk/perl". If I had wanted to use either of those 2 options I wouldn't have bothered asking the question in the first place.
Joel
@Joel: The problem is the way you worded your question. You asked for a "bash command" when you should have said "standalone program". Your request has nothing at all to do with bash.
Dennis Williamson
yes, fair point
Joel
Trying to download the 0.95 source for Linux and getting 'file not found'? (The requested URL /files/csvfix_linsrc_95.tar.bz2?project=csvfix was not found on this server.)
Jonathan Leffler
@Jonathan Looks like a problem with Google Code.
anon
@Jonathan The windows binary installer downloads OK. If the source download continues not to work, I'll chase it up with Google tomorrow.
anon
Fair enough, Neil - the Windows source is MIA as well.
Jonathan Leffler
@Jonathan The CSVfix downloads seem to be working again. This is the first glitch I've ever seen on Google Code, which is normally very reliable.
anon
@Neil: I'm having problems still - can we converse via email instead of SO comment? My email is available in my profile.
Jonathan Leffler
@Jonathan The best place to ask about problems is the CSVfix support forum at http://groups.google.com/group/csvfix - that way any problem/solution can be shared.
anon
A: 

I would bet $100 that there is no such csv-specific tool -- at least, not that comes pre-installed on a standard linux distro. Why don't you write it in C for yourself? I just wrote a CSV parser last week in a couple of hours. Yes, it handles quoted strings.
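For a sense of what handling quoted strings involves, here is a minimal sketch of a quote-aware splitter for a single record. It is written in Python rather than C, it is not the parser mentioned above, and the function name split_record is made up for this example. It assumes one record per line, so it does not handle quoted fields that contain line breaks, which RFC 4180 allows.

```python
def split_record(line):
    """Split one CSV record (with no embedded newlines) into a list of fields."""
    fields = []
    current = []       # characters of the field being built
    in_quotes = False  # True while inside a double-quoted field
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')  # doubled quote is a literal quote
                    i += 1               # skip the second quote of the pair
                else:
                    in_quotes = False    # closing quote of the field
            else:
                current.append(ch)       # commas inside quotes are ordinary text
        elif ch == '"':
            in_quotes = True             # opening quote of a quoted field
        elif ch == ",":
            fields.append("".join(current))  # unquoted comma ends the field
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))      # the last field has no trailing comma
    return fields


example = 'field1,"field, number ""2"", has inner quotes and a comma",field3'
print(split_record(example)[1])
```

Even this small version needs a lookahead for doubled quotes, and it still leaves out multi-line fields and malformed input.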

eeeeaaii
Care to put it up on the web somewhere for code review?
John Machin
Sorry, I didn't mean to come off sounding arrogant. I was just trying to encourage him to give coding it a try. Sometimes I have found that when I have a very limited and specific need it's easier to just write code for it than to search around for the just-right tool. I have had programmers include whole libraries of utilities into a project just to call one function -- IMO that's not efficient and it makes for a messy codebase.
eeeeaaii
As simple as CSV sounds, it is not a trivial problem to solve.
fuzzy lollipop
+1  A: 

My gut reaction would be to write a script wrapper around Python's csv module (if there isn't already such a thing).
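A rough sketch of what such a wrapper might look like, mirroring the csvparse -c 2,5,6 filename usage from the question. The names select_columns and main are made up for this example, and treating a missing column as an empty field is an assumption rather than anything the question specifies.

```python
import argparse
import csv
import sys


def select_columns(rows, columns):
    """Yield, for each parsed row, the fields at the given 1-based column numbers."""
    for row in rows:
        # Assumption: a column the row does not have becomes an empty field.
        yield [row[c - 1] if 0 < c <= len(row) else "" for c in columns]


def main(argv=None, out=sys.stdout):
    """Parse '-c 2,5,6 filename' style arguments and write the selected columns."""
    parser = argparse.ArgumentParser(description="Extract columns from a CSV file.")
    parser.add_argument("-c", "--columns", required=True,
                        help="comma-separated 1-based column numbers, e.g. 2,5,6")
    parser.add_argument("filename")
    args = parser.parse_args(argv)
    columns = [int(c) for c in args.columns.split(",")]
    # newline="" lets the csv module handle line endings inside quoted fields.
    with open(args.filename, newline="") as handle:
        writer = csv.writer(out, lineterminator="\n")
        writer.writerows(select_columns(csv.reader(handle), columns))
```

To use it as a script, call main() from the usual if __name__ == "__main__": guard. Note that the output is written back out as CSV, so a field containing commas or quotes is re-quoted. Printing the raw field text instead would only be unambiguous when a single column is requested.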

Jeremy Cantrell
+1  A: 

Try crush-tools; they are great at manipulating delimited data. It sounds like exactly what you're looking for.

jmanning2k
thanks, nice tip.
Joel