views:

570

answers:

11

Is there some kind of tool which lets me do SQL like queries (counting, aggregating, joining,etc) without using a full fledged database?

Preferably it's some kind of commandline tool:

sqlcommandline "select count(*) from file1.csv where bladebla"
+2  A: 

Not SQL, but take a look at my own OSS project CSVfix, which does some of what you want.

anon
looks very interesting... can one combine files?and is there a documentation/overview page of some sorts?
Toad
you can join two csb files on specific fields. One of the downloads is the full manual in HTML format. No overview page as yet.
anon
great tool Just read the documentation. I do have some tips and ideas. I'll email you those via your tool site.
Toad
if you or anyone else else has ideas/comments on csvfix, please post in the support forum - address on the main site page - cheers!
anon
A: 

AutoMate (you can have trial version), but it seems as an excess to me using automation tool for such simple task.

Max Gontar
+5  A: 

You can import CSV into sqlite

Nerdling
+1  A: 

Partly it depends on what exactly you want to be able to do -- do you have some specific problems you'd like to solve? This kind of thing I tend to address using awk / grep, and if it gets complicated I'd write a script in Ruby or Python.

On the other hand, I had almost exactly the requirement you have recently and I solved it using MySQL. I also tried SQLite, but it was too slow (basically because I needed to run thousands of separate queries). SQLite is a good option, particularly if you're working with a language like Python, but I understand it works best if you can do one large query and then process the results in code, which might not really meet your needs.

Ben
A: 

If you have the sql server tools installed, you might be able to use the included command line query tool (osql.exe) and treat the file as an ole data source.

I expect you want something a little simpler to deploy, though, and you didn't even mention if you were on windows.

Joel Coehoorn
I'm on server 2003But I'm really looking for some simple commandline thingy since it automates the easiest.
Toad
A: 

You don't say what OS you're on, which would help. I suspect from your stated command line preference that you may be on *nix, though.

If you're on Windows, you could investigate using the Microsoft Text Driver to create a data source targeted on a directory. Then each CSV can be treated as a table.

Mike Woodhouse
i googled for 'microsoft test driver' but couldn't find any reference...do you have a link?
Toad
Ah, nuts. I meant "Windows Text Driver". It should be visible as one of the options when you create a Data Source.
Mike Woodhouse
+1  A: 

If you're not afraid of Perl, DBI and DBD::CSV.

http://search.cpan.org/~jzucker/DBD-CSV-0.22/lib/DBD/CSV.pm

However, whether this handles aggregate functions is unknown to me. It is best to import it into a real database first (SQLite as has been mentioned is a good candidate)

Frakkle
A: 

I've stumbled upon pig and hadoop. It looks like it could do the job. I'll investigate this some more too.

Toad
http://hadoop.apache.org/pig/hadoop makes it possible to handle gigantic files (using a virtual fielsystem).pig can run on (or without it) hadoop, and gives the user some kind of sql like syntax to do queries on huge files(mostly csv)at least this is what I got from it
Toad
+1  A: 

The Jet Database Engine is installed with Windows, or can be downloaded http://support.microsoft.com/kb/239114/en-us. You can use it with script to query a number of file types.

Dim cn: Set cn = CreateObject("ADODB.Connection")
Dim rs: Set rs = CreateObject("ADODB.Recordset")

cn.Open _
    "Provider = Microsoft.Jet.OLEDB.4.0; " & _
    "Data Source =c:\docs\;" _
    & "Extended Properties=""text;HDR=Yes;FMT=Delimited"";"

rs.Open "SELECT * FROM Test.csv", cn

a = rs.GetString
MsgBox a
Remou
+2  A: 

Log Parser

cindi
wow! Exactly what I needed! Thanks!
Toad
the only thing lacking is a join. but the csvfix (mentioned elsewhere) is capable of this.
Toad
A: 

There is a Groovy script, gcsvsql, which does exactly what you are asking for, including joins. You can do things like

gcsvsql "select name,age from /users/data/people.csv where age > 40"

gcsvsql "select sum(score) from people.csv where age < 40"

gcsvsql " select people.name,children.child from people.csv,children.csv where people.name=children.name and people.age < 40"

You can get it from Google code here:

http://code.google.com/p/gcsvsql/

Kolmogorov