ansaurus

Question

Simple, fast SQL queries for flat files.

Answer 1

A:

Well, I have a lightweight ORM for SQLite that would simplify this task without requiring any configuration files.

If you can use it, PowerShell has a lot of powerful capabilities for parsing and querying text files (example here). Otherwise, with .NET/Mono you can split the file up and query it with LINQ in no time.

mythz 2010-02-17 02:40:54

I'd like to be able to do everything right from the shell, which this solution doesn't seem to support.

plinehan 2010-04-01 18:19:07

Which one? PowerShell is bash on steroids, which lets you do everything from the shell. As for OrmLite (which is what I would use), you write a program of a few lines that imports all the data into your database of choice; then you can use sqlite3.exe to query it from the command prompt.

mythz 2010-04-05 08:50:54

Answer 2

+1 A:

Perl DBI using DBD::AnyData

harschware 2010-02-17 02:59:17

Answer 3

A:

You can use SQLite. Here's an example using Python.

import sqlite3

conn = sqlite3.connect('/tmp/test.db')
cursor = conn.cursor()
# Store number as an integer so max() compares numerically, not as text.
cursor.execute("create table if not exists table1 (word varchar not null, number integer not null)")
# Parameterized inserts. They are not committed, so the rows are discarded
# when the connection closes.
cursor.executemany("insert into table1 values (?, ?)",
                   [('dog', 15), ('cat', 20), ('dog', 10)])
cursor.execute("select max(number), word from table1 group by word order by word")
print(cursor.fetchall())
conn.close()

output

$ ./python.py
[(20, 'cat'), (15, 'dog')]
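
The example above hardcodes its rows. A minimal sketch of the same idea applied to a flat file might look like the following, assuming each line holds a word and an integer separated by whitespace. The function name query_flat_file and the table name animals are made up for illustration.

```python
import sqlite3

def query_flat_file(lines, sql):
    """Load 'word number' lines into an in-memory table and run one query.

    The table name 'animals' and its two columns are assumptions chosen
    for this sketch, not part of any existing tool.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table animals (word text not null, number integer not null)"
    )
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        rows.append((fields[0], int(fields[1])))
    conn.executemany("insert into animals values (?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

sample = ["dog 15", "cat 20", "dog 10"]
result = query_flat_file(
    sample,
    "select word, max(number) from animals group by word order by word",
)
print(result)  # [('cat', 20), ('dog', 15)]
```

To run it against a real file, pass an open file object instead of the sample list, for example query_flat_file(open('animals.txt'), ...). An in-memory database will not cope with inputs larger than RAM; passing a file path to sqlite3.connect instead of ":memory:" keeps the table on disk.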

ghostdog74 2010-02-17 04:03:31

I'd like to be able to do everything right from the shell, which this solution doesn't seem to support.

plinehan 2010-04-01 18:19:31

Answer 4

+1 A:

I just stumbled across this Python script which does something like what you want, although it only supports very basic queries.

David Johnstone 2010-02-17 04:41:30

Answer 5

A:

I never managed to find a satisfying answer to my question, but I did at least find a solution to my toy problem using uniq's "-f" option, which I had been unaware of:

cat animals.txt | sort -t " " -k1,1 -k2,2nr \
| awk -F' ' '{print $2, " ", $1}' | uniq -f 1

The awk portion above could, obviously, be skipped entirely if the input file were created with columns in the opposite order.
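
For readers less familiar with these tools, the logic of the pipeline can be sketched in Python: sort by word, then by number descending, and keep the first row of each group. This is only an illustration of the technique, assuming lines of the form "word number"; the function name first_per_word is made up.

```python
from itertools import groupby

def first_per_word(lines):
    """Return (word, largest number) pairs, one per distinct word."""
    rows = [
        (word, int(number))
        for word, number in (line.split() for line in lines if line.strip())
    ]
    # Same ordering as sort -k1,1 -k2,2nr: word ascending, number descending.
    rows.sort(key=lambda row: (row[0], -row[1]))
    # Like uniq on the word column: keep only the first row of each group.
    return [next(group) for _, group in groupby(rows, key=lambda row: row[0])]

print(first_per_word(["dog 15", "cat 20", "dog 10"]))  # [('cat', 20), ('dog', 15)]
```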

I'm still holding out hope for a SQL-like tool, though.

plinehan 2010-02-27 02:37:31

Answer 6

+1 A:

I wrote TxtSushi mostly to do SQL selects on flat files. Here is the command chain for your example (all of these commands are from TxtSushi):

tabtocsv animals.txt | namecolumns - | tssql -table animals - \
'select col1, max(as_int(col2)) from animals group by col1'

namecolumns is only required because animals.txt doesn't have a header row. You can get a quick sense of what is possible by looking through the example scripts. There are also links to similar tools at the bottom of the main page.

Keith 2010-04-01 00:47:35

Very nice. How well does it scale? I'm hoping to deal with multi-gigabyte files which exceed the available RAM on my machine.

plinehan 2010-04-01 17:26:15

It does any kind of row filtering or column selection using a streaming approach, but as soon as you ask it to do anything requiring a sort (group by, join on, and order by all require sorts), it wants to read the full table into memory. In that case you can give the -external-sort option, which tells TxtSushi to sort on disk instead, but my current implementation of external sort is very inefficient and needs some work.
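
For context, the general idea behind an on-disk sort can be sketched as follows: sort fixed-size chunks in memory, write each one to a temporary file, then merge the sorted files lazily. This is a generic Python illustration under those assumptions, not TxtSushi's implementation; the function name external_sort and the chunk_size parameter are made up.

```python
import heapq
import tempfile

def external_sort(lines, chunk_size=100_000):
    """Yield lines in sorted order, holding at most chunk_size in memory."""
    runs = []   # one temporary file per sorted chunk
    chunk = []

    def flush():
        # Sort the current chunk and spill it to its own temporary file.
        chunk.sort()
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(chunk)
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for line in lines:
        if not line.endswith("\n"):
            line += "\n"  # keep one record per line in the run files
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    try:
        # heapq.merge reads each sorted run lazily, one line at a time.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()

data = ["dog 15\n", "cat 20\n", "dog 10\n", "ant 3\n"]
print(list(external_sort(data, chunk_size=2)))
```

The sketch keeps one open file handle per chunk, so a very large input with a small chunk_size could hit the operating system's open-file limit; a fuller version would merge the runs in several passes.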

Keith 2010-04-01 22:51:55
