views: 1793
answers: 6

Are there any good programs for reading large CSV files? Some of the data files I deal with are in the 1 GB range. They have too many lines for Excel to even deal with. Using Access can be a little slow, since you have to actually import the files into a database to work with them directly. Is there a program that can open large CSV files and give you a simple spreadsheet layout so you can easily and quickly scan through the data?

+1  A: 

vEdit is great for this. I routinely open 100+ MB files with it (I know you said up to 1 GB; I think they advertise on their site that it can handle twice that). It has regex support and loads of other features. $70 is cheap for the amount you can do with it.

Kevin
A: 

vEdit is great, but don't forget you can always go back to basics: check out Cygwin and start grepping.

Helpful commands (a quick sketch follows the list):

  • grep
  • head
  • tail
  • of course perl!
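
For example, a few one-liners along those lines (just a sketch; the file name, pattern, and column numbers are placeholders):

head -n 1 data.csv                    # peek at the header row
head -n 100 data.csv                  # first 100 rows
tail -n 100 data.csv                  # last 100 rows
wc -l data.csv                        # count the rows
grep 'some_value' data.csv | head     # rows containing a value, one screenful at a time
perl -F',' -lane 'print $F[2] if $F[0] eq "some_value"' data.csv   # column 3 where column 1 matches (naive split; ignores quoted commas)
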
jms
+1  A: 

GVim can handle files that large for free, if you are not attached to a true spreadsheet view with fixed-width columns.

EBGreen
A: 

It depends on what you actually want to do with the data. Given a large text file like that, you typically only want a smaller subset of the data at any one time, so don't overlook tools like grep for pulling out the pieces you want to look at and work with.
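
For instance, a rough sketch (the pattern, file names, and column numbers are placeholders) that pulls only the matching rows and a few columns into a smaller file you can then open in a spreadsheet:

head -n 1 big.csv > subset.csv               # keep the header row
grep 'pattern' big.csv >> subset.csv         # append only the rows you care about
cut -d',' -f1,3,7 subset.csv > narrow.csv    # keep just columns 1, 3 and 7 (naive split; ignores quoted commas)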

Andrew
A: 

If you can fit the data into memory and you like Python, then I recommend checking out the UniTable portion of Augustus. (Disclaimer: Augustus is open source (GPLv2), but I work for the company that writes it.)

It's not very well documented, but this should help you get going:

from augustus.kernel.unitable import *
a = UniTable().from_csv_file('filename')   # load the whole CSV into memory
b = a.subtbl(a['key'] == some_value)       # creates a subtable of rows where the 'key' column matches

It won't directly give you an Excel-like interface, but with a little bit of work you can get many statistics out quickly.

David Locke
+5  A: 

MySQL can import CSV files into tables very quickly using the LOAD DATA INFILE command. It can also read from CSV files directly, bypassing any import procedure, by using the CSV storage engine.

Importing into native tables with LOAD DATA INFILE has a start-up cost, but after that you can INSERT/UPDATE much faster, as well as index fields. Using the CSV storage engine is almost instantaneous at first, but only sequential scans will be fast.
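
For example, a minimal sketch of both approaches (the table, column names, and file path are placeholders for whatever is in your file):

-- Approach 1: load into a native table, which you can then index and update
CREATE TABLE big_data (id INT, name VARCHAR(255), amount DECIMAL(10,2));
LOAD DATA INFILE '/path/to/file.csv'
  INTO TABLE big_data
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES;                           -- skip the header row
CREATE INDEX idx_name ON big_data (name);

-- Approach 2: the CSV storage engine works on the file in place
-- (columns must be NOT NULL and indexes are not supported)
CREATE TABLE big_data_csv (id INT NOT NULL, name VARCHAR(255) NOT NULL, amount DECIMAL(10,2) NOT NULL) ENGINE=CSV;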

Update: This article (scroll down to the section titled Instant Data Loads) talks about using both approaches to loading CSV data into MySQL, and gives examples.

Jordi Bunster
I did work with real estate MLS datasets that consisted of 15-30 MB CSV files. Without MySQL's LOAD DATA INFILE, each feed would have taken an hour or more to process, but using MySQL and raw tables I cut processing down to 5-6 minutes, even for the larger datasets.
David