views:

482

answers:

12

VBA is not cutting it for me anymore. i have lots of huge excel files to which i need to make lots of calculations and break them down into other excel/csv files.

i need a language that i can pick up within the next couple of days to do what i need because it is kind of an emergency. i have been suggested python, but i would like to check with you if there is anything else that does csv file handling quickly and easily.

+5  A: 

What kind of calculation you have to do? Maybe R would be an alternative?

EDIT: just to give a few basic examples

# Basic usage
data <- read.csv("myfile.csv")

# Pipe-separated values
data <- read.csv("myfile.csv", sep="|")

# File with header (columns will be named as header) 
data <- read.csv("myfile.csv", header=TRUE)

# Skip the first 5 lines of the file
data <- read.csv("myfile.csv", skip=5)

# Read only 100 lines
data <- read.csv("myfile.csv", nrows=100)
nico
nothing intensive, just need to be able to work with large files and take logs and also need to be able to create csv files fast
I__
@I__: Then I would say R is an excellent choice. Reading a csv file is just a `read.csv` call away...
nico
I've never used the language myself, but looking it over for 10 minutes, it does indeed look like it would make dealing with csv's easy.
T.E.D.
+4  A: 

Python definitely has a small learning curve, and works with csv files well

Zonda333
+1  A: 

Perl is surprisingly efficient for a scripting language for text. cpan.org has a tremendous number of modules for dealing with CSV data. I've also both written and wrote data in XLS format with another Perl module. If you were able to use VBA, you can certainly learn Perl (the basics of Perl are easy, though it's just as easy for you or others to write terse yet cryptic code).

bowenl2
+12  A: 

Python is an excellent choice. The csv module makes reading and writing CSV files easy (even Microsoft's, uh, "idiosyncratic" version) and Python syntax is a breeze to pick up.

I'd actually recommend against Perl, if you're coming to it fresh. While Perl is certainly powerful and fast, it's often cryptic to the point of incomprehensible to the uninitiated.

Chris B.
And here's a link: http://docs.python.org/library/csv.html
pr1001
http://docs.python.org/library/csv.html
Corey Goldberg
i appreciate this answer, however after asking this: http://stackoverflow.com/questions/3339694/python-csv-getting-subset i am not convinced python would be my first choice. R is probably better for me, but it will take too long to learn probably
I__
What's wrong with Python versus R? Alex's answer to your other question seems very simple and straight-forward.
pr1001
R may take a long time to learn if you want to do advanced programming, but for manipulating .csv files, it couldn't be easier. read.csv() and write.csv() are the built-in functions you need.
thebackhand
A: 

PHP has a couple of csv functions that are easy to use: http://www.php.net/manual-lookup.php?pattern=csv&amp;lang=en

Mchl
+1  A: 

That depends on what you want to do with the files.

Python's learning curve is less steep than R's. However, R has a bunch of built-in functions that make it very well suited for manipulating .csv files easily, particularly for statistical purposes.

Edit: I'd recommend R over Python for this purpose alone, if only because the basic operations (reading files, dropping rows, dropping columns, etc.) are slightly faster to write in R than in Python.

thebackhand
+4  A: 

There are many tools for the job, but yes, Python is perhaps the best these days. There is a special module for dealing with csv files. Check the official docs.

ssegvic
+1  A: 

PowerShell has CSV import built in.

The syntax is ugly as death, but it's designed to be useful for administrators more than for programmers -- so who knows, you might like it.

It's supposed to be a quick get-up-and-go language, for better and worse.

Rei Miyasaka
+3  A: 

You know VBA? Why not Visual Basic 2008 / 2010, or perhaps C#? I'm sure languages like python and ruby would be relatively easier for the job, but you're already accustomed to the ".NET way" of doing things, so it makes sense to keep working with them instead of learning a whole new thing just for this job.

Using C#:

var csvlines = File.ReadAllLines("file.csv");

var query = from csvline in csvlines
            let data = csvline.Split(',')
            select new
            {
                ID = data[0],
                FirstName = data[1],
                LastName = data[2],
                Email = data[3]
            };
  1. .NET: Linq to CSV library.
  2. .NET: Read CSV with LINQ
  3. Python: Read CSV file
Rafael Belliard
Is VBA really that similar to .NET? Last time I used it, I found that felt quite different.
Rei Miyasaka
typical CVS files aren't as simple to parse as split(',')
fuzzy lollipop
I strongly recommend not doing this. The problem with CSV is on the face of it its so simple. Everyone feels they should write their own parser. DON'T You'll get caught up with the quoting of items or similar. People write libaries which are shared by 100s of users. These are nearly always going to be more stable than something written yourself. Having said that I don't think VB is necessairly the wrong way to go about it if you know VBA. Be warned though VBA is much closer to VB Classic than VB.net
Wes
@DanM: **Tab-delimited is NOT a much simpler format** You don't appear to consider that fields can contain commas, tabs, newlines and quotes. Try it some time and see what you get.
John Machin
+2  A: 

I'd give awk a try. If you're running windows, you can get awk via the cygwin utilities.

Dan Breslau
awk is the right choice for CSVs!
dwo
Dang. Beat my answer by seconds. However, since it is the same answer, you are clearly a genius as well. +1 :-)
T.E.D.
Heh. Apparently someone doesn't agree; I have a downvote on this. But thanks :-)
Dan Breslau
It wasn't me with the -1 but do please tell us how awk handles fields containing quotes, commas, carriage returns, and line feeds.
John Machin
@John Machin: No, awk does not solve CSV in the general case. The OP indicated that a lot of "calculations" need to be done on the file, which led me to believe that the data is mostly numeric. One can also write pre-processors to handle various kinds of special characters in the input. That may sound like overkill, but remember that the OP was asking for a small learning curve, and awk does have that (IMHO.) Here's such a pre-processor (that I didn't write:) http://stackoverflow.com/questions/314384/parse-csv-file-using-gawk/322812#322812
Dan Breslau
@Dan Breslau: Ugh. That requires the caller to select as "repl" some character (default ~) that doesn't exist in the data. Yes, one can write such a pre-processor but it's a far cry from the supported and robust packages/modules that exist for languages like Python and Perl.
John Machin
+2  A: 

This may not be anybody's popular language du-jour, but since CSV files are line-oriented and split into fields, dealing with them is just about the perfect application for awk. It was built for processing line oriented text data that can be split into fields.

Most of the other languages folks are going to reccomend will be much more general-purpose, so there's going to be a lot more in them that isn't nessecarily applicable to processing line-oriented text data.

T.E.D.
Unless awk has been upgraded recently (last 10 years!), it doesn't handle the intricate parts of CSV format, like commas and/or quotes in a field e.g. an address field contains `"DunRomin", Highway 987, Walla Walla, WA` -- you'll need to write or copy code to do it.
John Machin
@John Machin - Yes, it is true that you have to do a bit of work to get awk to support fields with commas in them. You are absolutely right that is a drawback. If your data might have such things, that's something you have to take account of. However, none of my CSV work has required that yet. If you are in control of what generates the input too, it is typically easily avoided.
T.E.D.
+3  A: 

You say you have "excel files to which i need to make lots of calculations and break them down into other excel/csv files" but all the answers so far talk about csv only ...

Python has a csv read/write module as others have mentioned. There are also 3rd party modules xlrd (reads) and xlwt (writes) modules for XLS files. See the tutorial on this site.

John Machin
R has support for Excel files too
nico
@nico: The R docs say "The first piece of advice is to avoid doing so if possible!" [why?] and then go on to list a miscellany of 3rd party packages ... doesn't sound like "has support" to me :)
John Machin
@John Machin: doing what? Using Excel for stats? I totally agree with that (see for instance yalta.etu.edu.tr/yalta-excel2007-statistical.pdf www.oci.uzh.ch/efiles/CHE103/chartjunk_excel_CSDA.pdf www.pages.drexel.edu/~bdm25/excel2007.pdf ) :DAnyways, R is strongly community driven, the fact that there are several packages that allow you to open Excel files/manipulate them and even execute R code from within Excel counts as "has support" for me!
nico