I have an Excel spreadsheet with about 300,000 rows and about 100 columns.

I need to perform various operations on this spreadsheet, and from it I need to create about 3000 other spreadsheets, each of which is SIGNIFICANTLY smaller.

For every spreadsheet I create, I will also need a separate PowerPoint file containing an automatically generated graph.

I've done lots of VBA programming, but I am a little lost with this project.

  1. If I dump the data into a MySQL database, would it be easier to handle this task?
  2. Is it feasible to do all of this in Excel VBA?
  3. Is it possible to easily add graphs from Excel to PowerPoint programmatically? Or should I use a different solution for the graphs?
+1  A: 

I can't answer 2 and 3 for you, but regarding 1: based on your question, I'd definitely recommend against it. Of course, you didn't explain exactly what kind of operations you need to perform on the data, so I may be wrong here.

Your situation reminds me of the saying about regexes: "Some people, when they encounter a problem, will immediately try to solve it using a regular expression. Now they have two problems". You don't want an additional problem.

If you must use a database to do this (simply because doing it in Excel isn't performant enough), I'd stick with something from Microsoft, like Access or SQL Server, which will probably save you some trouble. (Never thought I'd be saying this.)

kander
+2  A: 
  1. It depends strongly on how you plan to process the data. If you plan to write code in Excel, it makes much more sense to leave it in Excel. Having said that, I would dump the data to CSV (comma-delimited) for further processing with a different tool, like Python.

  2. Everything is always feasible given enough time and money. If you're like most other programmers, you don't have too much of either, so you want the most efficient solution, or close to it. If it were me, I would write code in Python to read the data from a CSV file, perform all required operations, and save the 3000 separate output sets as individual CSV files which can be imported back into Excel.

  3. Charts can be tricky to create and manipulate from VBA. I would use a Python library like Matplotlib to produce all graphical output, saved to disk as PNG images that can then be inserted into the PowerPoint presentation(s).
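To give a rough idea of point 3, here is a minimal sketch of writing one chart to a PNG file with Matplotlib. The function name `save_bar_chart`, the bar-chart type, the figure size, and the "Value" axis label are all placeholders of mine, not anything taken from the question; swap in whatever chart the real data calls for.

```python
import matplotlib

matplotlib.use("Agg")  # file-only backend: no window is opened during a batch run
import matplotlib.pyplot as plt


def save_bar_chart(labels, values, title, png_path):
    """Draw one bar chart and save it to png_path as a PNG (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(8, 4.5))
    positions = list(range(len(labels)))
    ax.bar(positions, values)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    ax.set_title(title)
    ax.set_ylabel("Value")  # placeholder axis label
    fig.tight_layout()
    fig.savefig(png_path, dpi=150)
    plt.close(fig)  # release the figure so thousands of charts don't pile up in memory
    return png_path
```

Calling this once per output data set inside a loop would leave one PNG per spreadsheet on disk, ready to be placed on a slide.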

Python is mentioned here only as an example. You should use a tool that you feel most familiar with; however, the concepts of processing the data programmatically (not via interconnected cell references and formulas with a little VBA thrown in to copy sheets and so on) should still apply, and will be your best way forward here. I have done a ton of the kind of work you describe. Get the data into CSV and process the data with code.
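As a sketch of what "process the data with code" could look like for the splitting step, the following uses only the Python standard library. It assumes the 3000 output sets are defined by the distinct values of a single column, which the question doesn't actually say; the function name `split_csv_by_column` and its parameters are made up for illustration.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_csv_by_column(source_path, key_column, out_dir):
    """Write one CSV per distinct value of key_column; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows in memory by the key column (assumes the file fits in RAM).
    groups = defaultdict(list)
    with open(source_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[key_column]].append(row)

    written = []
    for key, rows in groups.items():
        # Replace characters that are unsafe in file names. Two keys that
        # differ only in such characters would collide; not handled here.
        safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in key)
        target = out_dir / f"{safe}.csv"
        with open(target, "w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written
```

For example, `split_csv_by_column("big.csv", "region", "out")` (file and column names hypothetical) would write one file per region into `out/`. Any per-group calculations would go in the second loop, before the rows are written.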

cjrh
+1 for CSV and python. I would also still recommend R as an alternative to Matplotlib (not necessarily better, just another option). It may sound like a pain to learn all these different steps, but it will probably make the process less daunting to think about, as it clearly divides the tasks into neat little pieces.
Wilduck
+2  A: 

Take a look at the open-source statistical system called "R". It's quite good at programmatically generating graphs and charts from real-world datasets.

http://www.r-project.org/

Ollie Jones
This is not the right solution. You want me to learn a new platform to do this?
I__
Well, yes, honestly, I do. I would not have suggested this unless I believed that it would be easier and take less time to do this vast production run in R than in Excel / VBA. I have two main reasons: 1) R handles vast data sets more smoothly, and 2) it gives better programmatic control over chart layout. Seriously. Don't use a chisel for a job needing a chainsaw.
Ollie Jones
What about MATLAB? How long does it take to learn R for my purposes?
I__
I learned a fair amount of R in the span of a few days. It has a great interactive environment that makes it easy to figure out the parts you need. http://cran.r-project.org/doc/manuals/R-intro.html is where I learned most of what I know, but when I need to do some plotting, I mostly just use http://www.phaget4.org/R/plot.html. The plots that R creates are _much_ nicer than Excel's, and look _great_ in PowerPoint presentations.
Wilduck