Hi

The requirement is to build a performant calculation engine that supports Excel-like formulas. These formulas need to be applied to huge data sets (millions of rows).
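As a rough illustration of the core of such an engine, the following Python sketch parses and validates a spreadsheet-style formula once and then applies it to each row of a dataset. It is not based on OpenOffice Calc or any existing library. The supported functions (IF, MIN, MAX, ROUND, ABS), the convention that formulas refer to columns by name, and the helper names `to_python_syntax` and `compile_formula` are all hypothetical. Compiling once avoids re-parsing per row, but pure-Python evaluation of millions of rows may still be slow, so pushing the work into a database or a vectorized library are alternatives worth measuring.

```python
import ast
import re

# Spreadsheet-style functions supported by this sketch (an assumed subset).
# Note: IF evaluates both branches here, and Python's round() rounds halves
# to even, whereas Excel's ROUND rounds halves away from zero.
FUNCTIONS = {
    "IF": lambda cond, if_true, if_false: if_true if cond else if_false,
    "MIN": min,
    "MAX": max,
    "ROUND": round,
    "ABS": abs,
}

# Syntax allowed in a formula; anything else (attributes, subscripts,
# lambdas, keyword arguments, ...) is rejected before evaluation.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Compare, ast.Call,
    ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub, ast.UAdd,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


def to_python_syntax(formula: str) -> str:
    """Rewrite spreadsheet operators (=, <>, ^) into Python equivalents."""
    expr = formula.strip().lstrip("=")
    expr = expr.replace("<>", "!=").replace("^", "**")
    # A lone '=' is a comparison in spreadsheets; Python needs '=='.
    return re.sub(r"(?<![<>!=])=(?!=)", "==", expr)


def compile_formula(formula: str):
    """Validate and compile a formula once; return a function of one row."""
    tree = ast.parse(to_python_syntax(formula), mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"unsupported syntax: {type(node).__name__}")
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in FUNCTIONS
        ):
            raise ValueError("only IF, MIN, MAX, ROUND and ABS may be called")
    code = compile(tree, "<formula>", "eval")

    def evaluate(row: dict):
        # Column names in the formula resolve to this row's values.
        return eval(code, {"__builtins__": {}}, {**FUNCTIONS, **row})

    return evaluate


if __name__ == "__main__":
    tax = compile_formula("=IF(income>10000, ROUND((income-10000)*0.2, 2), 0)")
    rows = [{"income": 8000}, {"income": 25000.5}]
    print([tax(row) for row in rows])  # [0, 3000.1]
```

The whitelist check is what keeps `eval` confined to arithmetic on row values; a formula such as `__import__('os')` is rejected with `ValueError` before anything runs. This sketch requires Python 3.8 or later, where literals parse as `ast.Constant`.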

I was wondering whether something could be built on top of the OpenOffice Calc service and exposed as a calculation engine.

Does anyone have experience doing this? Are there other alternatives? I know it is possible using the Excel service, but we are an open-source shop, so M$ is ruled out.

Any pointers would be very helpful.

Edit, based on High Performance Mark's input:


  1. Numeric calculations are needed. Scientific calculations (e.g., sin(x), tanh(x), etc.) are not in scope.
  2. Calculations are not performed by end users. The formulas (for example, tax calculations) are configured and stored in the DB, then applied to the datasets. If a formula changes, the application triggers a recalculation.
  3. Spreadsheet-like formulas are well understood by a wider audience and should be easier to read and maintain. Is there any wrapper around R (or an equivalent) that will convert spreadsheet formulas into R syntax?
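Since the formulas live in the DB and recalculation is triggered by the application, one option is to translate each stored spreadsheet-style formula into a SQL expression and let the database apply it to every row. The sketch below does this with Python's `ast` module and SQLite (through the standard `sqlite3` module). It targets SQL rather than R and is not an existing wrapper. The table names (`formulas`, `payroll`), the `to_sql` and `recalculate` helpers, and the supported subset (IF, two-or-more-argument MIN and MAX, ROUND, ABS) are all hypothetical.

```python
import ast
import re
import sqlite3

# Operators this sketch can translate (an assumed subset); '/' is handled
# separately below so that integer columns still divide as reals.
BINOPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}
CMPOPS = {ast.Eq: "=", ast.NotEq: "<>", ast.Lt: "<", ast.LtE: "<=",
          ast.Gt: ">", ast.GtE: ">="}


def to_sql(formula: str, columns: set) -> str:
    """Translate a spreadsheet-style formula into a SQLite expression."""
    expr = formula.strip().lstrip("=").replace("<>", "!=")
    # A lone '=' is a comparison in spreadsheets; Python's parser needs '=='.
    expr = re.sub(r"(?<![<>!=])=(?!=)", "==", expr)
    return _emit(ast.parse(expr, mode="eval").body, columns)


def _emit(node, columns: set) -> str:
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return repr(node.value)
    if isinstance(node, ast.Name) and node.id in columns:
        return node.id  # only known column names reach the SQL text
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return f"(-{_emit(node.operand, columns)})"
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div):
        # SQLite divides two integers as integers; force real division.
        left, right = _emit(node.left, columns), _emit(node.right, columns)
        return f"(CAST({left} AS REAL) / {right})"
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        op = BINOPS[type(node.op)]
        return f"({_emit(node.left, columns)} {op} {_emit(node.right, columns)})"
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in CMPOPS):
        op = CMPOPS[type(node.ops[0])]
        return f"({_emit(node.left, columns)} {op} {_emit(node.comparators[0], columns)})"
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and not node.keywords:
        name = node.func.id.upper()
        args = [_emit(arg, columns) for arg in node.args]
        if name == "IF" and len(args) == 3:
            return f"(CASE WHEN {args[0]} THEN {args[1]} ELSE {args[2]} END)"
        # SQLite's min()/max() act as scalar functions only with 2+ arguments.
        if ((name in ("MIN", "MAX") and len(args) >= 2)
                or (name == "ROUND" and len(args) in (1, 2))
                or (name == "ABS" and len(args) == 1)):
            return f"{name.lower()}({', '.join(args)})"
    raise ValueError(f"unsupported formula element: {ast.dump(node)}")


def recalculate(conn, formula_name: str, columns: set) -> None:
    """Re-apply a stored formula to every row; call it after the formula changes."""
    (text,) = conn.execute(
        "SELECT expression FROM formulas WHERE name = ?", (formula_name,)
    ).fetchone()
    with conn:  # commit the update as one transaction
        conn.execute(f"UPDATE payroll SET tax = {to_sql(text, columns)}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE formulas (name TEXT PRIMARY KEY, expression TEXT)")
    conn.execute("CREATE TABLE payroll (id INTEGER PRIMARY KEY, income REAL, tax REAL)")
    conn.executemany("INSERT INTO payroll (income) VALUES (?)", [(8000,), (25000,)])
    conn.execute("INSERT INTO formulas VALUES (?, ?)",
                 ("tax", "=IF(income>10000, ROUND((income-10000)*0.2, 2), 0)"))
    recalculate(conn, "tax", {"income"})
    print(conn.execute("SELECT income, tax FROM payroll ORDER BY id").fetchall())
```

Because `_emit` only produces whitelisted operators, numeric literals and known column names, the generated expression can be interpolated into the `UPDATE`; anything else raises `ValueError`. Exponentiation (`^`) is deliberately left out, since SQLite has no exponentiation operator.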
+1  A: 

Hi

Well, a little Googling turns up several open-source spreadsheets written in Java, one of which may be suitable for your purposes. One question you might want to answer (perhaps by editing your post) is what calculations you want to perform: do you need the full set of functionality that Excel provides (or something close), or would the facilities that SQL provides satisfy your requirements? If SQL is enough, you might want to put this in a database.
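To illustrate what the facilities that SQL provides can cover, here is a small sketch showing how two common spreadsheet functions, SUMIF and COUNTIF, map onto plain SQL. SQLite, through Python's standard `sqlite3` module, stands in for whatever database you use, and the `sales` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical data: one row per sale, with a region and an amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0), ("south", 200.0)],
)

# SUMIF(region_range, "north", amount_range) for every region at once: GROUP BY.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# COUNTIF(amount_range, ">100") as a conditional aggregate.
(large,) = conn.execute(
    "SELECT SUM(CASE WHEN amount > 100 THEN 1 ELSE 0 END) FROM sales"
).fetchone()

print(totals)  # [('north', 150.0), ('south', 280.0)]
print(large)   # 2
```

Because the database performs the aggregation, rows never have to be loaded into application code, which generally suits large tables; actual performance will depend on the database and its indexing.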

Another point you might clarify: are you trying to create an application that, like Excel, lets end users specify calculations but, unlike Excel, is based on open-source software and can cope with millions of rows? I don't know how R performs on data sets that large (someone else on SO can probably tell us), but R is very popular, and rightly so, for what you are probably trying to do. My view is that R sits between the average programming language (say, Python) and the average spreadsheet (say, Excel) in terms of ease of use by non-programmers.

Your choice of solution may (and certainly ought to) depend on who will be using it.

Regards

Mark

High Performance Mark
Thanks Mark. I've edited the post. – Sathya