This question is probably quite different from what you are used to reading here - I hope it can provide a fun challenge.

Essentially I have an algorithm that uses 5 (or more) variables to compute a single value, called the outcome. Now I have to implement this algorithm on an embedded device which has no memory limitations, but very harsh processing constraints.

Because of this, I would like to run a calculation engine which computes the outcome for, say, 20 different values of each variable and stores this information in a file. You may think of this as a 5 (or more)-dimensional matrix or 5 (or more)-dimensional array, each dimension being 20 entries long.

In any modern language, filling this array is as simple as having 5 (or more) nested for loops. The tricky part is that I need to dump these values into a file that can then be placed onto the embedded device so that the device can use it as a lookup table.
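For concreteness, the fill step might look something like this (a C# sketch; `Compute` is just a stand-in for my real algorithm, and I'm using the loop indices themselves as the 20 sample values per variable):

```csharp
using System;

class Fill
{
    const int N = 20; // sample points per variable

    // Placeholder: substitute the real algorithm here.
    public static double Compute(double a, double b, double c, double d, double e)
        => a * b + c * d - e;

    public static double[,,,,] BuildTable()
    {
        var table = new double[N, N, N, N, N];
        for (int ia = 0; ia < N; ia++)
        for (int ib = 0; ib < N; ib++)
        for (int ic = 0; ic < N; ic++)
        for (int id = 0; id < N; id++)
        for (int ie = 0; ie < N; ie++)
            table[ia, ib, ic, id, ie] = Compute(ia, ib, ic, id, ie);
        return table;
    }
}
```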

The questions, then, are:

  1. What format(s) might be acceptable for storing the data?
  2. What programs (MATLAB, C#, etc) might be best suited to compute the data?
  3. C# must be used to import the data on the device - is this possible given your answer to #1?

Edit: Is it possible to read from my lookup table file without reading the entire file into memory? Can you explain how that might be done in C#?

A: 

I'll comment on (1) and (3). All you need to do is dump the data in slices. Pick a traversal and dump data out in that order. Write it out as comma-delimited numbers.
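A minimal sketch of that dump, assuming the generator is also written in C# (`Compute` here is a stand-in for the actual algorithm): pick one traversal order, write one comma-delimited line per innermost slice, and use the same order when reading it back.

```csharp
using System;
using System.IO;
using System.Linq;

class Dump
{
    const int N = 20;

    // Stand-in for the precomputed values.
    static double Compute(int a, int b, int c, int d, int e) => a + b + c + d + e;

    static void Main()
    {
        using (var w = new StreamWriter("table.csv"))
        {
            // One comma-delimited line per innermost slice; the four
            // outer indices select the slice, in a fixed traversal order.
            for (int a = 0; a < N; a++)
            for (int b = 0; b < N; b++)
            for (int c = 0; c < N; c++)
            for (int d = 0; d < N; d++)
                w.WriteLine(string.Join(",",
                    Enumerable.Range(0, N).Select(e => Compute(a, b, c, d, e))));
        }
    }
}
```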

Steven Sudit
+2  A: 

I'll comment on 1 and 3 as well. It may be preferable to use a fixed-width output file rather than a CSV. This may take up more or less space than a CSV, depending on the output numbers. However, it tends to work well for lookup tables, as figuring out where to look in a fixed-width data file can be done without reading the entire file, which is usually important for a lookup table.

Fixed-width data, like CSV, is trivial to read and write. Some math-oriented languages might offer poor string and binary manipulation functionality, but it should be easy enough to convert the data to fixed width during the import step regardless.

Number 2 is harder to answer, particularly without knowing what kind of algorithm you are computing. MATLAB and similar programs tend to be great at certain types of computation and often have a lot built in to make them easier. That said, much of the math functionality built into such languages is available to other languages in the form of libraries.

Brian
Are you saying it is possible to read from my lookup table file without reading the entire file into memory? Can you explain how that might be done in C#?
Adam S
`System.IO.FileStream.Seek`. The position will be equal to `width * line + column`. `FileStream` also has a `Position` property you can set directly, which is implemented in terms of `Seek`.
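A sketch of that lookup, assuming one value per line, each padded to 16 ASCII characters with `"\n"` line endings (the field width and line-ending convention are assumptions — whatever the generator writes, the reader must match):

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

class FixedWidthLookup
{
    // Assumed layout: each value padded to 16 ASCII characters,
    // one per line, "\n" line endings => 17 bytes per record.
    const int FieldWidth = 16;
    const int RecordWidth = FieldWidth + 1;

    public static double ReadValue(Stream s, long index)
    {
        // Jump straight to the record; nothing before it is read.
        s.Seek(index * RecordWidth, SeekOrigin.Begin);
        var buf = new byte[FieldWidth];
        int read = 0;
        while (read < FieldWidth)
            read += s.Read(buf, read, FieldWidth - read);
        return double.Parse(Encoding.ASCII.GetString(buf).Trim(),
                            CultureInfo.InvariantCulture);
    }
}
```

The same `ReadValue` works against a `FileStream` opened on the table file.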
Brian
As an aside, you may wish to read the entire file into memory anyway, since doing so will be faster if it fits. But fixed width will still make things cleaner and probably faster (though CSV may have better cache performance if it is significantly smaller).
Brian
@Brian: If the text file is going to be used for lookups, then you're quite right about fixed-width being advantageous. For pure data transfer, CSV seems to be easier, not just because it's more compact but because the fields are delimited explicitly.
Steven Sudit
@Steve: For data transfer, I consider the two formats mostly equal in terms of difficulty. In the case of CSV you can parse them just by using `String.Split`. In the case of fixed width you can read the data directly into an array with no parsing at all, though of course this offloads the "reading" portion to some form of lookup function (`width * line + column` = lookup position); this is still pretty easy to write. And CSV is not necessarily more compact, especially as fixed-width numeric data can be stored in binary, which is more compact than ASCII and avoids parsing.
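The binary variant is about as simple as it gets (sketch; assumes each entry is a `double` written by `BinaryWriter`, so the byte offset of entry `i` is just `i * sizeof(double)` and no text parsing is needed on the device):

```csharp
using System;
using System.IO;

class BinaryTable
{
    public static void Write(string path, double[] values)
    {
        using (var w = new BinaryWriter(File.Create(path)))
            foreach (var v in values) w.Write(v);
    }

    // Reads a single entry without loading the rest of the file.
    public static double ReadAt(string path, long index)
    {
        using (var fs = File.OpenRead(path))
        {
            fs.Seek(index * sizeof(double), SeekOrigin.Begin);
            using (var r = new BinaryReader(fs))
                return r.ReadDouble();
        }
    }
}
```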
Brian
@Brian: I was working on the assumption that whatever text the file contains would be converted to a binary format in memory, such as by using `Int32.Parse`. In this case, parsing is marginally simpler for CSV, but it's not so bad for fixed, either.
Steven Sudit
@Steven: Good point. If you're going to read the whole thing in memory, it might not matter what format it is in, assuming transfer bandwidth is not a bottleneck.
Brian
@Brian: If we want to avoid bringing it all into memory, then a reasonable solution would be to build an (n-dimensional?) index so that the program can jump directly to the region that contains a stripe. It'll still be faster to read sequentially, of course. The other idea would be to bulk-load it into a database and do the work in the stored proc as much as possible.
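With fixed dimensions the "index" can just be arithmetic: a row-major flattening of the n indices gives the record position directly (sketch; it must use the same traversal order as the nested loops that wrote the file):

```csharp
class Offsets
{
    // Row-major flattening for k dimensions of equal length n.
    // Multiply the result by the record width to get a byte offset.
    public static long FlatIndex(int[] idx, int n)
    {
        long flat = 0;
        foreach (int i in idx) flat = flat * n + i;
        return flat;
    }
}
```

For the 5-variable, 20-sample case, `Offsets.FlatIndex(new[] { i0, i1, i2, i3, i4 }, 20)` selects one of the 20^5 records.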
Steven Sudit