views: 223

answers: 4

Hello,

I am writing a simulation and need some hints on the design. The basic idea is that data for the given stochastic processes is generated and later consumed for various calculations. For example, for one iteration:

  • Process 1 -> generates data for source 1: x1
  • Process 2 -> generates data for source 2: x2
  • and so on

Later I want to apply some transformations, for example on the output of source 2, which results in x2a, x2b, x2c. So in the end I end up with the following vector: [x1, x2a, x2b, x2c].

I have a problem, though: for N-multivariate stochastic processes (representing, for example, multiple correlated phenomena) I have to generate an N-dimensional sample at once:

  • Process 1 -> generates data for source 1...N: x1...xN

I am looking for a simple architecture that would allow me to structure the simulation code and provide flexibility without hindering performance.

I was thinking of something along these lines (pseudocode):

class random_process
{
    // concrete processes would generate and store last data
    virtual data_ptr operator()() const = 0;
};

class source_proxy
{
    container_type<random_process> processes;
    container_type<data_ptr> data; // pointers to the process data storage
    data_type operator[](size_type number) const { return *(data[number]); }
    void next() const {/* update the processes */}
};
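For a single iteration the intended usage would be something like this (illustrative only; data_type stands for whatever the samples are):

// one iteration, using the sketch above
source_proxy sources;
sources.next();              // every process generates a new sample
data_type x1 = sources[0];   // latest value of source 1
data_type x2 = sources[1];   // latest value of source 2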

Somehow I am not convinced about this design. For example, if I'd like to work with vectors of samples instead of a single iteration, then the above design would have to change (I could, for example, have the processes fill submatrices of a proxy matrix passed to them with data, but again I'm not sure if this is a good idea; if it is, it would also fit the single-iteration case nicely). Any comments, suggestions and criticism are welcome.

EDIT:

A short summary of the text above, to recap the key points and clarify the situation:

  • random_processes contain the logic to generate some data. For example, one can draw samples from a multivariate Gaussian with given means and a correlation matrix, using e.g. a Cholesky decomposition; as a result I'll be getting a set of samples [x1 x2 ... xN] (a minimal sketch of this step appears at the end of this edit)
  • I can have multiple random_processes, with different dimensionality and parameters
  • I want to do some transformations on individual elements generated by random_processes

Here is the dataflow diagram

random_processes                    output
   x1    --------------------------> x1
                               ----> x2a
p1 x2    ------------transform|----> x2b
                               ----> x2c
   x3    --------------------------> x3

p2 y1    ------------transform|----> y1a
                               ----> y1b

The output is being used to do some calculations.
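For concreteness, here is a minimal sketch of the sampling step for one multivariate process. It assumes the correlation matrix has already been factored into a lower-triangular Cholesky factor L, stored row-major in a flat vector, and standard_normal() is a placeholder for any N(0,1) generator (e.g. Box-Muller):

#include <vector>
#include <cstddef>

double standard_normal(); // placeholder: any N(0,1) generator, e.g. Box-Muller

// x = mean + L * z, where z is a vector of i.i.d. standard normal draws
// and L is the (precomputed) Cholesky factor of the correlation matrix
std::vector<double> draw_correlated(const std::vector<double> &mean,
                                    const std::vector<double> &L) // N x N, row-major
{
    const std::size_t N = mean.size();
    std::vector<double> z(N), x(mean);
    for (std::size_t i = 0; i < N; ++i)
        z[i] = standard_normal();
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j <= i; ++j)  // L is lower-triangular
            x[i] += L[i * N + j] * z[j];
    return x;
}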

+2  A: 

When I read this, "the answer" doesn't materialize in my mind; instead, a question does:

Do you "have to" write this, or can you invest in tried and proven technology to make your life easier? (This problem is part of a class of problems that various tool vendors in the market have created configurable solutions for.)

In my job at Microsoft I work with high performance computing vendors - several of which have math libraries. Folks at these companies would come much closer to understanding the question than I do. :)

Cheers, Greg Oliver [MSFT]

Greg Oliver
A: 

You may find this useful: http://www.codeproject.com/KB/threads/lwsync.aspx

Mykola Golubyev
+1  A: 

I think that the second option (the one mentioned in the last paragraph) makes more sense. In the one you presented, you are playing with pointers and indirect access to the random process data. The other one would store all the data (either a vector or a matrix) in one place, the source_proxy object. The random process objects are then called with a submatrix to populate as a parameter, and they do not store any data themselves. The proxy manages everything, from providing the source data (for any distinct source) to requesting new data from the generators.

So changing a bit your snippet we could end up with something like this:

class random_process
{
public:
    // concrete processes write their data directly into the submatrix
    virtual void operator()(submatrix &) = 0;
};

class source_proxy
{
    container_type<random_process> processes;
    matrix data;
public:
    column_type operator[](size_type source_number) const { /* return a column of data */ }
    void next() {/* get new data from the random processes */}
};
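To make this more concrete, a generator and the proxy's update loop could look roughly like this, filling in the placeholders above. This is only a rough sketch: submatrix, matrix::columns(), rows(), cols() and dimension() are assumed interfaces standing in for whatever matrix types you settle on, and ownership of the processes is ignored:

// a concrete process writing into its slice of the proxy's matrix
class gaussian_process : public random_process
{
    std::vector<double> means; // one mean per dimension of this process
public:
    std::size_t dimension() const { return means.size(); }
    virtual void operator()(submatrix &out)
    {
        // one column per dimension, one row per iteration
        for (std::size_t r = 0; r < out.rows(); ++r)
            for (std::size_t c = 0; c < out.cols(); ++c)
                out(r, c) = means[c]; // + correlated noise in a real process
    }
};

// sketch of what next() would do: hand each process its own column slice
void source_proxy::next()
{
    std::size_t first = 0;
    for (std::size_t i = 0; i < processes.size(); ++i) {
        submatrix slice = data.columns(first, first + processes[i].dimension());
        processes[i](slice); // the process fills the slice in place
        first += processes[i].dimension();
    }
}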

But I agree with the other comment (Greg's) that it is a difficult problem, and depending on the final application it may require heavy thinking. It's easy to go down a dead end and end up rewriting lots of code...

Anonymous
That's what I had in mind, but as I said, I'm not sure whether there are any traps hidden in it...
Aqq
+2  A: 

I'll take a stab at this; perhaps I'm missing something, but it sounds like we have a list of processes 1...N that don't take any arguments and return a data_ptr. So why not store them in a vector (or an array, if the number is known at compile time) and then structure them in whatever way makes sense? You can get really far with the STL: the built-in containers (std::vector), function objects (std::tr1::function) and algorithms (std::transform). You didn't say much about the higher-level structure, so I'm assuming a really naive one, but clearly you would build the data flow appropriately. It gets even easier if you have a compiler with support for C++0x lambdas, because you can nest the transformations more easily (see the sketch after the code).

//compiled in the SO textbox...
#include <vector>
#include <functional>
#include <algorithm>

typedef int data_ptr;

class Generator{
public:
    data_ptr operator()(){
      //randomly generate input
      return 42 * 4;
    }
};

class StochasticTransformation{
public:
    data_ptr operator()(data_ptr in){
       //apply a randomly seeded function
       return in * 4;
    }
};

int main(){

    //array of processes; wrap this in a class if you like, but it sounds
    //like there is a distinction between generators that create data
    //and transformations

    std::vector<std::tr1::function<data_ptr(void)> > generators;

    //TODO: fill up the process vector with functors...
    generators.push_back(Generator());

    //transformations look like this (right?)
    std::vector<std::tr1::function<data_ptr(data_ptr)> > transformations;

    //so let's add one
    transformations.push_back(StochasticTransformation());

    //and we have an array of results...
    std::vector<data_ptr> results;

    //and we need some inputs
    const int NUMBER = 10;
    for (int i = 0; i < NUMBER; ++i)
       results.push_back(generators[0]());

    //and now start transforming them using transform...
    //pick a random one or do them all...
    std::transform(results.begin(), results.end(),
                   results.begin(), transformations[0]);
}
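For instance, with C++0x lambdas the transformation step above could be written inline instead of through a named functor (a sketch reusing the same results vector and data_ptr typedef):

//C++0x: nest the transformation directly in the call
std::transform(results.begin(), results.end(), results.begin(),
               [](data_ptr in) { return in * 4; });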
Rick
Thank you for the very nice idea. However, there is still one issue bothering me. I didn't make it clear enough in the question, so I'll try to rephrase it.
Aqq
Thanks for clarifying. It's been a while since I looked at Cholesky decompositions/factorizations, but I'll revise my answer late tonight or tomorrow when I get a chance, to hopefully be more relevant.
Rick