views: 69

answers: 2

Hello, I have a big Boost.MultiIndex container, about 10 GB. To cut down on the reading, I thought there should be a way to keep the data in memory so that other client programs can read and analyse it.

What is the proper way to organize it?

The container looks like:

    #include <boost/multi_index_container.hpp>
    #include <boost/multi_index/ordered_index.hpp>
    #include <boost/multi_index/member.hpp>
    #include <boost/multi_index/mem_fun.hpp>
    #include <boost/multi_index/tag.hpp>

    using namespace boost::multi_index;

    struct particleID
    {
        int          ID;  // real ID for a particle from the Gadget2 file "ID" block
        unsigned int IDf; // position in the file
        particleID(int id, unsigned int idf) : ID(id), IDf(idf) {}
        bool operator<(const particleID& p) const { return ID < p.ID; }
        unsigned int getByGID() const { return ID & 0x0FFF; }
    };

    struct ID{};
    struct IDf{};
    struct IDg{};

    typedef multi_index_container<
        particleID,
        indexed_by<
            ordered_unique<
                tag<IDf>, BOOST_MULTI_INDEX_MEMBER(particleID, unsigned int, IDf)>,
            ordered_non_unique<
                tag<ID>, BOOST_MULTI_INDEX_MEMBER(particleID, int, ID)>,
            ordered_non_unique<
                tag<IDg>, BOOST_MULTI_INDEX_CONST_MEM_FUN(particleID, unsigned int, getByGID)>
        >
    > particlesID_set;

Any ideas are welcome.

Kind regards, Arman.

EDIT: RAM and the number of cores are not a limitation. Currently I have 16 GB and 8 cores.

Update

I asked the same question on the Boost.Users forum and got an answer from Joaquín M López Muñoz (the developer of Boost.MultiIndex). The answer is yes: one can share a multi_index container between processes using Boost.Interprocess. For more detail you can see this link.
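For example, the shared-memory version could be declared roughly as follows (a minimal sketch reusing the particleID struct and tags from above; the segment name "particle_shm", the object name "particles", and the 12 GB size are placeholder assumptions, and error handling is omitted):

    #include <boost/interprocess/managed_shared_memory.hpp>
    #include <boost/interprocess/allocators/allocator.hpp>

    namespace bip = boost::interprocess;

    // Allocator that places particleID objects inside the shared segment.
    typedef bip::allocator<
        particleID, bip::managed_shared_memory::segment_manager> shm_allocator;

    // The same indices as above, but parameterized with the shared allocator.
    typedef multi_index_container<
        particleID,
        indexed_by<
            ordered_unique<
                tag<IDf>, BOOST_MULTI_INDEX_MEMBER(particleID, unsigned int, IDf)>,
            ordered_non_unique<
                tag<ID>, BOOST_MULTI_INDEX_MEMBER(particleID, int, ID)>,
            ordered_non_unique<
                tag<IDg>, BOOST_MULTI_INDEX_CONST_MEM_FUN(particleID, unsigned int, getByGID)>
        >,
        shm_allocator
    > shared_particlesID_set;

    int main()
    {
        // The loader process creates the segment once. Reader processes open
        // it with bip::open_only and look the container up with
        // seg.find<shared_particlesID_set>("particles").
        bip::managed_shared_memory seg(
            bip::open_or_create, "particle_shm",
            12ULL * 1024 * 1024 * 1024);

        shared_particlesID_set* particles =
            seg.find_or_construct<shared_particlesID_set>("particles")(
                shared_particlesID_set::ctor_args_list(),
                seg.get_allocator<particleID>());

        particles->insert(particleID(1, 0)); // ...fill from the Gadget2 files
        return 0;
    }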

+3  A: 

Have you looked at Boost.Interprocess?

TheJuice
Thanks, this is probably the way to do it. Could you please post an example of how to share the object with Boost.Interprocess? Thanks.
Arman
It will require using a custom allocator, though the library may provide one directly (I haven't looked at it for a while). Beware, though, that whatever strategy you use (multiple threads or multiple processes), you will HAVE to synchronize your accesses.
Matthieu M.
Why should I care about synchronizing reads? Different processes can read from the multi_index in parallel, can't they?
Arman
+2  A: 

Have you thought about cutting it into pieces?

Concurrent access is hard. Hard to get right, hard to maintain, hard to reason about.

On the other hand, 10 GB is very big, and I wonder if you could cluster your data. Keep the same index structure, but dispatch the data into 10 (or more) independent objects depending on some condition (the group ID, for example).

This way you could naturally process each chunk separately from the others without having to deal with concurrent access in the first place.
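Something along these lines (a rough sketch reusing particlesID_set from the question; the chunk count of 8 and hashing on getByGID() are assumptions on my part):

    #include <cstddef>
    #include <vector>

    // Dispatch particles into independent chunks keyed on the group ID;
    // each chunk can then be processed by its own thread or process.
    const std::size_t num_chunks = 8; // e.g. one chunk per core

    std::vector<particlesID_set> chunks(num_chunks);

    void dispatch(const particleID& p)
    {
        chunks[p.getByGID() % num_chunks].insert(p);
    }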

Matthieu M.
The data consists of 2000 files, which I read into memory; it is time-series data, and the multi_index is well suited to traversing it. Why do you think 10 GB is big? I have 16 GB of RAM. I was wondering: is it possible to just share a pointer to the container with another process? The other process is already multi-threaded; it runs over the different IDs. I would like to get rid of the reading part, which takes most of the time.
Arman
I may have been unclear. There is nothing wrong with occupying 10 GB of RAM. It's just that, given that you have so much data, it would be easier to process if you could parallelize the work, and it's easier to parallelize if you can cut the data into chunks rather than implementing a synchronization mechanism. You say you have 8 cores, so wouldn't it be great if you had 8 chunks, each processed independently of the others, so that all 8 cores are crunching data instead of just 1? That would be faster for sure :)
Matthieu M.
Oh yes, you are right! That approach is the fastest. I have parallel code which uses Boost.Threads to read the data, plus several analysis tools (I would say modules). Currently the bottleneck is the reading; I would like to keep my data in memory at all times and analyse it with many threads.
Arman