I have a dense matrix where the indices correspond to genes. While gene identifiers are often integers, they are not contiguous integers. They could be strings instead, too.

I suppose I could use a boost sparse matrix of some sort with integer keys, and it wouldn't matter if they're contiguous. Or would this still occupy a great deal of space, particularly if some genes have identifiers that are nine digits?

Further, I am concerned that sparse storage is not appropriate, since this is an all-by-all matrix (there will be a distance in each and every cell, provided the gene exists).

I'm unlikely to need to perform any matrix operations (e.g., matrix multiplication). I will need to pull vectors out of the matrix (slices).

It seems like the best type of matrix would be keyed by a Boost unordered_map (a hash map), or perhaps even simply an STL map.

Am I looking at this the wrong way? Do I really need to roll my own? I thought I saw such a class somewhere before.

Thanks!

+2  A: 

You could use a std::map to map the gene identifiers to unique, consecutively assigned integers (every time you add a new gene identifier to the map, you can give it the map's size as its identifier, assuming you never remove genes from the map).

If you want to be able to search for the identifier of a gene based on its unique integer, you can use a second map or you could use a boost::bimap, which provides a bidirectional mapping of elements.
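A minimal sketch of the id assignment described above, assuming gene identifiers are handled as strings. The class name `GeneIndex` and its members are hypothetical, and a plain `std::vector` is used here for the reverse lookup in place of a second map or `boost::bimap`, which works because the assigned ids are consecutive and genes are never removed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the answer): assigns each gene identifier a
// consecutive index 0, 1, 2, ... in order of first appearance.
class GeneIndex {
public:
    // Returns the existing index for `gene`, or assigns the next free one.
    std::size_t intern(const std::string& gene) {
        // insert() leaves the map unchanged when the key is already present,
        // so the current size is only used up by genuinely new genes.
        auto result = to_index_.insert(std::make_pair(gene, to_index_.size()));
        if (result.second) {
            names_.push_back(gene);  // keep the reverse lookup in step
        }
        return result.first->second;
    }

    // Reverse lookup: index back to the original gene identifier.
    const std::string& name(std::size_t index) const {
        assert(index < names_.size());
        return names_[index];
    }

    std::size_t size() const { return names_.size(); }

private:
    std::map<std::string, std::size_t> to_index_;
    std::vector<std::string> names_;
};
```

Calling `intern` twice with the same identifier returns the same index, so it can be used both to register genes and to look them up later.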

As for which matrix container to use, you might consider boost::numeric::ublas::matrix; it provides vector-like access to rows and columns of the matrix.
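To illustrate what dense storage with row access looks like once genes have consecutive indices, here is a rough sketch that uses a flat `std::vector<double>` as a dependency-free stand-in for a library matrix type. It is not the uBLAS API; the class name `DistanceTable` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense all-by-all distance table over genes that have already
// been mapped to indices 0..n-1. Cells are stored row by row in one vector.
class DistanceTable {
public:
    explicit DistanceTable(std::size_t n) : n_(n), cells_(n * n, 0.0) {}

    // Read/write access to the distance between gene `row` and gene `col`.
    double& at(std::size_t row, std::size_t col) {
        assert(row < n_ && col < n_);
        return cells_[row * n_ + col];
    }

    // Copies out one full row: the distances from gene `r` to every gene.
    std::vector<double> row(std::size_t r) const {
        assert(r < n_);
        return std::vector<double>(cells_.begin() + r * n_,
                                   cells_.begin() + (r + 1) * n_);
    }

    std::size_t size() const { return n_; }

private:
    std::size_t n_;
    std::vector<double> cells_;
};
```

Memory use depends only on the number of genes, not on how large their identifiers are: the table holds n * n doubles, so an illustrative 20,000 genes would need 20,000 * 20,000 * 8 bytes, roughly 3.2 GB.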

James McNellis
Give it the map's size as its identifier? Can you please clarify this? bimap looks interesting. I'd never seen that before. I wish boost had clearer abstracts for its types.
mohawkjohn
@mohawkjohn: If you want to assign unique, incrementing ids to the genes as you add them to the map (say, a `std::map<std::string, unsigned>`, where the string is the name of the gene and the int is the id you are assigning to it), you can do `m.insert(std::make_pair(str, m.size()))`. As you insert each element into the map, it will increase in size by one, so the generated ids will be unique.
James McNellis
+2  A: 

If you don't need matrix operations, you don't need a matrix. A 2D map with string keys can be done with std::map<std::string, std::map<std::string, double> > in plain C++, or with the equivalent hash map (unordered_map) from Boost.
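A small sketch of the nested-map approach, assuming the stored values are `double` distances. The typedef names and the `get_distance` helper are hypothetical; the helper uses `find` so that looking up a missing gene does not silently insert an empty entry the way `operator[]` would.

```cpp
#include <map>
#include <string>

// Hypothetical type names for a 2D map keyed by gene identifier.
typedef std::map<std::string, double> DistanceRow;
typedef std::map<std::string, DistanceRow> DistanceMap;

// Returns the stored distance from gene `a` to gene `b`, or `fallback` when
// either key is absent. The map is never modified by a lookup.
inline double get_distance(const DistanceMap& d, const std::string& a,
                           const std::string& b, double fallback) {
    DistanceMap::const_iterator row = d.find(a);
    if (row == d.end()) {
        return fallback;
    }
    DistanceRow::const_iterator cell = row->second.find(b);
    return cell == row->second.end() ? fallback : cell->second;
}
```

Writing is just `d["geneA"]["geneB"] = 0.25;`, and pulling out a slice is a single lookup of the inner map, `d.find("geneA")->second`, after checking that the key exists.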

Eli Bendersky
A: 

There is Boost.MultiArray, which will allow you to work with non-contiguous indices.

If you want an efficient implementation for matrices of static size, there is also Boost.LA, which is now on the review schedule.

And lastly, there is also NT2, which should be submitted to Boost soon.

Vicente Botet Escriba
I had no idea there was a Boost MultiArray. Not sure if it'll be useful for this, but thanks for pointing it out. That should definitely be of utility to me at some point.
mohawkjohn
Hope this helps. Just a question: are the index strings known at compile time? If so, you can associate a unique, contiguous integer with each one at compile time and use MultiArray for your problem: `template <const char* NAME> struct string_to_index; template <> struct string_to_index<"AAAA"> { static const int value = 1; }; template <> struct string_to_index<"BBB"> { static const int value = 2; }; ...`
Vicente Botet Escriba