views:

253

answers:

4

Hello:

I have an application in which I have to store a couple of millions of integers, I have to store them in a Look up table, obviously I cannot store such amount of data in memory and in my requirements I am very limited I have to store the data in an embebedded system so I am very limited in the space, so I would like to ask you about recommended methods that I can use for the reduction of the look up table. I cannot use function approximation such as neural networks, the values needs to be in a table. The range of the integers is not known at the moment. When I say integers I mean a 32 bit value.

Basically the idea is use some copmpression method to reduce the amount of memory but without losing many precision. This thing needs to run in hardware so the computation overhead cannot be very high.

In my algorithm I have to access to one value of the table do some operations with it and after update the value. In the end what I should have is a function which I pass an index to it and then I get a value, and after I have to use another function to write a value in the table.

I found one called tile coding http://www.cs.ualberta.ca/~sutton/book/8/node6.html, this one is based on several look up tables, does anyone know any other method?.

Thanks.

A: 

I need more detail on the problem. If you cannot store the real value of the integers but instead an approximation, that means you are going to reduce (throw away) some of the data (detail), correct? I think you are looking for a hash, which can be an artform in itself. For example say you have 32 bit values, one hash would be to take the 4 bytes and xor them together, this would result in a single 8 bit value, reducing your storage by a factor of 4 but also reducing the real value of original data. Typically you could/would go further and perhaps and only use a few of those 8 bits , say the lower 4 and reduce the value further.

I think my real problem is either you need the data or you dont, if you need the data you need to compress it or find more memory to store it. If you dont, then use a hash of some sort to reduce the number of bits until you reach the amount of memory you have for storage.

dwelch
Yes you are right I have to use some method to throw away some data but withot losing many precision. I have to use a compression method but that is fast in execution time because it needs to be executed in real time. Has the algorithm that you said based on XOR operations a name?.
Ryan
A: 

I'd look at the types of numbers you need to store and pull out the information that's common for many of them. For example, if they're tightly clustered, you can take the mean, store it, and store the offsets. The offsets will have fewer bits than the original numbers. Or, if they're more or less uniformly distributed, you can store the first number and then store the offset to the next number.

It would help to know what your key is to look up the numbers.

John at CashCommons
This is a RL application those values are the Q Values.
Ryan
A: 

Read http://www.cs.ualberta.ca/~sutton/RL-FAQ.html

"Function approximation" refers to the use of a parameterized functional form to represent the value function (and/or the policy), as opposed to a simple table."

Perhaps that applies. Also, update your question with additional facts -- don't merely answer in the comments.


Edit.

A bit array can easily store a bit for each of your millions of numbers. Let's say you have numbers in the range of 1 to 8 million. In a single megabyte of storage you can have a 1 bit for each number in your set and a 0 for each number not in your set.

If you have numbers in the range of 1 to 32 million, you'll require 4Mb of memory for a big table of all 32M distinct numbers.

See my answer to http://stackoverflow.com/questions/311202/modern-high-performance-bloom-filter-in-python#311360 for a Python implementation of a bit array of unlimited size.

S.Lott
Function approximation is a very common solution, and the normal solution in this problems, sadly I need to run this in an embebedded system so I cannot use the common function approximation due to the computation overhead.
Ryan
"computation overhead" usually false. If you don't have benchmark data for your function, you'll waste a lot of time on a lookup that turns out to be slower.
S.Lott
A: 

If you are merely looking for the presence of the number in question a bloom filter, might be what you are looking for. Honestly though your question is fairly vague and confusing. It would help to explain what Q values are, and what you do with them once you find them in the table.

grieve