I'm working in Scala with very large lists of Int (the values themselves may be large too), and I need to compress them and hold them in memory.
The only requirement is that I can pull (and decompress) the first number of the list to work with, without touching the rest of the list.
I have many good ideas, but most of them translate the numbers to bits. Example:
you can write any number x >= 1 as the tuple (n, x - 2^n), where n = floor(log2(x)). The first element is written as a string of 1's with a 0 at the end (unary code), and the second in binary using n bits. E.g.:
1 -> 0,0 -> 0
...
5 -> 2,1 -> 110 01
...
8 -> 3,0 -> 1110 000
9 -> 3,1 -> 1110 001
...
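
To make the scheme above concrete, here is a minimal sketch of it operating on strings of '0' and '1' characters, purely for illustration (a real version would pack bits). The names `GammaSketch`, `floorLog2`, `encode` and `decodeFirst` are made up for this example, and it assumes every value is at least 1.

```scala
object GammaSketch {
  // floor(log2(x)) for x >= 1, taken from the position of the highest set bit
  def floorLog2(x: Int): Int = 31 - Integer.numberOfLeadingZeros(x)

  // Encode x >= 1 as unary(n) followed by (x - 2^n) written in n bits,
  // where n = floor(log2(x))
  def encode(x: Int): String = {
    require(x >= 1, "this scheme only represents positive integers")
    val n = floorLog2(x)
    val unary = "1" * n + "0"
    val rest = x - (1 << n)
    val binary = (n - 1 to 0 by -1)
      .map(i => if (((rest >> i) & 1) == 1) '1' else '0')
      .mkString
    unary + binary
  }

  // Decode only the first number of a concatenated bit string.
  // Returns the value and how many bits it occupied.
  def decodeFirst(bits: String): (Int, Int) = {
    val n = bits.indexOf('0') // number of leading 1's
    val rest =
      if (n == 0) 0
      else Integer.parseInt(bits.substring(n + 1, 2 * n + 1), 2)
    ((1 << n) + rest, 2 * n + 1)
  }
}
```

With this sketch, `GammaSketch.encode(5)` yields `"11001"`, matching the `110 01` line above, and `decodeFirst` only looks at the first 2n + 1 characters, so the rest of the list is never touched.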
While an Int takes a fixed 32 bits of memory and a Long 64, with this compression x requires 2*floor(log2(x)) + 1 bits of storage and can grow indefinitely. This compression reduces memory whenever most values are small (below 2^16 when compared with a 32-bit Int).
How would you handle this type of data? Is there something like a bit array?
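
One candidate on the JVM is `java.util.BitSet`. The following is only a rough sketch of how the code described above might be packed into it; the class `PackedInts` and its methods are hypothetical names, values are assumed to be at least 1, and the total size is limited by BitSet's Int indices (about 2^31 bits).

```scala
import java.util.BitSet

// Hypothetical append-only store: every number is written into one shared
// BitSet as unary(n) followed by the n low bits of x, n = floor(log2(x)).
final class PackedInts {
  private val bits = new BitSet()
  private var writePos = 0 // index of the next free bit
  private var readPos = 0  // index of the first bit of the current head

  def append(x: Int): Unit = {
    require(x >= 1, "this scheme only represents positive integers")
    val n = 31 - Integer.numberOfLeadingZeros(x)
    bits.set(writePos, writePos + n) // n ones; the terminating 0 stays clear
    writePos += n + 1
    var i = n - 1
    while (i >= 0) { // the n bits below the leading 1, most significant first
      if (((x >> i) & 1) == 1) bits.set(writePos)
      writePos += 1
      i -= 1
    }
  }

  def isEmpty: Boolean = readPos == writePos

  // Decode and consume the first number without reading the rest
  def removeFirst(): Int = {
    require(!isEmpty, "no numbers left")
    val zero = bits.nextClearBit(readPos) // end of the unary part
    val n = zero - readPos
    var value = 1 // the implicit leading 1 bit
    var i = 0
    while (i < n) {
      value = (value << 1) | (if (bits.get(zero + 1 + i)) 1 else 0)
      i += 1
    }
    readPos = zero + 1 + n
    value
  }
}
```

Note that this sketch never frees the bits that have already been consumed; it only moves a read cursor forward, so reclaiming memory would need something like chunked storage.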
Is there any other way to compress such data in Scala?
Thanks