Hello,
I'd like to design a JVM data structure (Java/Scala) that can represent and store the contents of arbitrary relational database tables. The data structure should be fast (not too GC-intensive, cache-friendly) and memory-efficient, so that larger tables can fit in RAM.
One memory-efficient solution is to store each column separately in a primitive array, but I'm worried about cache friendliness because items in the same row are not stored together. Reading a full row with N columns could incur up to N cache misses, no matter how narrow the columns are.
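A minimal sketch of this column-per-array layout, assuming a hypothetical two-column table (id INT, price DOUBLE). The class and method names are made up for illustration, and a table with an arbitrary schema would need one array per column chosen at runtime:

```java
import java.util.Arrays;

// Hypothetical table (id INT, price DOUBLE) stored column by column.
final class ColumnarTable {
    private int[] ids = new int[16];
    private double[] prices = new double[16];
    private int size = 0;

    // Appends a row and returns its row index; grows both arrays together.
    int append(int id, double price) {
        if (size == ids.length) {
            ids = Arrays.copyOf(ids, size * 2);
            prices = Arrays.copyOf(prices, size * 2);
        }
        ids[size] = id;
        prices[size] = price;
        return size++;
    }

    // Reading one full row touches one array per column.
    int id(int row) { return ids[row]; }
    double price(int row) { return prices[row]; }
    int size() { return size; }

    // Scanning a single column walks one primitive array sequentially.
    double sumPrices() {
        double sum = 0;
        for (int i = 0; i < size; i++) sum += prices[i];
        return sum;
    }
}
```

In this sketch a single-column scan such as sumPrices() stays within one array, while fetching a whole row reads from every array, which is the access pattern the concern above is about.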
Another solution is to store each row in an Object array where each element represents a field and is cast to the correct type on retrieval. This requires storing numeric types in their boxed form, so it's not very memory-efficient. It's probably not cache-efficient either, since each boxed value is typically a separate heap object reached through a reference.
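For comparison, a minimal sketch of the Object-array-per-row approach. The names are hypothetical, and the caller is assumed to know each column's type:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row store: one Object[] per row, fields cast on retrieval.
final class BoxedRowTable {
    private final List<Object[]> rows = new ArrayList<>();

    // Primitive arguments are autoboxed (int -> Integer, double -> Double).
    void append(Object... fields) { rows.add(fields.clone()); }

    // Each getter follows a reference to the row array, then to the boxed value.
    int getInt(int row, int col) { return (Integer) rows.get(row)[col]; }
    double getDouble(int row, int col) { return (Double) rows.get(row)[col]; }
    String getString(int row, int col) { return (String) rows.get(row)[col]; }

    int size() { return rows.size(); }
}
```

A call such as append(7, 1.5, "abc") stores an Integer, a Double, and a String reference in the row array, so a numeric field costs an object header plus a reference on top of its payload.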
Another solution is to lay out each row's data in a byte array, the same way real databases serialize their rows, using only as many bytes as necessary. This is cache-friendly and memory-efficient, but I'm concerned about the cost of serialization/deserialization on every access.
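A minimal sketch of the packed-row idea, assuming a hypothetical fixed-width schema (id INT, price DOUBLE) so that every field sits at a known offset. It uses the absolute get/put methods of java.nio.ByteBuffer to decode only the requested field rather than the whole row. Variable-width columns such as strings are not covered here and would need an offset table or length prefixes:

```java
import java.nio.ByteBuffer;

// Hypothetical table (id INT, price DOUBLE) with all rows packed back to back.
final class PackedRowTable {
    private static final int ID_OFFSET = 0;
    private static final int PRICE_OFFSET = Integer.BYTES;
    private static final int ROW_WIDTH = Integer.BYTES + Double.BYTES; // 12 bytes

    private ByteBuffer data = ByteBuffer.allocate(16 * ROW_WIDTH);
    private int size = 0;

    // Appends a row and returns its row index; doubles the buffer when full.
    int append(int id, double price) {
        if ((size + 1) * ROW_WIDTH > data.capacity()) {
            ByteBuffer bigger = ByteBuffer.allocate(data.capacity() * 2);
            bigger.put(data.array(), 0, size * ROW_WIDTH);
            data = bigger;
        }
        int base = size * ROW_WIDTH;
        data.putInt(base + ID_OFFSET, id);          // absolute put, no position change
        data.putDouble(base + PRICE_OFFSET, price);
        return size++;
    }

    // Absolute gets read one field in place without materializing a row object.
    int id(int row) { return data.getInt(row * ROW_WIDTH + ID_OFFSET); }
    double price(int row) { return data.getDouble(row * ROW_WIDTH + PRICE_OFFSET); }
    int size() { return size; }
}
```

With this layout both fields of a row sit in 12 adjacent bytes and no per-row objects are allocated. Whether the per-field decoding cost matters in practice is something I would measure rather than assume.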
Which of these approaches gives the best trade-off, or is there a better way?