After reading this old article measuring the memory consumption of several object types, I was amazed to see how much memory Strings
use in Java:
length: 0, {class java.lang.String} size = 40 bytes
length: 7, {class java.lang.String} size = 56 bytes
While the article has some tips to minimize this, I did not find them entirely satisfying. It seems wasteful to use char[] for storing the data. The obvious improvement for most Western languages would be to use byte[] and an encoding like UTF-8 instead, as the most frequent characters then need only a single byte instead of two.
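To illustrate the saving, here is a small sketch (the class name Utf8VsCharDemo is made up for this example) comparing the character payload of a char[]-backed String with its UTF-8 byte[] encoding for ASCII text:

```java
import java.io.UnsupportedEncodingException;

public class Utf8VsCharDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "mailbox"; // 7 ASCII characters
        byte[] utf8 = s.getBytes("UTF-8");
        // A char[] of length 7 occupies 14 bytes of character data;
        // the UTF-8 byte[] occupies only 7 for pure ASCII.
        System.out.println("char[] payload: " + s.length() * 2 + " bytes");
        System.out.println("byte[] payload: " + utf8.length + " bytes");
    }
}
```

For non-Latin text the saving shrinks or reverses, since UTF-8 uses two or three bytes per character there.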
Of course one could use String.getBytes("UTF-8") and new String(bytes, "UTF-8"); even the overhead of the String instance itself would be gone. But then you lose very handy methods like equals(), hashCode(), length(), ...
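Some of those conveniences could be restored by a thin wrapper over the byte[]. A minimal sketch, assuming a hypothetical class name CompactString (this is not an existing API, and it uses the Java 7+ StandardCharsets constants for brevity):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical wrapper: stores the text as UTF-8 bytes but restores
// equals(), hashCode() and length() on top of the raw byte[].
final class CompactString {
    private final byte[] utf8;

    CompactString(String s) {
        this.utf8 = s.getBytes(StandardCharsets.UTF_8);
    }

    // Character count, not byte count: for non-ASCII text these differ,
    // so we decode. Callers on hot paths would want to cache this.
    int length() {
        return toString().length();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CompactString
                && Arrays.equals(utf8, ((CompactString) o).utf8);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(utf8);
    }

    @Override
    public String toString() {
        // Decodes back to a regular String when the full API is needed.
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

The trade-off is exactly the one the question allows for: every length() or toString() call pays a decoding cost in exchange for the smaller resident footprint.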
As far as I can tell, Sun has a patent on a byte[] representation of Strings:
Frameworks for efficient representation of string objects in Java programming environments
... The techniques can be implemented to create Java string objects as arrays of one-byte characters when it is appropriate ...
But I failed to find an API for that patent.
Why do I care?
In most cases I don't. But I have worked on applications with huge caches containing lots of Strings, which would have benefited from using memory more efficiently.
Does anybody know of such an API? Or is there another way to keep your memory footprint for Strings small, even at the cost of CPU performance or uglier API?
Please don't repeat the suggestions from the above article:
- an own variant of String.intern() (possibly with SoftReferences)
- storing a single char[] and exploiting the current String.substring(..) implementation to avoid data copying (nasty ;)
Update
I ran the code from the article on Sun's current JVM (1.6.0_10). It yielded the same results as in 2002.