views: 383
answers: 5

Given that Solid State Disks (SSDs) are decreasing in price and will soon become more prevalent as system drives, and given that their access rates are significantly higher than those of rotating magnetic media, what standard algorithms will gain in performance from the use of SSDs for local storage? For example, the high random read speed of SSDs makes something like a disk-based hashtable viable for large hashtables; 4GB of disk space is readily available, which makes hashing to the entire range of a 32-bit integer viable (more for lookup than population, though, which would still take a long time). While a hashtable of this size would be prohibitive to work with on rotating media due to the access speed, it shouldn't be as much of an issue with SSDs.
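
To make the idea concrete, here is a minimal sketch of the sort of disk-backed table I have in mind (the file name and the one-byte-per-slot layout are illustrative assumptions only):

```python
# Sketch only: a read-mostly table on disk, directly indexed by a 32-bit key.
# With one byte per slot, covering the full 32-bit key range takes
# 2**32 bytes = 4 GB, and each lookup is a single seek plus a tiny read --
# exactly the random-access pattern where SSDs beat rotating media.

TABLE_PATH = "hashtable.bin"   # hypothetical pre-populated 4 GB file

def lookup(key: int) -> int:
    """Return the one-byte value stored for a 32-bit key with one random read."""
    with open(TABLE_PATH, "rb") as f:
        f.seek(key & 0xFFFFFFFF)   # slot offset == key, since slots are one byte
        return f.read(1)[0]
```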

Are there any other areas where the impending transition to SSDs will provide potential gains in algorithmic performance? I'd rather see reasoning as to how one thing will work rather than opinion; I don't want this to turn contentious.

A: 

Don't kid yourself. SSDs are still a whole lot slower than system memory. Any algorithm that chooses to use system memory over the hard disk is still going to be much faster, all other things being equal.

Triptych
The point is, not all other things are equal. Specifically, as an example, 4GB of SSD space is relatively easy to find; 4GB of easily addressable system memory is much harder to find.
McWafflestix
4GB of RAM is pretty standard on any computer that needs to sort 4GB worth of stuff.
Triptych
Price per gigabyte of memory is still lower for RAM versus SSD. 64-bit address space is common in servers and becoming more common on the desktop.
Michael
@Triptych: yes, 4GB of RAM is pretty standard, but when you fill that 4GB up with the hashtable, where is your OS going to sit? Your application?
McWafflestix
@Michael: yes, it's a good point, but typically RAM is at a premium; persistent storage less so.
McWafflestix
@McWafflestix - I messed up my statement; RAM is still more expensive per gigabyte. But 4 GB of RAM is cheaper than a decent-performing SSD.
Michael
@Michael: yes, it's a good point; still, consider the situation where a user application would like to use a large precomputed hashtable. At installation time, you can probably expect to have 4GB of storage space, and sometime in the near future you can possibly expect that to be solid state; I don't know how viable it would be to expect the user to have 4GB of EXTRA RAM to put your hashtable in. I agree, though, that for server type uses, it's usually easier to add more RAM; consider, however, the point Arno Setagaya makes above of a 50GB hashtable.
McWafflestix
@McWafflestix - It makes me feel like we're trying to force situations where SSDs would be beneficial by creating scenarios that are pathological for rotational media. A 50GB hash table that must have better than 50 MB/s read bandwidth would benefit from an SSD - but only because we picked the worst datastore imaginable for a mechanical drive. If the data could be reorganized into a B-tree-like layout, or the indexes separated out and kept in memory with large chunks of the table cached, and so on, we could still achieve decent performance.
Michael
@Michael: it's a good point; I don't particularly want to get into too much fiddling over details; the point is that the hashtable idea is just one idea; I wanted to know if anyone had any others.
McWafflestix
+6  A: 

Your example of hashtables is indeed the key database structure that will benefit. Instead of having to load a whole 4GB-or-more file into memory to probe for values, the SSD can be probed directly. The SSD is still slower than RAM by orders of magnitude, but it's quite reasonable to have a 50GB hash table on disk, whereas keeping it in RAM means paying big money for big iron.

An example is chess position databases. I have over 50GB of hashed positions. There is complex code to try to group related positions near each other in the hash, so I can page in 10MB of the table at a time and hope to reuse some of it for multiple similar position queries. There's a ton of code and complexity to make this efficient.

After replacing the drive with an SSD, I was able to drop all the clustering complexity and just use really dumb randomized hashes. I also got an increase in performance, since I only fetch the data I need from the disk, not big 10MB chunks. The latency is indeed larger, but the net speedup is significant, and the super-clean code (20 lines, not 800+) is perhaps even nicer.
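
Roughly, the simplified probe looks like this (an illustrative sketch rather than the actual code; the record size, file name, and hash function are placeholders):

```python
import hashlib

RECORD_SIZE = 32                        # assumed bytes per stored position entry
NUM_SLOTS = 50 * 10**9 // RECORD_SIZE   # roughly a 50 GB table
DB_PATH = "positions.bin"               # hypothetical on-disk hash table

def probe(position_key: bytes) -> bytes:
    """One small random read per query; no clustering or 10MB paging needed."""
    digest = hashlib.blake2b(position_key, digest_size=8).digest()
    slot = int.from_bytes(digest, "little") % NUM_SLOTS
    with open(DB_PATH, "rb") as f:
        f.seek(slot * RECORD_SIZE)
        return f.read(RECORD_SIZE)      # fetch just the one record, not a 10MB chunk
```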

SPWorley
Excellent example and good point; I hadn't thought about chess positions, but it's a very interesting case.
McWafflestix
+2  A: 

SSDs are only significantly faster for random access; for sequential access they are only about twice as fast as mainstream rotational drives. Many SSDs also have poor performance in a number of scenarios, which can leave them performing worse than rotational drives, as described here.

While SSDs do move the needle considerably, they are still much slower than CPU operations and physical memory. For your 4GB hash table example, you may be able to sustain 250+ MB/s off of an SSD when accessing random hash table buckets. For a rotational drive, you'd be lucky to break single-digit MB/s. If you can keep the 4GB hash table in memory, you could access it on the order of gigabytes per second - much faster than even a very swift SSD.
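
As a rough back-of-the-envelope illustration of those ratios (the 4 KB bucket size, lookup count, and exact bandwidth figures are assumptions for the sake of the arithmetic):

```python
# Approximate time for one million random bucket reads at the ballpark
# bandwidths discussed above (all figures assumed, not measured).
BUCKET_BYTES = 4 * 2**10        # assume ~4 KB read per random bucket access
LOOKUPS = 1_000_000

for name, mb_per_s in [("rotational drive", 5), ("SSD", 250), ("RAM", 5000)]:
    seconds = LOOKUPS * BUCKET_BYTES / (mb_per_s * 2**20)
    print(f"{name:>16}: ~{seconds:8.1f} s for {LOOKUPS:,} random 4 KB reads")
```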

The referenced article lists several changes MS made for Windows 7 when running on SSDs, which can give you an idea of the sort of changes you could consider making. First, SuperFetch, which prefetches data off of disk, is disabled - it's designed to work around slow random access times for disk, which are alleviated by SSDs. Defrag is disabled, because having files scattered across the disk isn't a performance hit for SSDs.

Michael
You're talking more about optimizations for SSDs; I'm considering the types of algorithms that are made possible (or more viable) by SSD performance. I'm less interested in the optimizations that are possible (or necessary) than in the different types of algorithms or applications which were simply not possible with slower persistent storage.
McWafflestix
+1  A: 

Ipso facto: any algorithm you can think of that requires lots of random disk I/O (random being the key word, since it throws the principle of locality to the birds and eliminates the usefulness of a lot of the caching that goes on).

I could see certain database systems gaining from this, though - MySQL, for instance, with the MyISAM storage engine (where data records are basically glorified CSVs). However, I think very large hashtables are going to be your best bet for good examples.

Chris
Actually, part of the point was that algorithms themselves don't use disks; the question was, what standard algorithms can be enabled by the performance increases of SSDs? Much like how managed code was enabled by computers of a certain speed and size...
McWafflestix
Algorithms themselves **don't** use disks - implementations of algorithms do - upon that we can agree. Yes, managed code was made possible by hardware improvements - but it took many orders of magnitude of "better" computer hardware to do so. The jump between HDDs and SSDs isn't (pardon the expression) oodles of magnitudes. The only reliable benefit is random access. Going back to my initial response: "...which requires lots of random disk I/O..."
Chris
A: 

SSDs are a lot faster for random reads, a bit faster for sequential reads, and probably slower for writes (random or not).

So a disk-based hashtable is probably not useful with an SSD, since it now takes significant time to update it, but searching the disk becomes (compared to a normal HDD) very cheap.

tomjen
Note that in the original question, I mentioned that the hashtable is more viable for lookup than population for that precise reason; consider the concept of a "pre-populated" hashtable that ships with software to allow for predefined hash lookups; 4GB of installation space is quite reasonable for modern apps.
McWafflestix