A Python program I created is I/O bound. The majority of the time (over 90%) is spent in a single loop which repeats ~10,000 times. In this loop, ~100 KB of data is generated and written to a temporary file; it is then read back by another program, which collects statistics about that data. This is the only way to pass data into the second program.

Because this is the main bottleneck, I thought that moving the temporary file from my main HDD to a (~40 MB) RAMdisk (carved out of over 2 GB of free RAM) would greatly increase the I/O speed for this file and so reduce the run time. However, I obtained the following results (each averaged over 20 runs):

  • Test data 1: Without RAMdisk - 72.7s, With RAMdisk - 78.6s
  • Test data 2: Without RAMdisk - 223.0s, With RAMdisk - 235.1s

It would appear that the RAMdisk is slower than my HDD.

What could be causing this?

Are there any alternatives to a RAMdisk for getting faster file I/O?

+1  A: 

Can you write the data out in batches rather than one item at a time? Are you caching resources such as open file handles, or cleaning them up each time? Are your disk writes blocking? If so, can you use background threads to saturate I/O without affecting compute performance?

I would look at optimising the disk writes first, and then look at faster disks when that is complete.
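To make the background-thread idea concrete, here is a minimal sketch. It assumes the next item can be generated without waiting for the statistics of the previous one, which may not hold for the program in the question. The names `writer_worker`, `run`, and the `item_N.tmp` file names are made up for illustration, and the generated bytes are only a stand-in for the real data.

```python
import os
import queue
import tempfile
import threading


def writer_worker(jobs):
    """Write (path, data) jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        path, data = job
        with open(path, "wb") as f:
            f.write(data)


def run(n_items, out_dir):
    """Generate items on the main thread while a worker thread writes them."""
    jobs = queue.Queue(maxsize=4)  # bounded so the producer cannot run far ahead
    worker = threading.Thread(target=writer_worker, args=(jobs,))
    worker.start()
    paths = []
    for i in range(n_items):
        data = bytes([i % 256]) * 100_000  # stand-in for the generated ~100 KB
        path = os.path.join(out_dir, f"item_{i}.tmp")
        jobs.put((path, data))  # blocks only when the queue is full
        paths.append(path)
    jobs.put(None)  # tell the worker to stop
    worker.join()
    return paths
```

Calling `run(10, tempfile.mkdtemp())` writes ten files while the main thread keeps generating. Whether this helps at all depends on how much of the loop time is really spent blocked in the write.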

Preet Sangha
The point is that he's now using pure RAM - not even a disk at all. By all expectations, he should be getting better performance than even the fastest SSDs on the market would give - yet it's slower than a mechanical drive. He's asking why that might be the case.
Arafangion
Yes, I appreciate that. I was suggesting that maybe the problem is not the disk itself but the way the disk is being used.
Preet Sangha
+1  A: 

Your operating system is almost certainly buffering/caching disk writes already. It's not surprising the RAM disk is so close in performance.
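One rough way to see the cache at work is to time an ordinary write against one that is forced out to the device. This is only a sketch: `timed_write`, `compare`, and the `probe.tmp` file name are hypothetical, and the numbers will vary a lot by OS, filesystem, and drive.

```python
import os
import tempfile
import time


def timed_write(path, data, sync):
    """Return seconds taken to write data, optionally forcing it to the device."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        if sync:
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push its cached data out
    return time.perf_counter() - start


def compare(directory, size=100_000, repeats=20):
    """Return (mean cached write time, mean synced write time) in seconds."""
    data = os.urandom(size)
    path = os.path.join(directory, "probe.tmp")
    cached = sum(timed_write(path, data, sync=False) for _ in range(repeats))
    synced = sum(timed_write(path, data, sync=True) for _ in range(repeats))
    os.remove(path)
    return cached / repeats, synced / repeats
```

Running `compare()` once on the HDD directory and once on the RAM disk directory should show whether the unsynced writes are already returning at memory speed on the HDD. If they are, the RAM disk has little left to improve.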

Without knowing exactly what you're writing or how, we can only offer general suggestions. Some ideas:

  • If you have 2 GB RAM you probably have a decent processor, so you could write this data to a filesystem that has compression. That would trade I/O operations for CPU time, assuming your data is amenable to that.

  • If you're doing many small writes, combine them to write larger pieces at once. (Can we see the source code?)

  • Are you removing the 100 KB file after use? If you don't need it, then delete it. Otherwise the OS may be forced to flush it to disk.
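A minimal sketch of the last two suggestions, combining the small pieces into one write and deleting the file as soon as it has been consumed. Here `write_once` and `process` are made-up names, and `consume` is a placeholder for whatever launches the second program on the file; none of this comes from the original code.

```python
import os
import tempfile


def write_once(chunks, directory=None):
    """Join many small byte strings and write them with a single call."""
    fd, path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(b"".join(chunks))
    return path


def process(chunks, consume):
    """Write the data, hand the path to consume(), then always delete the file."""
    path = write_once(chunks)
    try:
        return consume(path)
    finally:
        os.remove(path)  # remove it promptly so the OS has less reason to flush it
```

For example, `process(pieces, os.path.getsize)` writes the joined pieces, returns the file size, and leaves no temporary file behind.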

Ken
+1  A: 

I know that Windows is very aggressive about caching disk data in RAM, and 100K would fit easily. The writes are going directly to cache and then perhaps being written to disk via a non-blocking write, which allows the program to continue. The RAM disk probably wouldn't support non-blocking operations because it expects those operations to be quick and not worth the bother.

By reducing the amount of memory available to programs and caching, you're going to increase the amount of disk I/O for paging even if only slightly.

This is all speculation on my part, since I'm not familiar with the kernel or drivers. I also speculate that Linux would operate similarly.

Mark Ransom