views: 2664

answers: 13

An answer (see below) to one of the questions right here on SO gave me an idea for a great little piece of software that could be invaluable to coders everywhere.

I'm imagining RAMDrive software, but with one crucial difference - it would mirror a real folder on my hard drive. More specifically - the folder which contains the project I'm currently working on. This way any builds would be nearly instantaneous (or at least a couple of orders of magnitude faster). The RAMDrive would synchronize its contents with the HDD in the background using only idle resources.

A quick Google search revealed nothing, but perhaps I just don't know how to google. Does anyone know of such software? Preferably free, but reasonable fees might be OK too.

Added: Some solutions have been suggested which I discarded from the very beginning. They are (in no particular order):

  • Buy a faster HDD (SSD maybe, or 10K RPM). I don't want a hardware solution. Not only does software have the potential to be cheaper (freeware, anyone?), but it can also be used in environments where hardware modifications would be unwelcome, if not impossible - say, at the office.
  • Let the OS/HDD do the caching - it knows better how to use your free RAM. The OS/HDD have generic cache algorithms that cache everything and try to predict which data will be most needed in the future. They have no idea that for me the priority is my project folder. And as we all know quite well - they don't really cache it much anyway. ;)
  • There are plenty of RAMDrives around, use one of those. Sorry, that would be reckless. I need my data to be synchronized back to the HDD whenever there is a bit of free time. In the case of a power failure I could bear losing the last 5 minutes of work, but not everything since my last checkin.

Added 2: An idea that came up - use normal RAMDrive plus a background folder synchronizer (but I do mean background). Is there any such thing?

Added 3: Interesting. I just tried out a simple RAMDrive at work. The rebuild time drops from ~14s to ~7s (not bad), but the incremental build is still at ~5s - just like on the HDD. Any ideas why? It uses aspnet_compiler and aspnet_merge. Perhaps they do something with other temp files elsewhere?

Added 4: Oh, nice new set of answers! :) OK, I've got a bit more info for all you naysayers. :)

One of the main reasons for this idea is not the above-mentioned software (14s build time) but another one that I didn't have access to at the time. This other application has a 100MB code base, and its full build takes about 5 minutes. Ah yes, it's in Delphi 5, so the compiler isn't too advanced. :) Putting the source on a RAMDrive resulted in a BIG difference. Got it below a minute, I think. Haven't measured. So for all those who say that the OS can cache stuff better - I beg to differ.

Related Question:

RAM disk for speed up IDE

Note on first link: The question to which it links has been deleted because it was a duplicate. It asked:

What do you do while your code’s compiling?

And the answer by Dmitri Nesteruk to which I linked was:

I compile almost instantly. Partly due to my projects being small, partly due to the use of ramdisks.

A: 

This sounds like disk caching, which your operating system and/or your hard drive will handle for you automatically (to varying degrees of performance, admittedly).

My advice is: if you don't like the speed of your drive, buy a high-speed drive purely for compiling purposes. Less labor on your part, and you might have the solution to your compiling woes.

Bob Cross
Unfortunately the OS/HDD don't let me tell them "cache this folder at all costs". :) And I was looking for a software solution instead of hardware because my office PC has enough RAM, but I'm not sure my employer would be happy if I started messing with the hardware. Plus, it could be free. ;)
Vilx-
I do agree. A RAM disk is an order of magnitude faster than a hard disk for certain purposes, and I'm tired of getting the same "no need for a RAM disk" answer each time I ask for one :-)
Stephan Leclercq
Er... I meant a CACHED hard disk, of course :-)
Stephan Leclercq
Okay, let's put it another way: this sounds like an optimization problem. Exactly what speed problem are you trying to solve? Compile time to actual disk or compile time to runnable binary?
Bob Cross
Even a 10k rpm raptor/velociraptor drive didn't bring *that* great a compile-time reduction over a decent 7200rpm drive for me.
snemarch
+2  A: 

We used to do this years ago for a 4GL macro-compiler; if you put the macro library, support libraries, and your code on a RAMdisk, compiling an application (on an 80286) would go from 20 minutes to 30 seconds.

Steven A. Lowe
That's all nice and fine - there are plenty of classical RAMDrives around. But I don't want my code to live SOLELY on the RAMDrive. That's a bit dangerous, you know. ;) I would like it to be synchronized to the HDD whenever there is a bit of free time to do so.
Vilx-
@Steven A. Lowe - set it in Scheduled Tasks to run every minute? Now THAT would kill my PC's performance. :P Although I suppose I could make it copy only newer files. Still, it's pretty much work. However, you did give me an idea - maybe some background folder synchronizer?
Vilx-
@Vilx: in our scenario, it took about 2 seconds to copy the files to the RAM disk before compiling, and 2 seconds to copy them out after compiling. I think some RAM disk products offer a shadow/sync feature.
Steven A. Lowe
+6  A: 

Your OS will cache things in memory as it works. A RAM disk might seem faster, but that's because you aren't factoring in the "copy to RAMDisk" and "copy from RAMDisk" times. Dedicating RAM to a fixed-size ramdisk just reduces the memory available for caching. The OS knows better what needs to be in RAM.

James Curran
This is true if you're copying the stuff to and from RAM every compile, but the idea is that you copy to RAM once and compile several times after making minor changes. I've done this before for related processes (not compiling), and for huge data sets it makes a really big difference.
Adam Davis
Don't trust the caching strategy of the OS to give you the same performance as your own caching strategy.
Adam Davis
Shouldn't the contents of recently accessed or modified files, i.e., those touched by the compiler, be in the file system cache?
Jay Conrod
Read-caching is only part of the problem; compiling produces output files - possibly with suboptimal write patterns - and often a bunch of temporary files.
snemarch
+3  A: 

I don't have exactly what you're looking for, but I'm now using a combination of two things: a software RAM disk and a DRAM-based hardware ramdisk. Since this is Windows I have a hard 3GB limit for core memory, meaning I cannot use too much memory for a ramdisk. The 4GB extra on the 9010 really rocks it. I let my IDE store all its temporary stuff on the solid-state ramdisk, and also the Maven repo. The DRAM ramdisk has battery backup to a flash card. This sounds like an advertisement, but it really is an excellent setup.

The DRAM disk has double SATA-300 ports and comes out with 0.0ms average seek on most tests ;) Something for the Christmas stocking?

krosenvold
Sweet, but I don't have that much spare money. Plus, as I said - I want it to be software.
Vilx-
This sounds really cool (though not meeting the poster's criteria). Does the backup to flash happen automatically whenever the drive is powered down, or what?
erickson
Yep. The battery is enough to persist the DRAM to the flash card. And it happens when mains power is lost.
krosenvold
The sad thing about these solutions is that the several GB/s of DRAM is limited to SATA-300 interface speeds :(
snemarch
At least it's dual SATA-300 interface ;) Strangely enough, I don't feel sad about it....
krosenvold
+9  A: 

In Linux (you never mentioned which OS you're on, so this could be relevant) you can create block devices from RAM and mount them like any other block device (i.e., an HDD).

You can then create scripts that copy to and from that drive on start-up / shutdown, as well as periodically.

For example, you could set it up so you had ~/code and ~/code-real. Your RAM block device gets mounted at ~/code on startup, and then everything from ~/code-real (which is on your standard hard drive) gets copied over. On shutdown everything would be copied (rsynced would be faster) back from ~/code to ~/code-real. You would also probably want that script to run periodically so you didn't lose much work in the event of a power failure, etc.
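
Roughly, the two halves of that could look like this (a minimal sketch - tmpfs is used here instead of a raw RAM block device, and the paths are just examples):

# start-up: mount a RAM-backed filesystem and populate it from the on-disk copy
mount -t tmpfs -o size=512M tmpfs ~/code
rsync -a ~/code-real/ ~/code/

# shutdown (and periodically, e.g. from cron): flush changes back to the real disk
rsync -a --delete ~/code/ ~/code-real/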

I don't do this anymore (used it for Opera when the 9.5 beta was slow, no need anymore) but I may still have the scripts. Will look when I get home.

Here is how to create a RAM disk in Linux.

SCdF
+1 - Put the script in cron to rsync every five minutes or so, and nice it so it doesn't kill performance. I don't know how this would be accomplished on Windows, but there is a task scheduler, and there are rsync tools, so those plus a RAM disk app should work for you if that's your platform.
Adam Jaskiewicz
Yep, I use Windows. Vista Business 32-bit, to be precise. :) But other OSes are relevant to the topic too. And, yes - the idea of using a simple ramdisk + folder sync utility (xcopy in the simplest case) has already crossed my mind. See above.
Vilx-
A: 
  1. Profile. Make sure you take good measurements of each option. You can even buy things you've already rejected, measure them, and return them, so you know you're working from good data.

  2. Get a lot of RAM. 2GB DIMMs are very cheap; 4GB DIMMs are a little over $100/ea, but that's still not a lot of money compared to what computer parts cost just a few years ago. Whether you end up with a RAM disk or just letting the OS do its thing, this will help. If you're running 32-bit Windows, you'll need to switch to 64-bit to make use of anything over 3GB or so.

  3. Live Mesh can synchronize from your local RAM drive to the cloud or to another computer, giving you an up-to-date backup.

  4. Move just compiler outputs. Keep your source code on the real physical disk, but direct .obj, .dll, and .exe files to be created on the RAM drive.

  5. Consider a DVCS. Clone from the real drive to a new repository on the RAM drive. "Push" your changes back to the parent often, say every time all your tests pass (a rough sketch follows this list).
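
For item 5, a rough git-flavoured sketch (git is just one possible DVCS; the paths and branch name are illustrative):

# one-time setup: working clone lives on the RAM drive, parent repo stays on disk
git clone ~/projects/myapp /mnt/ramdisk/myapp
cd /mnt/ramdisk/myapp

# edit / build / test on the RAM drive, then push back to the on-disk parent often
git commit -am "work in progress"
git push origin HEAD:wip-autosave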

Jay Bazuzi
2. Not true. If your hardware supports PAE, the OS can address more than 4GB; it's just that each application's address space is limited to 4GB.
Adam Hawes
@Adam> only for server editions of Windows - client editions limit you to 4GB of address space (yes, AS, not physical memory! mix of market segmentation and "3rd parties wrote buggy drivers" excuse).
snemarch
+1  A: 

I wonder if you could build something like a software RAID 1 where you have a physical disk/partition as a member, and a chunk of RAM as a member.

I bet with a bit of tweaking and some really weird configuration one could get Linux to do this. I am not convinced that it would be worth the effort though.
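
If you did want to experiment, a very rough Linux sketch (mdadm and the brd RAM-block-device module assumed; the device names are examples, and the RAM member vanishes on every reboot, so treat this strictly as a toy):

modprobe brd rd_size=1048576                 # creates /dev/ram0, ~1 GiB
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/ram0 --write-mostly /dev/sdb5     # reads prefer the RAM member
mkfs.ext4 /dev/md0                           # new filesystem - this wipes /dev/sdb5
mount /dev/md0 /mnt/fastcode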

Zoredache
+2  A: 

http://en.gentoo-wiki.com/wiki/Speeding_up_emerge_with_tmpfs

Speeding up compiles using RAM drives under Gentoo was the subject of a howto written many eons ago. It provides a concrete example of what has been done. The gist is that all source and intermediate build files are redirected to a RAM disk for compiling, while final binaries are directed to the hard drive for install.

Also, I recommend keeping your source on the hard drive, but git push your latest source changes to a clone repository that resides on the RAM disk. Compile the clone. Use your favorite script to copy the binaries created back.
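
Concretely, something along these lines (a sketch only - the tmpfs sizes, paths and repo layout are examples; /var/tmp/portage is where Gentoo's portage builds by default):

# Gentoo-style: build intermediates go to RAM
mount -t tmpfs -o size=2G tmpfs /var/tmp/portage

# or, for an ordinary project: source stays on disk, a clone is built in RAM
mount -t tmpfs -o size=1G tmpfs /mnt/rambuild
git clone ~/src/myproject /mnt/rambuild/myproject
cd /mnt/rambuild/myproject && make
cp -v build/myapp ~/src/myproject/build/     # copy the finished binaries back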

Hope that helps.

composer
A: 

Just as James Curran says, because most programs follow the law of locality of reference, the set of frequently used code and data pages will be narrowed over time to a manageable size by the OS disk cache. RAM disks were useful when operating systems were built with limitations such as stupid caches (Win 3.x, Win 95, DOS). The RAM disk advantage is near zero, and if you assign a lot of RAM to one, you take memory away from the system cache manager, hurting overall system performance. The rule of thumb is: let your kernel do that. This is the same as the "memory defragmenter" or "optimizer" programs: they actually force pages out of the cache (so you appear to get more free RAM), but cause the system to do a lot of page-faulting over time when your loaded programs begin to ask for code/data that was paged out.

So for more performance, get a fast disk I/O hardware subsystem, maybe RAID, a faster CPU, a better chipset (no VIA!), more physical RAM, etc.

Hernán
RAM disks can still offer substantial advantages. 1) you're 100% guaranteed to be in RAM, which the OS cache doesn't guarantee. 2) FS metadata journalling... 3) overzealous file syncing (a Firefox profile on a ramdrive can go a *lot* faster than on a regular disk - remember backups though :) ).
snemarch
1) For most applications, data not being 100% in RAM is a good thing, I think, due to (again) reference locality. 2) This could be true, but for good FS journalling algorithms? [examples?] 3) I don't know, I'm sure you are using it to assert that :)
Hernán
+8  A: 

I'm surprised at how many people suggest that the OS can do a better job at figuring out your caching needs than you can in this specialized case. While I didn't do this for compiling, I did do it for similar processes and I ended up using a ramdisk with scripts that automated the synchronization.

In this case, I think I'd go with a modern source control system. At every compile it would check in the source code (along an experimental branch if needed) automatically so that every compile would result in the data being saved off.

To start development, start the ramdisk and pull the current baseline. Do the editing, compile, edit, compile, etc. - all the while the edits are being saved for you.

Do the final check in when happy, and you don't even have to involve your regular HD.

But there are background synchronizers that will automate things - the issue is that they won't be optimized for programming either, and may need to do full directory and file scans occasionally to catch changes. A source code control system is designed for exactly this purpose, so it would likely be lower overhead even though it exists in your build setup.

Keep in mind that the state of a background sync task, in the case of a power outage, is undefined. You would end up having to figure out what was saved and what wasn't if things went wrong. With a defined save point (at each compile, or forced by hand) you'd have a pretty good idea that it was at least in a state where you thought you could compile it. Use a VCS and you can easily compare it to the previous code and see what changes you've applied already.
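
For example, a tiny wrapper around the build could take that snapshot for you (a hypothetical sketch - git, make and the ramdisk path are assumptions, not part of any particular product):

#!/bin/sh
# build from the ramdisk working copy, committing a snapshot first
cd /mnt/ramcode/myproject || exit 1
git add -A
git commit -qm "autosave before build $(date +%Y-%m-%d_%H:%M:%S)" || true   # no-op if nothing changed
make "$@"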

Adam Davis
Keeping the source files on a ramdisk is dangerous - better to keep them on disk and have automated syncing to the ramdisk before builds. +1 for the OS not always handling specialized needs perfectly, though!
snemarch
The source files are used and edited on ramdisk, but I'm suggesting that at every compile they are saved off to regular disk or repository. I'm not suggesting using a ramdisk alone with no form of non-volatile saving.
Adam Davis
+1  A: 

What can be super beneficial even on a single-core machine is parallel make. Disk I/O is a pretty large factor in the build process. Spawning two compiler instances per CPU core can actually increase performance: as one compiler instance blocks on I/O, the other one can usually jump into the CPU-intensive part of compiling.

You need to make sure you've got the RAM to support this (it shouldn't be a problem on a modern workstation), otherwise you'll end up swapping and that defeats the purpose.

With GNU make you can just use -j[n], where [n] is the number of simultaneous processes to spawn. Make sure you have your dependency tree right before trying it, though, or the results can be unpredictable.

Another tool that's really useful (in the parallel-make fashion) is distcc. It works a treat with GCC (if you can use GCC or something with a similar command-line interface). distcc actually breaks up the compile task by pretending to be the compiler and spawning tasks on remote servers. You call it in the same way as you'd call GCC, and you take advantage of make's -j[n] option to call many distcc processes.
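
For illustration (GNU make and distcc assumed; the host names are made up):

make -j4                                        # four parallel jobs, e.g. two per core on a dual-core box
export DISTCC_HOSTS="localhost buildbox1 buildbox2"
make -j12 CC="distcc gcc"                       # compile jobs are farmed out to the listed hosts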

At one of my previous jobs we had a fairly intensive Linux operating system build that was performed almost daily for a while. Adding a couple of dedicated build machines and putting distcc on a few workstations to accept compile jobs allowed us to bring build times down from half a day to under 60 minutes for a complete OS + userspace build.

There are a lot of other tools for speeding up compiles out there. You might want to investigate those rather than creating RAM disks, which look like they will yield very little gain since the OS is already doing disk caching with RAM. OS designers spend a lot of time getting caching right for most workloads; they are (collectively) smarter than you, so I wouldn't like to try to do better than them.

If you chew up RAM for a RAM disk, the OS has less working RAM to cache data and to run your code -> you'll end up with more swapping and worse disk performance than otherwise (note: you should profile this option before completely discarding it).

Adam Hawes
A: 

Some ideas off the top of my head:

Use Sysinternals' Process Monitor (not Process Explorer) to check what goes on during a build - this will let you see if %TEMP% is used, for instance (keep in mind that response files are probably created with FILE_ATTRIBUTE_TEMPORARY, which should prevent disk writes where possible, though). I've moved my %TEMP% to a ramdisk, and that gives me minor speedups in general.

Get a ramdisk that supports automatically loading/saving disk images, so you don't have to use boot scripts to do this. Sequential read/write of a single disk image is faster than syncing a lot of small files.

Place your often-used/large header files on the ramdisk, and override your compiler's standard paths to use the ramdrive copies. It will likely not give that much of an improvement after first-time builds, though, as the OS caches the standard headers.

Keep your source files on your hard drive, and sync them to the ramdisk - not the other way around. Check out MirrorFolder for doing real-time synchronization between folders - it achieves this via a filter driver, so it only synchronizes what is necessary (and only propagates changes - a 4KB write to a 2GB file will only cause a 4KB write to the target folder). Figure out how to make your IDE build from the ramdrive even though the source files are on your hard disk... and keep in mind that you'll need a large ramdrive for large projects.
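
A hedged illustration of the header-path and temp-file points (gcc-style flags are used purely as an example; on Windows the equivalents would be the compiler's /I option and the %TEMP% variable, and all paths here are made up):

export TMPDIR=/mnt/ramdisk/tmp                      # compiler temp files land on the ramdrive
cp -r ~/project/include /mnt/ramdisk/include        # ramdrive copies of the heavy headers
g++ -I/mnt/ramdisk/include -c src/main.cpp -o /mnt/ramdisk/obj/main.o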

snemarch
+1  A: 

There are plenty of RAMDrives around, use one of those. Sorry, that would be reckless.

Only if you work entirely in the RAM disc, which is silly..

Pseudo-ish shell script, ramMake:

#!/bin/sh
# set up locations
ramdrive=/Volumes/ramspace
project=$HOME/code/someproject

# ..create ram drive..

# sync project directory to the RAM drive
rsync -av "$project/" "$ramdrive/"

# build
cd "$ramdrive" || exit 1
make

# optional: copy the built data back to the project directory
rsync -av "$ramdrive/build/" "$project/build/"

That said, your compiler can possibly do this with no additional scripts.. Just change your build output location to a RAM disc; for example, in Xcode it's under Preferences, Building, "Place Build Products in:" and "Place Intermediate Build Files in:".

dbr