I'm using a third-party, cross-platform project builder that drives various compilers. This project builder always rebuilds the project from scratch, and I'm trying to implement a "smart rebuild" mechanism.

My idea is to run the preprocessor on each .cpp file, compute a CRC of the output, and compare it against the CRC from the previous build. If they differ, I'll mark the .cpp for compilation. If not, I'll reuse the previous object file.
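To make the idea concrete, here is a minimal sketch of that check in Python. The function names `preprocess` and `needs_rebuild` are made up for illustration, and the `-E` flag is an assumption that holds for GCC and Clang; other toolchains spell it differently (MSVC uses `/E`).

```python
import subprocess
import zlib


def preprocess(source_path, compiler="g++"):
    """Return the preprocessed translation unit as bytes.

    Assumption: the compiler accepts -E and writes the result to stdout,
    as GCC and Clang do.
    """
    result = subprocess.run(
        [compiler, "-E", source_path],
        check=True,
        capture_output=True,
    )
    return result.stdout


def needs_rebuild(preprocessed, previous_crc):
    """Compare the CRC-32 of the preprocessed text with the stored value.

    Returns (rebuild, current_crc). previous_crc is None on a first build,
    which always counts as "rebuild".
    """
    current_crc = zlib.crc32(preprocessed)
    rebuild = previous_crc is None or current_crc != previous_crc
    return rebuild, current_crc
```

The caller would store `current_crc` next to the object file after a successful compile. Note that this fingerprint covers only the preprocessed text, so a changed compiler option that leaves the preprocessor output untouched (an optimization level, say) would go unnoticed.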

Is this method secure enough? Is it probable that I'll get the same CRC when the code/headers are modified? Is there a certain CRC algorithm that can make it safer?

A: 

CRC-32 is probably enough, but to be extra safe you could use SHA-1 or a longer digest from the same family, such as SHA-256.

Edit after rereading the question: it is at least theoretically possible for modified source code to produce the same CRC. With SHA that possibility is much smaller still (and, e.g., once the file length changes, the SHA in practice changes anyway). There are articles that cover the specifics in painstaking depth if you need that background.
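As a rough illustration of the difference, the sketch below computes both fingerprints with Python's standard library and estimates the chance of an accidental match. The helper names are hypothetical, and the probability is an idealization that assumes fingerprints are spread uniformly over all possible values.

```python
import hashlib
import zlib


def crc32_fingerprint(data):
    """32-bit checksum, so there are only 2**32 possible values."""
    return zlib.crc32(data)


def sha1_fingerprint(data):
    """160-bit digest, returned as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


def accidental_match_probability(bits):
    """Chance that one unrelated change yields the same fingerprint,
    assuming uniformly distributed values (an idealization)."""
    return 2.0 ** -bits


if __name__ == "__main__":
    print(accidental_match_probability(32))   # CRC-32
    print(accidental_match_probability(160))  # SHA-1
```

Under that assumption, a changed file slips past CRC-32 about once in four billion changes, which is why it is probably enough for a build tool, while a 160-bit digest makes an accidental match negligible.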

fvu
+1  A: 

The obvious checks to do first (because they only need file metadata, not the file contents) are whether the .cpp still has the same size and the same file date. If the size differs, there's no point in wasting time calculating a CRC. The date check is not perfect, but again a difference is probably enough to warrant a rebuild.

CRC is basically the correct algorithm. It catches reorderings, too, but isn't cryptographically strong.
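A minimal sketch of that ordering in Python, assuming the size and CRC from the previous build were recorded somewhere. The names `file_crc32` and `source_changed` are invented for this example.

```python
import os
import zlib


def file_crc32(path, chunk_size=65536):
    """CRC-32 of a file, read in chunks so large files stay cheap on memory."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def source_changed(path, recorded_size, recorded_crc):
    """Return True if the file differs from the recorded size or CRC."""
    # The size comes from file metadata, so no contents are read here.
    if os.stat(path).st_size != recorded_size:
        return True
    # Same size: only now pay for reading the whole file.
    return file_crc32(path) != recorded_crc
```

A size mismatch answers the question immediately. The CRC is computed only when the sizes agree, which is the case where the size alone says nothing.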

MSalters
I wouldn't rely on the size, because a+b and a-b have the same size, but different CRCs...
AshleysBrain
Doesn't matter: the essence is that you should rebuild if the size changes, and you can then skip the time spent calculating the CRC. In pseudo-code: `if (file1.size != file2.size || crc(file1) != crc(file2)) { build(); }`, with the usual short-circuit logic.
MSalters
A: 

The GNU "cons" build system generates an MD5 signature of component files and build commands for each target and can then decide whether or not to rebuild the target. We used it at a former employer, and coupled with a build cache it made building large source trees much faster, especially when each developer was working on his or her own small part -- once all the code had been built once by someone, no one else had to rebuild it.
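The general idea of such a signature can be sketched as follows. This is not Cons's actual algorithm, only an illustration of hashing the inputs together with the build command; `target_signature` and its parameters are hypothetical names.

```python
import hashlib


def target_signature(dependency_contents, build_command):
    """MD5 signature over every input file's bytes plus the command line.

    dependency_contents maps a file name to that file's bytes.
    """
    md5 = hashlib.md5()
    # Sort the names so the signature does not depend on dict ordering.
    for name in sorted(dependency_contents):
        md5.update(name.encode("utf-8"))
        md5.update(b"\0")
        # Hash each file separately so every entry has a fixed length.
        md5.update(hashlib.md5(dependency_contents[name]).digest())
    md5.update(build_command.encode("utf-8"))
    return md5.hexdigest()
```

Because the command line is part of the signature, changing a compiler flag triggers a rebuild just as editing a header does. The same signature could also serve as the lookup key for a shared build cache.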

On the other hand, I found the Perl syntax of the build control files overly complex and confusing.

Berry