We have the following problem when running the git fsck --full --strict command:
error: sha1 mismatch ced885d12a0677f2db9025e1e684c72e67283fcd
error: ced885d12a0677f2db9025e1e684c72e67283fcd: object corrupt or missing
error: sha1 mismatch cf5a1546bd2de5611eaf6136fb5ca02b4e358bec
error: cf5a1546bd2de5611eaf6136fb5ca02b4e358bec: object corrupt or missing
error: sha1 mismatch cf5d9d5723014921370de479c54a73230c86a981
error: cf5d9d5723014921370de479c54a73230c86a981: object corrupt or missing
error: sha1 mismatch cf675ce5bc5eeb5937441c6a02976cf2fa40076b
error: cf675ce5bc5eeb5937441c6a02976cf2fa40076b: object corrupt or missing
error: sha1 mismatch cf7c5156cf127eb7141505946df51b2b57925a50
error: cf7c5156cf127eb7141505946df51b2b57925a50: object corrupt or missing
dangling commit 3468455f0d9d055bbe957744aa10e670469d3912
dangling commit daeec54632203157a70bae93b9d7c3290820c2f9
(more dangling commit messages)
(Note: I don't really care about the dangling commit messages; I'm focusing on the sha1 mismatch problem.)
My interpretation of this message is that git-fsck recomputes the sha1 from the payload and finds a value different from the one used to name the object. The objects are not missing from the repository (I've checked with git cat-file).
The weird thing is that if I run the command again, I still get the sha1 mismatch messages, but for different objects:
error: sha1 mismatch 1452752024456a509540591c4879b3e3534f457e
error: 1452752024456a509540591c4879b3e3534f457e: object corrupt or missing
error: sha1 mismatch 16e08310d7182e97092d2783c911dbcf66538238
error: 16e08310d7182e97092d2783c911dbcf66538238: object corrupt or missing
dangling commit 3468455f0d9d055bbe957744aa10e670469d3912
Note: the repository has not changed between the two runs.
We are running Linux and the current git version is:
$ git --version
git version 1.7.2.2.170.g5c7f2
The errors were already there with a previous version (1.6.5.rc2.18.g6d8b). Both versions were built from source using gcc 3.4.4.
HOWEVER, when I copy the repository to another host, git fsck reports no problem at all. The git version there is 1.7.2.1 (provided by Fedora).
I've made the following observations:
- The objects having invalid sha1 are often in the same range (in the first example, the sha1s begin with ce or cf) and the errors are triggered within a small period during the fsck run. I believe git-fsck does an ordered scan (or maybe objects are sorted within the pack).
- Those objects are relatively big blobs (>900k)
- We've run a complete 15-minute memtest pass to check for hardware memory failure. We haven't found any problem. No other strange behavior has been observed on this server, which also performs many other non-git tasks.
- git gc is not complaining.
Hypotheses so far:
- This problem is caused by an improper build of git (library version? compiler?)
- Our memtest failed to find a real memory problem.
- There is a subtle bug in git-fsck sha1 calculation that occurs randomly (or more precisely within certain short time windows) for large blobs.
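One way to separate the first hypothesis from the second might be to check the pack files without using our git build at all. In a SHA-1 repository, the last 20 bytes of a .pack file are the SHA-1 of everything before them, so the check can be repeated with nothing but a hash library. This is a rough sketch under that assumption; pack_trailer_matches is a hypothetical name, and the .git/objects/pack path assumes a non-bare repository in the current directory.

```python
import glob
import hashlib


def pack_trailer_matches(pack_path, chunk_size=1 << 20):
    """Return True if the pack's trailing 20-byte SHA-1 matches its contents."""
    sha = hashlib.sha1()
    with open(pack_path, "rb") as f:
        f.seek(0, 2)                      # jump to the end to learn the file size
        remaining = f.tell() - 20         # everything except the trailer
        if remaining < 0:
            raise ValueError("file too short to be a pack")
        f.seek(0)
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                raise IOError("unexpected end of file")
            sha.update(chunk)
            remaining -= len(chunk)
        trailer = f.read(20)
    return sha.digest() == trailer


if __name__ == "__main__":
    # Repeat the check several times: the failures we see are intermittent.
    for path in glob.glob(".git/objects/pack/*.pack"):
        results = [pack_trailer_matches(path) for _ in range(10)]
        print(path, results)
```

If this check also fails intermittently on the problem host, the git build is probably not to blame; if it always passes, the build (or a library it links against) looks more suspicious.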
How can we solve this?