views:

36

answers:

0

We have the following problem while running the git fsck --full --strict command:

error: sha1 mismatch ced885d12a0677f2db9025e1e684c72e67283fcd

error: ced885d12a0677f2db9025e1e684c72e67283fcd: object corrupt or missing
error: sha1 mismatch cf5a1546bd2de5611eaf6136fb5ca02b4e358bec

error: cf5a1546bd2de5611eaf6136fb5ca02b4e358bec: object corrupt or missing
error: sha1 mismatch cf5d9d5723014921370de479c54a73230c86a981

error: cf5d9d5723014921370de479c54a73230c86a981: object corrupt or missing
error: sha1 mismatch cf675ce5bc5eeb5937441c6a02976cf2fa40076b

error: cf675ce5bc5eeb5937441c6a02976cf2fa40076b: object corrupt or missing
error: sha1 mismatch cf7c5156cf127eb7141505946df51b2b57925a50

error: cf7c5156cf127eb7141505946df51b2b57925a50: object corrupt or missing
dangling commit 3468455f0d9d055bbe957744aa10e670469d3912
dangling commit daeec54632203157a70bae93b9d7c3290820c2f9
(more dangling commit messages)

(Note: I don't really care about the dangling commit messages. I focus on the sha1 mismatch problem.)

My interpretation of this message is that git-fsck recomputes the sha1 from the payload but found a sha1 different from the one used to designate the object. The objects are not missing from the repository (I've check w/ git cat-file).

The weird thing is that if I run the command again, I still have the sha1 messages but for different objects:

error: sha1 mismatch 1452752024456a509540591c4879b3e3534f457e

error: 1452752024456a509540591c4879b3e3534f457e: object corrupt or missing
error: sha1 mismatch 16e08310d7182e97092d2783c911dbcf66538238

error: 16e08310d7182e97092d2783c911dbcf66538238: object corrupt or missing
dangling commit 3468455f0d9d055bbe957744aa10e670469d3912

Note: the repository has not changed between the two runs.

We are running Linux and the current git version is:

$git --version
git version 1.7.2.2.170.g5c7f2

The errors were there in a previous version (1.6.5.rc2.18.g6d8b). Those git were built from the sources using gcc 3.4.4.

HOWEVER, when I copy the repository on another host, git fsck reports no problem at all. The git version there is 1.7.2.1 (provided by Fedora).

I've made the following observations:

  1. The objects having invalid sha1 are often in the same range (in the first example, the sha1s begin with ce or cf) and the errors are triggered within a small period during the fsck run. I believe git-fsck does an ordered scan (or maybe objects are sorted within the pack).
  2. Those objects are relatively big blobs (>900k)
  3. We've run a 15-minute complete memtest pass for possible hardware memory failure. We haven't found any problem. There is no other strange behavior observed on this server which also perform many other non-git tasks.
  4. git gc is not complaining

Hypotheses so far:

  1. This problem is caused by an improper build of git (library version? compiler?)
  2. Our memtest failed to find a real memory problem.
  3. There is a subtle bug in git-fsck sha1 calculation that occurs randomly (or more precisely within certain short time windows) for large blobs.

How can we solve this?