I've seen this asked a lot on the internet, with no satisfactory answer or root cause. Usually the advice is just 'well, replace your RAID controller' or 're-run grub-install'. I'm running into this issue on enough hosts to convince me there's some non-hardware reason it's happening, but it's also nondeterministic: not every host fails here, even though they're all built the same way.

I'd like more information about how GRUB loads stage2; more specifically, what it's looking for and where it gets the information about where stage2 is.

Thanks!

A: 

Maybe the GRUB manual's page on Stage2 errors will help you: http://www.gnu.org/software/grub/manual/html_node/Stage2-errors.html

rogeriopvl
A:

Thanks for your responses.

@rogeriopvl - Unfortunately, while the stage 2 error descriptions are detailed and useful, I'm not making it to stage 2.

@David - Aha! I had pored through the manual, but missed the "technical info" section. Here's the relevant quote:

Stage 1 "knows" where Stage 2 is by entries in a block-list loading table embedded in it. It loads the lists of blocks off of the booting drive, then jumps to a specified CS:IP in 16-bit real mode. These are described in the page on embedded data. It queries the BIOS for the disk geometry, and maps the linear block numbers there to C:H:S addresses used by the INT 13h BIOS interface.

So, what I can conclude from this is that it doesn't matter that my /boot partition is on software RAID (/dev/md0). In theory, the GRUB install worked out which physical disk blocks (across multiple disks) contain stage 2 and recorded them in the stage 1 area.
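The LBA-to-C:H:S mapping that the quoted passage describes can be sketched like this. It is a minimal illustration of the standard INT 13h arithmetic, not GRUB's actual code; the geometries in the usage lines (255 heads and 16 heads, both with 63 sectors per track) are assumed example values, not anything read from a real BIOS.

```python
from typing import NamedTuple


class CHS(NamedTuple):
    cylinder: int
    head: int
    sector: int  # 1-based, as INT 13h expects


def lba_to_chs(lba: int, heads: int, sectors_per_track: int) -> CHS:
    """Map a linear block number to C:H:S for a BIOS-reported geometry."""
    # Classic INT 13h limits: 1024 cylinders, 256 heads, 63 sectors per track.
    if not (1 <= heads <= 256 and 1 <= sectors_per_track <= 63):
        raise ValueError("geometry outside the classic INT 13h limits")
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = lba % sectors_per_track + 1
    if cylinder > 1023:
        # CHS calls only have 10 bits for the cylinder number.
        raise ValueError(f"LBA {lba} is beyond the reach of CHS addressing")
    return CHS(cylinder, head, sector)


if __name__ == "__main__":
    # Assumed example geometries: the same LBA under two reported geometries.
    print(lba_to_chs(20000, heads=255, sectors_per_track=63))  # CHS(cylinder=1, head=62, sector=30)
    print(lba_to_chs(20000, heads=16, sectors_per_track=63))   # CHS(cylinder=19, head=13, sector=30)
```

The same block number lands on a different cylinder and head depending on the geometry, which is why stage 1 has to ask the BIOS for the geometry at boot time instead of storing C:H:S values directly.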

That still doesn't explain the nondeterministic failures though.
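To compare what grub-install actually recorded on a working host versus a failing one (and on each member disk of the mirror), a sketch like the following reads the starting sector number out of a GRUB Legacy stage 1 in the MBR. The offset 0x44 (which I believe stage1.h calls STAGE1_STAGE2_SECTOR) is an assumption based on my reading of GRUB Legacy 0.97's stage1/stage1.h; check it against the source of the GRUB version you actually installed before trusting the output. The 0x55 0xAA boot signature at offset 510 is standard for any MBR.

```python
import struct

# ASSUMPTION: offset taken from GRUB Legacy 0.97's stage1/stage1.h as I read it;
# verify against your GRUB source. Holds a 32-bit little-endian sector number.
STAGE1_STAGE2_SECTOR = 0x44
BOOT_SIGNATURE_OFFSET = 510  # every MBR ends in 0x55 0xAA


def stage2_sector_from_mbr(mbr: bytes) -> int:
    """Return the starting sector number stored in a GRUB Legacy stage 1."""
    if len(mbr) != 512:
        raise ValueError("expected exactly one 512-byte sector")
    if mbr[BOOT_SIGNATURE_OFFSET:] != b"\x55\xaa":
        raise ValueError("no 0x55AA boot signature; not a boot sector")
    (sector,) = struct.unpack_from("<I", mbr, STAGE1_STAGE2_SECTOR)
    return sector


if __name__ == "__main__":
    import sys

    # Usage: python stage1_sector.py /dev/sda   (or a dd'd image of the MBR)
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as disk:
            print(stage2_sector_from_mbr(disk.read(512)))
```

Running it against each disk of /dev/md0 on a good host and a bad host would show whether the recorded sector differs between them.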

A:

To debug it, modify the source code for stage 1 (or stage 1.5, if that's where it's failing) to display the cylinder, head, and sector (C, H, S) where it's looking for stage 2. Stage 1 has to fit in a single 512-byte boot sector, so I don't think you can fit much more into it, but this much should fit.

Then boot Knoppix or another live system and look at the contents of those disk sectors to see whether they're what you expect. If some PCs have RAID driver data there and others have GRUB data there, that will give you a lead on where the problem is.

If different PCs say they're looking for stage 2 in different locations, that will give you a lead too.
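Once stage 1 prints a C:H:S triple, the sector still has to be located from the live system, which usually means converting back to a linear block number. A minimal sketch, assuming 512-byte sectors and that you pass in the same geometry the BIOS reported; the script name peek.py and the command-line values in the usage comment are hypothetical:

```python
import sys

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def chs_to_lba(cylinder: int, head: int, sector: int,
               heads: int, sectors_per_track: int) -> int:
    """Inverse of the INT 13h mapping; sector numbers start at 1."""
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range for this geometry")
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def read_sector(path: str, lba: int) -> bytes:
    """Read one raw sector from a block device or a disk image file."""
    with open(path, "rb") as disk:
        disk.seek(lba * SECTOR_SIZE)
        data = disk.read(SECTOR_SIZE)
    if len(data) != SECTOR_SIZE:
        raise OSError(f"short read at LBA {lba}")
    return data


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex, and printable-ASCII columns."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) == 7:
    # Hypothetical usage: python peek.py /dev/sda 0 1 1 255 63
    device = sys.argv[1]
    c, h, s, heads, spt = map(int, sys.argv[2:7])
    lba = chs_to_lba(c, h, s, heads, spt)
    print(f"C:H:S {c}:{h}:{s} -> LBA {lba}")
    print(hexdump(read_sector(device, lba)))
```

Dumping the same LBA on a working and a failing host makes it easy to see whether the sector holds GRUB code, RAID metadata, or something else.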

(I hope it's acceptable to suggest modifying source code in a question that's going to be closed as not programming related.)

Windows programmer
A:

I needed to modify GRUB slightly years ago, so I wrote some notes on how the boot process progresses: http://www.pixelbeat.org/docs/disk/

pixelbeat