As the original question alludes to, one of the biggest problems with iterative hardware design is going to be the cost of the spins.
Obviously the costs depend on the nature of the hardware (e.g. board size, board complexity, component cost, loading costs), but if you have a complex board, the cost of doing lots of iterations is going to be prohibitive.
And while it might be possible to leave the fine detail of particular portions of your hardware to later iterations, it's likely that you will have to accommodate those portions right from the start (e.g. allocation of I/O, allocation of PCB real estate), which again makes it difficult to do hardware iteratively.
To some extent it depends on what you mean by hardware. If your hardware team is also responsible for developing the drivers and the hardware test code, then that's a different story: there's no reason why they can't make features available to the software team in an iterative fashion, which also means that they should (must?) be members of the project team.
Depending, again, on the type of hardware you're developing and the interfaces the software will use to talk to it, it may be useful to design a partial-feature board. Such a board is of most benefit if it exposes all of the interfaces in a way that allows for debugging and verification, so that the low-level routines can be at least partially tested before the real hardware comes along.
But this is only going to be worthwhile if the time it takes to do the real hardware is prohibitive and the time to do a development version is much lower. Otherwise just go for the real hardware and save the cost.
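The same idea of testing low-level routines before the real hardware exists can be pushed one step further back, into pure software. The following is only a rough sketch: `RegisterBus`, `FakeBus`, `CTRL_REG`, `ENABLE_BIT` and `enable_led` are all made-up names for an imaginary LED controller, not anything from a real board or library. The point is just that if the driver only talks to an interface, an in-memory stand-in can exercise it long before a board turns up.

```python
from typing import Protocol


class RegisterBus(Protocol):
    """Hypothetical interface the driver uses to reach the hardware."""

    def read(self, addr: int) -> int: ...

    def write(self, addr: int, value: int) -> None: ...


class FakeBus:
    """In-memory stand-in for a board that does not exist yet."""

    def __init__(self):
        self.regs = {}  # address -> 8-bit register value
        self.log = []   # every access, in order, for later inspection

    def read(self, addr):
        value = self.regs.get(addr, 0)
        self.log.append(("r", addr, value))
        return value

    def write(self, addr, value):
        self.regs[addr] = value & 0xFF
        self.log.append(("w", addr, value & 0xFF))


# Assumed register map, invented purely for this example.
CTRL_REG = 0x00
ENABLE_BIT = 0x01


def enable_led(bus: RegisterBus) -> None:
    """Read-modify-write so the other control bits are preserved."""
    bus.write(CTRL_REG, bus.read(CTRL_REG) | ENABLE_BIT)
```

When the partial-feature board (or the real one) arrives, a second class with the same two methods would wrap the actual bus access, and `enable_led` would not need to change.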
Continuous integration is great in principle, but how practical it is will be determined by the type of hardware that you're developing.
If your hardware lends itself to automated tests then this is a good way to go. See this question for some pointers on how to achieve this.
(BTW, programmable logic, such as FPGAs, should always be continuously integrated.)
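As a rough illustration of how automated tests for hardware might sit in a CI run, here is a minimal sketch using Python's standard `unittest`. It assumes a hypothetical `BOARD_PORT` environment variable that is set only on build machines with a board attached, and a made-up 2-byte status frame. The pure-software test runs on every commit, while the board test is skipped wherever no hardware is available.

```python
import os
import unittest

# Assumption: CI sets BOARD_PORT only on runners that have a board attached.
BOARD_PORT = os.environ.get("BOARD_PORT")


def parse_status(frame: bytes) -> dict:
    """Decode a made-up 2-byte status frame: [version, flags]."""
    if len(frame) != 2:
        raise ValueError("status frame must be 2 bytes")
    return {"version": frame[0], "ready": bool(frame[1] & 0x01)}


class ParseStatusTest(unittest.TestCase):
    """Pure software: runs on every commit, no hardware needed."""

    def test_ready_flag(self):
        self.assertEqual(
            parse_status(b"\x03\x01"), {"version": 3, "ready": True}
        )

    def test_bad_length(self):
        with self.assertRaises(ValueError):
            parse_status(b"\x03")


@unittest.skipUnless(BOARD_PORT, "no board attached to this runner")
class BoardSmokeTest(unittest.TestCase):
    """Hardware-in-the-loop: only runs where a board is wired up."""

    def test_board_reports_ready(self):
        # Placeholder: a real test would read a frame from BOARD_PORT here.
        self.skipTest("board I/O not implemented in this sketch")
```

Splitting the tests this way means a missing board never breaks the build, but any runner that does have one exercises the hardware automatically.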