Honestly I would rethink your approach and I'll tell you why.
I've done a lot of work on distributed high-volume systems (financial transactions specifically). If your volume is sufficiently high (and I'll assume it is or you wouldn't be contemplating a clustered solution; you can get an awful lot of power out of one off-the-shelf box these days), then with your solution you will kill yourself with remote calls (i.e. calls for data from another node).
I will speak about Tangosol/Oracle Coherence here because it's what I've got the most experience with, although Terracotta will support some or most of these features and is free.
In Coherence terms what you have is a partitioned cache: if you have n nodes, each node holds 1/n of the total data. Typically you have at least one level of redundancy, and that redundancy is spread as evenly as possible, so each of the other n-1 nodes holds 1/(n-1) of a given node's backups. With 4 nodes, for example, each node holds a quarter of the data as primary, and each of the other three holds a third of that quarter as backup.
The idea in such a solution is to make sure as many of the cache hits as possible are local (to the same cluster node). With partitioned caches in particular, writes are relatively expensive (and get more expensive the more backup nodes you have for each cache entry), although write-behind caching can minimize this, and reads are fairly cheap (which is what you want given your requirements).
Your solution, by contrast, is going to ensure that every cache hit goes to a remote node.
Also consider that generating content is undoubtedly much more expensive than serving it, which I'll assume is why you came up with this idea because then you can have more content generators than servers. It's the more tiered approach and one I'd characterize as horizontal slicing.
You will achieve much better scalability if you can vertically slice your application. By that I mean that each node is responsible for storing, generating and serving a subset of all the content. This effectively eliminates internode communication (excluding backups) and allows you to adjust the solution by simply giving each node a different sized subset of the content.
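To make the "different sized subset" idea concrete, here is a minimal sketch of one way to do it, assuming content is identified by a string key and each node gets an integer weight. The class name WeightedSlices and its methods are hypothetical, not part of Coherence or Terracotta.

```java
/** Hypothetical sketch: assigns content keys to nodes in proportion to per-node weights. */
final class WeightedSlices {
    // cumulative[i] is the sum of the weights of nodes 0..i
    private final int[] cumulative;
    private final int total;

    WeightedSlices(int... weights) {
        if (weights.length == 0) {
            throw new IllegalArgumentException("need at least one node");
        }
        cumulative = new int[weights.length];
        int sum = 0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            sum += weights[i];
            cumulative[i] = sum;
        }
        total = sum;
    }

    /** Maps a slot in [0, total) to the index of the node that owns it. */
    int nodeForSlot(int slot) {
        for (int i = 0; i < cumulative.length; i++) {
            if (slot < cumulative[i]) {
                return i;
            }
        }
        throw new IllegalArgumentException("slot out of range: " + slot);
    }

    /** Returns the index of the node responsible for storing, generating and serving this key. */
    int nodeFor(String key) {
        // floorMod keeps the slot non-negative even when hashCode() is negative
        return nodeForSlot(Math.floorMod(key.hashCode(), total));
    }
}
```

With new WeightedSlices(3, 1), node 0 owns three slots out of four and node 1 owns the remaining one, so a bigger box can be handed a bigger slice just by raising its weight.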
Ideally, whatever scheme you choose for partitioning your data should be reproducible by your Web server so it knows exactly which node to hit for the relevant data.
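As a rough illustration of a scheme the Web server can reproduce, the sketch below hashes a content key to a node name. ContentRouter and the node names are made up for the example; the only thing it relies on is that String.hashCode() is defined by the Java API, so every JVM computes the same value for the same key.

```java
/** Hypothetical sketch: a deterministic key-to-node mapping any tier can recompute. */
final class ContentRouter {
    private final String[] nodes;

    ContentRouter(String... nodes) {
        if (nodes.length == 0) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = nodes.clone();
    }

    /** Returns the node that owns the given content key. */
    String nodeFor(String contentKey) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        int index = Math.floorMod(contentKey.hashCode(), nodes.length);
        return nodes[index];
    }
}
```

If the Web server and the cluster both build the router from the same ordered node list, they agree on the owner of every key without talking to each other. One caveat with plain modulo: changing the number of nodes remaps most keys, so you would want something smarter if nodes come and go often.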
Now you might have other reasons for doing it the way you're proposing but I can only answer this in the context of available information.
I'll also point you to a summary of grid/cluster technologies for Java I wrote in response to another question.