I'm going to build a search engine on solr, and nutch as a crawler. I have to index about 13mln documents. I have 3 servers for this job:
- 4 core Xeon 3Ghz, 20Gb ram, 1.5Tb sata
- 2*4 core Xeon 3Ghz, 16Gb ram, 500Gb ide
- 2*4 core Xeon 3Ghz, 16Gb ram, 500Gb ide
One of the servers I can use as a master for crawling and indexing, other twos as a slave for searching, or I can use one for searching, and another two for indexing with two shards. What architecture can you recommend? Should I use sharding, how much shards, and which of the servers should I use for what?