The software is a classic search engine. One portion of the app crawls and collects data, and another takes that data and builds an index or database. The final portion handles queries from clients: it runs a search against the index and returns the results.
The specific engine that I'm discussing is one where the data is frequently updated (at least once per minute) so the queries must always be operating on the latest data.
My question is simple. Should these three tasks be handled by three separate processes, or by a single process with one or more threads dedicated to each task?
The main reason for my question is the best way to partition memory. If the crawler has to update the available data for the indexer, and the indexer has to update the datasets for the query handler, would it make sense for them all to live in the same process and share one address space? Or would it be acceptable to have separate processes that communicate through shared memory-mapped files?
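As a rough illustration of the single-process option, the sketch below shows one way an indexer thread could hand a freshly built index to query threads inside one address space: the index is published as an immutable snapshot behind a `std::shared_ptr`, and only the pointer swap is locked. The names `Index`, `IndexHolder`, `publish`, `acquire` and `count_matches` are hypothetical, the term-to-document-ids map is a stand-in for a real index structure, and this is only a sketch under those assumptions, not a recommendation of a specific design.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index type: term -> ids of the documents containing it.
using Index = std::map<std::string, std::vector<int>>;

// Holds the current immutable index snapshot shared by all threads.
class IndexHolder {
public:
    // Indexer thread: publish a freshly built index.
    // The index is built and wrapped outside the lock; only the
    // pointer swap happens while the mutex is held.
    void publish(Index next) {
        auto snapshot = std::make_shared<const Index>(std::move(next));
        std::lock_guard<std::mutex> lock(mutex_);
        current_.swap(snapshot);
        // 'snapshot' now refers to the previous index. It is released
        // after 'lock' is destroyed, so any cleanup runs unlocked.
    }

    // Query threads: take a reference to the latest snapshot.
    // The snapshot stays alive until the last reader drops it,
    // even if the indexer publishes a newer one in the meantime.
    std::shared_ptr<const Index> acquire() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const Index> current_ = std::make_shared<const Index>();
};

// Query handler: number of documents matching a term in one snapshot.
std::size_t count_matches(const Index& index, const std::string& term) {
    auto it = index.find(term);
    return it == index.end() ? 0 : it->second.size();
}
```

A query thread would call `acquire()` once per request and search that snapshot, so a rebuild every minute never blocks readers for longer than a pointer copy. With separate processes the same hand-off would have to go through a memory-mapped file or another form of IPC, and the index layout could no longer rely on ordinary pointers.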
I am leaning towards separate processes so that each can eventually live on a different machine, enabling clustering, distribution, etc. But in terms of raw speed for smaller datasets, would a consolidated single-process approach be preferred?
The OS is Windows, the language is C++.