Note: This is a follow-up to this question.
I have a "legacy" program which does hundreds of string matches against big chunks of HTML. For example if the HTML matches 1 of 20+ strings, do something. If it matches 1 of 4 other strings, do something else. There are 50-100 groups of these strings to match against these chunks of HTML (usually whole pages).
I'm taking a whack at refactoring this mess of code and trying to come up with a good approach to do all these matches.
The performance requirements of this code are rather strict. It needs to not wait on I/O when doing these matches so they need to be in memory. Also there can be 100+ copies of this process running at the same time so large I/O on startup could cause slow I/O for other copies.
With these requirements in mind, it would be most efficient if only one copy of these strings is stored in RAM (see my previous question linked above).
This program currently runs on Windows with the Microsoft compiler, but I'd like to keep the solution as cross-platform as possible, so I don't think I want to use PE resource files or anything similar.
Mmapping an external file might work, but then I have the issue of keeping the program version and the data version in sync; one does not normally change without the other. It also requires some file "format", which adds a layer of complexity I'd rather not have.
So after all of this preamble, it seems like the best solution is to have a bunch of arrays of strings which I can then iterate over. This seems kind of messy as I'm mixing code and data heavily, but with the above requirements is there any better way to handle this sort of situation?
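For concreteness, here is a minimal sketch of the "arrays of strings" idea. Everything in it is a placeholder for illustration: the `MatchGroup` struct, the group names, the pattern strings and `find_first_group` are hypothetical, not the real code. The assumption is that `const` tables like these typically end up in a read-only section of the executable, which the OS can usually share between processes, though that is common compiler and loader behaviour rather than a guarantee.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical descriptor for one group of patterns.
struct MatchGroup {
    const char* name;               // label used to pick the action
    const char* const* patterns;    // pointer to a constant pattern table
    std::size_t count;              // number of entries in the table
};

// Placeholder patterns; the real strings would come from the legacy code.
static const char* const kLoginPatterns[] = {
    "<form id=\"login\"",
    "name=\"password\"",
};

static const char* const kErrorPatterns[] = {
    "404 Not Found",
    "Internal Server Error",
};

// Deduces the element count of a pattern table at compile time.
template <std::size_t N>
constexpr std::size_t array_size(const char* const (&)[N]) { return N; }

// One entry per group; the real program would have 50-100 of these.
static const MatchGroup kGroups[] = {
    {"login", kLoginPatterns, array_size(kLoginPatterns)},
    {"error", kErrorPatterns, array_size(kErrorPatterns)},
};

// Returns the first group with at least one pattern occurring in html,
// or nullptr if no group matches. This is a plain substring scan.
const MatchGroup* find_first_group(const char* html) {
    for (const MatchGroup& group : kGroups) {
        for (std::size_t i = 0; i < group.count; ++i) {
            if (std::strstr(html, group.patterns[i]) != nullptr) {
                return &group;
            }
        }
    }
    return nullptr;
}
```

The caller would then switch on the returned group to decide what to do. This keeps the data in one place in the source, but it is still compiled into the binary, which is exactly the code/data mixing I'm unsure about.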