I am a newbie working in a simple Rails app that translates a document (long string) from a language to another. The dictionary is a table of terms (a string regexp to find and substitute, and a block that ouputs a substituting string). The table is 1 million records long.
Each request is a document that wants to be translated. In a first brutish force approach I need to run the whole dictionary against each request/document.
Since the dictionary will run whole every time (from the first record to the last), instead of loading the table of records of the dictionary with each document, I think the best would be to have the whole dictionary as an array in memory.
I know it is not the most efficient, but the dictionary has to run whole at this point.
1.- If no efficiency can be gained by restructuring the document and dictionary (meaning it is not possible to create smaller subsets of the dictionary). What is the best design approach?
2.- Do you know of similar projects that I can learn from?
3.- Where should I look to learn how to load such a big table into memory (cache?) at rails startup?
Any answer to any of the posed questions will be greatly appreciated. Thank you very much!