I need to export data, and the exported text must be translated. In the database, the data is stored in untranslated form. My application requirements say that the user must also be able to import the exported CSV file back into the database, so I need to reverse the translated text to the I18N format, which is the format used in the database. How can I do that, and is there any sane way to do it?
I agree with you: in the general case, it seems insane! Something like:
- Take all translated strings as patterns, one by one (in some priority order)
- When a pattern matches, replace it with the untranslated value for that pattern
- Loop until done
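The steps above can be sketched roughly as follows. This is only an illustration, not a recommended solution: the `TRANSLATIONS` table and the key names such as `payment.rejected` are made up for the example, and "longest pattern first" is just one possible priority rule.

```python
import re

# Hypothetical table: translated text -> untranslated I18N key.
# The real keys and translations would come from your own resources.
TRANSLATIONS = {
    "Paiement rejeté": "payment.rejected",
    "Paiement": "payment",
}

def untranslate(text, translations):
    """Replace every known translated string in text with its key."""
    # Priority rule (an assumption): try longer patterns first, so that
    # "Paiement rejeté" wins over its own prefix "Paiement".
    patterns = sorted(translations, key=len, reverse=True)
    regex = re.compile("|".join(re.escape(p) for p in patterns))
    # A single substitution pass plays the role of "loop until done" and
    # avoids re-matching text that has already been replaced.
    return regex.sub(lambda m: translations[m.group(0)], text)

print(untranslate("Paiement rejeté", TRANSLATIONS))
print(untranslate("Paiement", TRANSLATIONS))
```

Even in this tiny example the result depends entirely on the priority rule: without the longest-first ordering, the shorter pattern would match first and leave " rejeté" behind.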
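Many problems can be envisioned: for example, two different keys may share the same translated text, or one translation may be a substring of another.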
This relates to the research field of natural language processing, so it is, well... research! Not really easy to use in everyday programming.
But if you are interested, a web search should turn up some algorithms. I believe they are based on complicated models (compared to a regexp!).
I hope you have some other information to guide you. With a bit more context, it may turn out to be a much easier problem.
You need to maintain a dictionary table of translated messages. You probably already have one in some form.
Master message list
| Message key | English text     |
| 1           | Payment rejected |
Translations
| Translation     | Message key |
| Paiement rejeté | 1           |
| Talu Gwrthodwyd | 1           |
| Maksu hylätty   | 1           |
You can use a join to search for the translated text from your data import, and map it back to the untranslated text (or just store the message key).
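As a rough sketch of that join, here is a self-contained example using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`messages`, `translations`, `import_rows`, `status_text`) are assumptions made for the example; adapt them to your own schema and database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        message_key  INTEGER PRIMARY KEY,
        english_text TEXT
    );
    CREATE TABLE translations (
        translation TEXT,
        message_key INTEGER REFERENCES messages(message_key)
    );
    -- Index the translated text so the lookup stays fast.
    CREATE INDEX idx_translation ON translations (translation);
    -- Hypothetical staging table holding the rows read from the CSV file.
    CREATE TABLE import_rows (row_id INTEGER PRIMARY KEY, status_text TEXT);
""")
conn.executemany("INSERT INTO messages VALUES (?, ?)",
                 [(1, "Payment rejected")])
conn.executemany("INSERT INTO translations VALUES (?, ?)",
                 [("Paiement rejeté", 1),
                  ("Talu Gwrthodwyd", 1),
                  ("Maksu hylätty", 1)])
conn.executemany("INSERT INTO import_rows VALUES (?, ?)",
                 [(1, "Maksu hylätty"), (2, "Texte inconnu")])

# LEFT JOIN keeps imported rows with no matching translation, so they
# can be reported instead of being silently dropped.
rows = conn.execute("""
    SELECT i.row_id, t.message_key, m.english_text
    FROM import_rows AS i
    LEFT JOIN translations AS t ON t.translation = i.status_text
    LEFT JOIN messages AS m ON m.message_key = t.message_key
    ORDER BY i.row_id
""").fetchall()

for row in rows:
    print(row)
```

Rows whose message key comes back as `NULL` (`None` in Python) are the ones that could not be mapped back and need manual attention.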
It might be worth making this more robust by 'reducing' the translated text: strip unneeded whitespace, replace accented characters, etc. Do this both before storing the translations and before searching, so that both sides are compared in the same reduced form. DB indexes should make the search fast.
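One possible 'reduce' function, sketched with Python's standard `unicodedata` module. The name `reduce_text` is made up for the example, and the case folding step is an extra assumption that goes beyond the whitespace and accent handling mentioned above; drop it if case matters in your data.

```python
import unicodedata

def reduce_text(text):
    """Return a simplified form of text for storing and for lookups."""
    # Decompose accented characters into a base letter plus combining
    # marks, then drop the combining marks ("é" becomes "e").
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    # Fold case (an assumption) and collapse runs of whitespace,
    # which also trims leading and trailing whitespace.
    return " ".join(stripped.casefold().split())

print(reduce_text("  Paiement   rejeté "))
print(reduce_text("Maksu hylätty"))
```

Note that reducing the text can make two different translations collide, so it is worth checking the reduced column for duplicates when the dictionary table is built.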