We have a large set of objects, each with composition and name properties. Both are string values with a lot of duplication across objects. What would be a suitable data structure for storing these strings so that they are both searchable and compact?
The data includes many chemical and product names that are duplicates or differ only slightly. I'd like to be able to store the string content of the objects in a compressed format that can also be searched.
I've experimented with tries to build a fast searchable index over the names, but this index currently exists in addition to the storage of each object's string data rather than replacing it.
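To make the current setup concrete, here is a minimal sketch of the kind of trie index I mean. It is written in Python purely for illustration; the class names, the `find_prefix` method, and the sample names are made up for this post and are not our real code or data. Note that each object still carries its own copy of the name, which is the duplication I'd like to get rid of.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Hypothetical node layout: one child per character.
    children: dict = field(default_factory=dict)
    # Ids of the objects whose full name ends at this node.
    object_ids: list = field(default_factory=list)


class NameTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name, object_id):
        """Walk or create one node per character, then record the id."""
        node = self.root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.object_ids.append(object_id)

    def find_prefix(self, prefix):
        """Return the sorted ids of all objects whose name starts with prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            found.extend(current.object_ids)
            stack.extend(current.children.values())
        return sorted(found)


# Made-up sample data: the names are stored here AND spelled out again
# along the trie paths.
objects = [
    {"name": "sodium chloride", "composition": "NaCl"},
    {"name": "sodium chlorate", "composition": "NaClO3"},
    {"name": "sodium chloride", "composition": "NaCl"},
    {"name": "potassium chloride", "composition": "KCl"},
]

index = NameTrie()
for object_id, obj in enumerate(objects):
    index.insert(obj["name"], object_id)
```

With this sample, `index.find_prefix("sodium chlor")` finds objects 0, 1 and 2, so search works fine. The problem is only that the index sits alongside the per-object strings.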
This data is static and distributed as a separate binary file with the application.
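Because the data never changes after it is built, a read-only layout would be fine. As a baseline for comparison, this is the naive approach I can think of: a sorted table of unique strings packed into one byte blob, with objects holding an index into the table. It is only a sketch in Python; the function names and the file layout (a count, an offset table, then the UTF-8 bytes) are assumptions for illustration, not our actual format. It removes exact duplicates but does nothing for names that differ only slightly.

```python
import struct


def pack_string_table(strings):
    """Pack unique strings into bytes; return (table_bytes, index per input)."""
    unique = sorted(set(strings))
    index_of = {s: i for i, s in enumerate(unique)}
    blob = bytearray()
    offsets = []
    for s in unique:
        offsets.append(len(blob))
        blob += s.encode("utf-8")
    offsets.append(len(blob))  # sentinel so entry i spans offsets[i]..offsets[i+1]
    header = struct.pack("<I", len(unique))
    offset_table = struct.pack(f"<{len(offsets)}I", *offsets)
    refs = [index_of[s] for s in strings]
    return header + offset_table + bytes(blob), refs


def read_string(data, i):
    """Decode entry i straight from the packed bytes."""
    (count,) = struct.unpack_from("<I", data, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    start, end = struct.unpack_from("<2I", data, 4 + 4 * i)
    base = 4 + 4 * (count + 1)  # the blob starts after header and offset table
    return data[base + start:base + end].decode("utf-8")


def find_string(data, target):
    """Binary search for an exact match; return its index or -1."""
    (count,) = struct.unpack_from("<I", data, 0)
    lo, hi = 0, count
    while lo < hi:
        mid = (lo + hi) // 2
        if read_string(data, mid) < target:
            lo = mid + 1
        else:
            hi = mid
    if lo < count and read_string(data, lo) == target:
        return lo
    return -1


# Made-up sample names, including one exact duplicate.
sample_names = [
    "sodium chloride",
    "sodium chlorate",
    "sodium chloride",
    "potassium chloride",
]
table_bytes, refs = pack_string_table(sample_names)
```

Here each object would keep only its entry in `refs`, and `read_string(table_bytes, ref)` recovers the name. Exact lookup works through the sort order, but prefix or fuzzy search and compression of near-duplicates are what I'm still missing.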