For starters, you should probably normalize your data. Convert all of your text to a single encoding (e.g. UTF-8). You may also want to apply case folding and a Unicode normalization form (such as NFC or NFKC), and perhaps sort each set, depending on how you're storing them.
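
As a rough sketch of what that normalization could look like in Python: the helper names `normalize_string` and `normalize_set` are made up for illustration, and the choice of NFKC plus `casefold()` is an assumption about what "equal" should mean for your data. It also assumes you already have decoded `str` values rather than raw bytes.

```python
import unicodedata


def normalize_string(s: str) -> str:
    """Hypothetical helper: Unicode-normalize and case-fold one string."""
    # NFKC also folds compatibility forms (e.g. full-width letters);
    # use NFC instead if that is too aggressive for your data.
    folded = unicodedata.normalize("NFKC", s).casefold()
    # Case folding can leave text un-normalized, so normalize once more.
    return unicodedata.normalize("NFKC", folded)


def normalize_set(strings) -> tuple:
    """Hypothetical helper: canonical, order-independent form of a string set."""
    # Deduplicate after normalizing, then sort so equal sets compare equal.
    return tuple(sorted({normalize_string(s) for s in strings}))


print(normalize_set(["Straße", "STRASSE", "b", "A"]))  # ('a', 'b', 'strasse')
```

Sorting is only needed if you store each set as a sequence or a joined string; an unordered container would make it unnecessary.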
It's unclear (to me) from your question whether you want to find exact matches or just string sets that are "similar". If you only care about exact matches once the normalization is taken into account, then you're pretty much done. Just have an index on the normalized forms of your string sets and you can look up new sets quickly by normalizing them as well.
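
For the exact-match case, something along these lines might do. It is an in-memory sketch only: `ExactIndex` and `canonical_key` are hypothetical names, the normalization (NFKC plus case folding) is an assumption, and a real system would more likely put the index in a database.

```python
import unicodedata
from collections import defaultdict


def canonical_key(strings) -> frozenset:
    """Hypothetical helper: case-insensitive, order-independent key for a set."""
    return frozenset(unicodedata.normalize("NFKC", s).casefold() for s in strings)


class ExactIndex:
    """Maps the normalized form of each string set to the ids stored under it."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def add(self, set_id, strings):
        self._by_key[canonical_key(strings)].append(set_id)

    def lookup(self, strings):
        # Normalize the query the same way before looking it up.
        return list(self._by_key.get(canonical_key(strings), []))


index = ExactIndex()
index.add("set-1", ["Foo", "bar"])
print(index.lookup(["BAR", "foo"]))  # ['set-1']
print(index.lookup(["foo"]))         # []
```

In a database you would probably store a sorted, joined version of the normalized set (or a hash of it) in an indexed column instead of using a `frozenset` as the key.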
If you want to find near matches then you'll probably want to do some sort of similarity hashing. The Wikipedia article on locality-sensitive hashing describes a number of techniques.
The basic idea behind a number of these techniques is to compute a handful of very lossy hashes on each string set, h[0] through h[n]. To look up a new string set you'd compute its hashes and look each of these up in the index. Anything that gets at least one match is "similar", and the more matches, the more similar it is (you can choose what threshold to cut things off at).
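
To make that concrete, here is a minimal sketch using MinHash, which is one of the locality-sensitive hashing techniques and not necessarily the best fit for your data. Everything named here (`NUM_HASHES`, `signature`, `SimilarityIndex`) is made up for illustration, the number of hashes is an arbitrary assumption, and the string sets are assumed to be non-empty and already normalized.

```python
import hashlib
from collections import Counter, defaultdict

NUM_HASHES = 8  # assumption: tune for your data


def _seeded_hash(seed: int, item: str) -> int:
    """One member of a family of hash functions, selected by seed."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def signature(strings, num_hashes=NUM_HASHES):
    """MinHash signature: h[i] is the smallest seeded hash over the set."""
    items = set(strings)  # must be non-empty, otherwise min() raises ValueError
    return [min(_seeded_hash(i, s) for s in items) for i in range(num_hashes)]


class SimilarityIndex:
    def __init__(self):
        # (hash position, hash value) -> ids of the sets that produced it
        self._buckets = defaultdict(set)

    def add(self, set_id, strings):
        for i, h in enumerate(signature(strings)):
            self._buckets[(i, h)].add(set_id)

    def lookup(self, strings, threshold=1):
        """Return (set_id, match_count) pairs, most similar first."""
        matches = Counter()
        for i, h in enumerate(signature(strings)):
            for set_id in self._buckets.get((i, h), ()):
                matches[set_id] += 1
        return [(sid, n) for sid, n in matches.most_common() if n >= threshold]


index = SimilarityIndex()
index.add("set-1", ["apple", "banana", "cherry"])
print(index.lookup(["cherry", "banana", "apple"]))  # [('set-1', 8)]
print(index.lookup(["apple", "banana", "durian"], threshold=2))
```

With MinHash, the fraction of matching positions approximates the Jaccard similarity of the two sets, so the threshold can be read as a rough similarity cutoff. The second lookup may or may not return `set-1`, depending on how many of the 8 positions happen to agree.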