views:

77

answers:

1

Problem statement: You are given a set of k strings, each length n. You have to output the group of anagrams together. Anagrams are like e.g atm - mat , like-kile.

+3  A: 

Just sort the word's letters to obtain a signature that's anagram-specific. E.g., in Python,

sig = ''.join(sorted(word))

and make a dict with sig as the key and the value being a list of words with that signature (defaultdict(list) works well for this). Of course, you can do it in any language with sorting abilities, and associative arrays whose values can be lists or vectors;-).

Alex Martelli
In C#:`myStrings.ToLookup(str => new string(str.OrderBy(c => c).ToArray()));`
Ani
@Alex : Using all these sorting techniques will take up more space and time complexity. Is there any elegant solution, that doesn't require sorting them up and then matching each one with all the others.???
Silver Spoon
sorting is elegant
Winston Ewert
@Silver Spoon: I belive Alex's solution has time-complexity `O(k * n * log(n))`. This is pretty good already. I don't believe you could *theoretically* do better `O(k * n)`, and even to approach that, you would have to add some more assumptions.
Ani
@Silver, there's absolutely **no** "matching each one with all the others" in my approach -- that's what the `dict` (or `hashmap` or other associative array data structure), mapping word-signature to list of words with that signature, is all about. **If** you have a **good** hash function (to go from word to word-signature) that's independent from letter-order within word, and somehow guaranteed to (a) never map two non-anagrams to the same signature _and_ (for performance) (b) not produce "too many" accidental hash-coincidences (not trivial!-), you can of course skip the sort-by-letter step.
Alex Martelli
@Ani, I think the only needed "extra assumption" is the one about having that "dream hash function" (but I think it's a biggie since I don't know of such a function that _doesn't_ violate points a and b from my previous comment pretty badly;-). Since words are typically short, the "sort-by-letter" step is truly cheap, anyway!-)
Alex Martelli
@Alex +1 I thought about this problem for a while without looking at your solution and came up with the same thing. I'd be surprised if the time complexity could be improved.
Daniel