Hi,
After processing a dictionary of words, I have the edit distances (or rather, similarity as a percentage) saved in a data structure, roughly like this:
s1=String1, s2=String2, similarity=82
s1=String2, s2=String3, similarity=82
s1=aaaaaaa, s2=aaaaaab, similarity=90
s1=aaaaaaa, s2=aaaaaac, similarity=95
My aim is to end up with a list of groups of similar strings, i.e. all strings in a group have a similarity to each other greater than some threshold x, e.g. {(String1, String2, String3), (aaaaaaa, aaaaaab, aaaaaac)}
My current idea is to go through the data structure, identify all unique strings, and then rerun the edit distance algorithm on every pair of them. That seems a bit labour-intensive, though.
Any thoughts? Or would it be possible to do this while calculating the edit distances the first time around?
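To make the goal concrete, here is a minimal sketch of one way the grouping might work without a second pass, using a disjoint-set (union-find) structure. It rests on assumptions: the records are taken to be (s1, s2, similarity) tuples, and the names DisjointSet, group_similar and threshold are made up for illustration. It also groups transitively (String1 and String3 land together because both are linked to String2), which is weaker than requiring every pair in a group to exceed x.

```python
from collections import defaultdict

# Sample records in the shape shown above; the tuple layout is an assumption.
pairs = [
    ("String1", "String2", 82),
    ("String2", "String3", 82),
    ("aaaaaaa", "aaaaaab", 90),
    ("aaaaaaa", "aaaaaac", 95),
]


class DisjointSet:
    """Hypothetical helper: tracks which strings belong to the same group."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own group.
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_similar(pairs, threshold):
    """Return groups of strings linked by a chain of pairs scoring > threshold."""
    ds = DisjointSet()
    for s1, s2, similarity in pairs:
        if similarity > threshold:
            ds.union(s1, s2)
    groups = defaultdict(set)
    for s in list(ds.parent):
        groups[ds.find(s)].add(s)
    return list(groups.values())


if __name__ == "__main__":
    for group in group_similar(pairs, threshold=80):
        print(sorted(group))
```

Since union only needs one pair at a time, it could in principle be called as each similarity is computed the first time around, rather than after the fact. If every pair inside a group really must exceed x, this sketch is not enough and something closer to clique finding would be needed.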
Thx. A.