Hi,

After processing a dictionary of words, I have edit distances (or rather, similarities in percent) saved in a data structure, roughly like this:
s1=String1, s2=String2, similarity=82
s1=String2, s2=String3, similarity=82
s1=aaaaaaa, s2=aaaaaab, similarity=90
s1=aaaaaaa, s2=aaaaaac, similarity=95

My aim is to end up with a list of groups of similar strings, i.e. every pair of strings in a group has a similarity > x, e.g. {(String1, String2, String3), (aaaaaaa, aaaaaab, aaaaaac)}
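
To make the goal concrete, here is a minimal sketch in Python that groups the stored records directly, assuming they are held as (s1, s2, similarity) tuples. The names `group_similar` and `threshold` are made up for illustration. Note that it merges strings transitively (connected components via union-find), which is looser than requiring every pair in a group to exceed x.

```python
def group_similar(records, threshold):
    """Group strings linked by any pair with similarity > threshold.

    records: iterable of (s1, s2, similarity) tuples (assumed layout).
    Grouping is transitive, so two strings in the same group are not
    guaranteed to be directly similar to each other.
    """
    parent = {}

    def find(s):
        # Return the representative of s, registering s on first sight.
        parent.setdefault(s, s)
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for s1, s2, sim in records:
        r1, r2 = find(s1), find(s2)
        if sim > threshold and r1 != r2:
            parent[r2] = r1  # merge the two groups

    groups = {}
    for s in parent:
        groups.setdefault(find(s), []).append(s)
    # Keep only groups with at least two members.
    return [sorted(g) for g in groups.values() if len(g) > 1]


records = [
    ("String1", "String2", 82),
    ("String2", "String3", 82),
    ("aaaaaaa", "aaaaaab", 90),
    ("aaaaaaa", "aaaaaac", 95),
]
groups = group_similar(records, threshold=80)
```

This needs only one pass over the existing records and no further edit distance calculations, but it cannot say anything about pairs that were never compared (String1 and String3 above).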

My current idea is to go through the data structure identifying all unique strings and then rerun the edit distance algorithm on every pair of them... That seems a bit labour-intensive though...
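
A rough sketch of that brute-force idea in Python, for reference. The similarity formula (100 * (1 - distance / longer length)) is an assumption and may not match how the percentages above were produced, and `strict_groups` is a hypothetical name. The grouping is greedy: a string joins the first group in which it is similar to every member, so the result can depend on input order.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed formula: percent similarity from normalised edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def strict_groups(strings, threshold):
    """Greedy grouping where every pair in a group exceeds threshold."""
    groups = []
    for s in dict.fromkeys(strings):  # unique strings, order preserved
        for g in groups:
            if all(similarity(s, m) > threshold for m in g):
                g.append(s)
                break
        else:
            groups.append([s])  # no compatible group found
    return groups
```

For n unique strings this does up to n * (n - 1) / 2 edit distance calculations, which is the labour-intensive part.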

Any thoughts? Or would it be possible to do this while calculating the edit distances the first time around?

Thx. A.