Working with edit distance and then processing the results to find chunks / groups | ansaurus

Q:

Working with edit distance and then processing the results to find chunks / groups

Hi,

After processing a dictionary of words, I have the edit distances (or rather, the similarity as a percentage) saved in a data structure, roughly like this:
s1=String1, s2=String2, similarity=82
s1=String2, s2=String3, similarity=82
s1=aaaaaaa, s2=aaaaaab, similarity=90
s1=aaaaaaa, s2=aaaaaac, similarity=95

My aim is to end up with a list of groups of similar strings, i.e. all strings in a group have a similarity to each other greater than x, e.g. {(String1, String2, String3), (aaaaaaa, aaaaaab, aaaaaac)}
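To make the goal concrete, here is a minimal sketch of grouping from the stored pairs alone. It assumes "similar" can be treated transitively (String1 and String3 end up together because both are close to String2), so the groups are connected components rather than sets where every pair has been checked. The function name `group_similar`, the tuple layout and the threshold value are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def group_similar(records, threshold):
    """Group strings that are linked by pairs with similarity > threshold.

    Assumption: similarity is treated as transitive, so each group is a
    connected component of the "similar enough" graph.
    """
    # Build an undirected graph that keeps only the pairs above the threshold.
    neighbours = defaultdict(set)
    for s1, s2, similarity in records:
        if similarity > threshold:
            neighbours[s1].add(s2)
            neighbours[s2].add(s1)

    seen = set()
    groups = []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            current = stack.pop()
            component.append(current)
            for other in neighbours[current]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(component))
    return groups

# The sample records from the question, as (s1, s2, similarity) tuples.
records = [
    ("String1", "String2", 82),
    ("String2", "String3", 82),
    ("aaaaaaa", "aaaaaab", 90),
    ("aaaaaaa", "aaaaaac", 95),
]
groups = group_similar(records, 80)
```

This only reads the pairs that were already stored, so no edit distance has to be recomputed. If every pair inside a group really must exceed x, the components would still need a pairwise check afterwards.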

My current idea is to go through the data structure, identify all unique strings, and then rerun the edit distance algorithm on each pair... That seems a bit labour-intensive, though...

Any thoughts? Or would it be possible to do this whilst calculating the edit distances the first time around?
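As a rough sketch of the "first time around" idea: the groups can be merged with a union-find structure during the same pass that compares the pairs. Everything named here is an assumption for illustration: `levenshtein`, `similarity_percent` and `group_while_comparing` are hypothetical helpers, and the percentage is normalised by the longer string's length, which may not match the formula that produced the figures above.

```python
from itertools import combinations

def levenshtein(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def similarity_percent(a, b):
    """Assumed normalisation: 100 * (1 - distance / length of longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)

def group_while_comparing(words, threshold):
    """Merge groups during the pairwise pass instead of in a second pass."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the group's representative, halving the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(words, 2):
        if similarity_percent(a, b) > threshold:
            parent[find(a)] = find(b)  # union the two groups

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    # Strings that matched nothing are left out of the result.
    return [sorted(g) for g in groups.values() if len(g) > 1]

words = ["String1", "String2", "String3", "aaaaaaa", "aaaaaab", "aaaaaac"]
result = group_while_comparing(words, 80)
```

Like any transitive grouping, this can chain strings together through intermediate neighbours, so a strict "every pair above x" requirement would need an extra check per group.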

Thx. A.
