What is the best way to compare large paragraphs of text and identify the differences between them? For example, if string A and string B are the same except for a few missing words, how would I highlight those words?

Originally I thought of splitting each text into an array of words and comparing the elements position by position. However, this breaks down when a word is deleted or inserted, because every word after that point shifts and no longer lines up.

+3  A: 

Use a diff algorithm.

Mitch Wheat
A: 

You want to look into Longest Common Subsequence (LCS) algorithms. Most languages have a library that will do the dirty work for you, and here is one for C#. Searching for "C# diff" or "VB.NET diff" will help you find additional libraries that suit your needs.
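
To make the idea concrete, here is a rough sketch of a word-level diff built on an LCS table. It is written in Python purely for brevity, not taken from any particular library, and the names lcs_table, word_diff and highlight, as well as the [-...-] and {+...+} markers, are made up for this illustration. The same logic should port to C# without much trouble.

```python
def lcs_table(a, b):
    """Build a table where table[i][j] is the LCS length of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def word_diff(old_text, new_text):
    """Return a list of (kind, word) pairs; kind is 'same', 'removed' or 'added'."""
    old, new = old_text.split(), new_text.split()
    table = lcs_table(old, new)
    ops = []
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            # Word is part of the common subsequence.
            ops.append(("same", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Skipping the old word keeps the longer common subsequence.
            ops.append(("removed", old[i]))
            i += 1
        else:
            ops.append(("added", new[j]))
            j += 1
    # Whatever is left over on either side has no counterpart.
    ops.extend(("removed", word) for word in old[i:])
    ops.extend(("added", word) for word in new[j:])
    return ops


def highlight(ops):
    """Render the diff with hypothetical [-removed-] and {+added+} markers."""
    marks = {"same": "{}", "removed": "[-{}-]", "added": "{{+{}+}}"}
    return " ".join(marks[kind].format(word) for kind, word in ops)


print(highlight(word_diff("the quick brown fox", "the brown fox jumps")))
# prints: the [-quick-] brown fox {+jumps+}
```

The table takes time and memory proportional to the product of the two word counts, so for very large texts a proper diff library is probably the better choice. In a web app you would swap the markers for whatever markup you use to highlight changes.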

Darth Android
+1  A: 

I came across this a few months back when I was working on a small project. It might set you on the right track.

http://www.codeproject.com/KB/recipes/DiffAlgorithmCS.aspx

kyndigs
I used it; it is nice.
Andrey
A: 

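Text difference is usually measured in terms of edit distance, which is essentially the minimum number of character insertions, deletions or substitutions needed to transform one text into the other.

A common implementation of this algorithm uses dynamic programming: it fills in a table of distances between prefixes of the two texts, so each entry is computed once from its neighbours.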
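
As a rough illustration of the dynamic programming approach, here is a minimal sketch of the Levenshtein variant of edit distance, written in Python for brevity. The function name edit_distance is my own, and it keeps only two rows of the table at a time to save memory.

```python
def edit_distance(source, target):
    """Minimum number of insertions, deletions and substitutions
    needed to turn source into target (Levenshtein distance)."""
    # previous[j] is the distance between the processed prefix of source
    # and the first j items of target.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]  # turning i items into nothing takes i deletions
        for j, t in enumerate(target, start=1):
            cost = 0 if s == t else 1
            current.append(min(
                previous[j] + 1,         # delete s
                current[j - 1] + 1,      # insert t
                previous[j - 1] + cost,  # substitute s with t, or keep it
            ))
        previous = current
    return previous[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```

Because the function only indexes and compares items, it also accepts lists of words, which gives a word-level distance instead of a character-level one. Note that it returns just a number; to highlight the actual differences you would need to keep the full table and trace back through it.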

BrokenGlass
A: 

Here is an implementation of a Merge Engine that compares two HTML files and shows the differences highlighted: http://www.codeproject.com/KB/string/htmltextcompare.aspx

Mark Redman
A: 

If it's a one-shot deal, save them both in MS Word and use the document compare function.

Beth
Nah, this'll be a recurring thing that needs to happen in the code-behind of an ASP.NET web app.
m.edmondson