views:

507

answers:

2

What I need to do is compare two strings and mark the differences with begining/ending marks for changes. Example:

this is string number one.
this string is string number two.

output would be

this [is|string is] string number [one|two].  

I've been trying to figure this out for some time now. And I found something I blieved would help me do this, but I am unable to make this happen.
http://www.angusj.com/delphi/textdiff.html

I have it about 80% working here, but I've got no idea how to get it to do exactly what I want it to do. Any help would be appreciated.


uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, Diff, StdCtrls;

type
  TForm2 = class(TForm)
    Edit1: TEdit;
    Edit2: TEdit;
    Button1: TButton;
    Memo1: TMemo;
    Diff: TDiff;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form2: TForm2;
  s1,s2:string;

implementation

{$R *.dfm}

procedure TForm2.Button1Click(Sender: TObject);
var
  i: Integer;
  lastKind: TChangeKind;

  procedure AddCharToStr(var s: string; c: char; kind, lastkind: TChangeKind);
  begin
    if (kind  lastkind) AND (lastkind = ckNone) and (kind  ckNone) then s:=s+'[';
    if (kind  lastkind) AND (lastkind  ckNone) and (kind = ckNone) then s:=s+']';
    case kind of         
      ckNone: s := s  + c;
      ckAdd: s := s + c;         
      ckDelete: s := s  + c;
      ckModify: s := s + '|' + c;
    end;
  end;

begin
  Diff.Execute(pchar(edit1.text), pchar(edit2.text), length(edit1.text), length(edit2.text));

  //now, display the diffs ...
  lastKind := ckNone;
  s1 := ''; s2 := '';
  form2.caption:= inttostr(diff.Count);
  for i := 0 to Diff.count-1 do
  begin

    with Diff.Compares[i] do
    begin
      //show changes to first string (with spaces for adds to align with second string)
      if Kind = ckAdd then 
      begin 
        AddCharToStr(s1,' ',Kind, lastKind); 
      end
      else 
      AddCharToStr(s1,chr1,Kind,lastKind);
      if Kind = ckDelete then 
      begin 
        AddCharToStr(s2,' ',Kind, lastKind)
      end
      else AddCharToStr(s2,chr2,Kind,lastKind);

      lastKind := Kind;
    end;
  end;
    memo1.Lines.Add(s1);
    memo1.Lines.Add(s2);
end;

end.

I took the basicdemo1 from angusj.com and modified it to get this far.

+2  A: 
PhiS
I found an example of this at http://www.mx-dev.net/delphi/sources/source-6-alignement-de-sequences-d-adn.zip (requires login, doesn't check your email address to be valid) but I don't understand the algorithm/code enough to modify it. :( (not to mention comments are in French or Spanish)
Brad
If you are not concerned with finding optimal solutions, and/or want to present several alternatives, you could simply evaluate all possible char x char combinations of the two source strings [graphically, this corresponds to a dot-plot: http://en.wikipedia.org/wiki/Dot_plot_(bioinformatics) ]. Here, (graphically) the diagonals correspond to the matching substrings and you can map the start and stop positions of these.
PhiS
Sorry, don't seem to be able to access the code in your link. But have a look at the Needleman–Wunsch wikipedia article. It has pseudo-code and links to functioning code in other languages. (Don't worry, for strings encoding e.g. English, you won't have to bother with complicated score matrices.)
PhiS
A: 

There is an Object Pascal Diff Engine available which might be of assistance. You might want to break each "word" into a separate line for the comparison, or modify the algorithm to compare on a word by word basis.

skamradt
I will look into this again, when I tried to work with it's demo on Torry it crashed with divide by zero errors, so I didn't go any further.
Brad