How would I go about extracting the text between two HTML tags using Delphi? Here is an example string.

blah blah blah<tag>text I want to keep</tag>blah blah blah

and I want to extract this part of it.

<tag>text I want to keep</tag>

(Basically, I want to remove all the blah blah blah garbage that comes before and after the <tag> and </tag> strings, which I also want to keep.)

Like I said, I am sure this is extremely easy for those who know, but I just cannot wrap my head around it at the moment. Thanks in advance for your replies.

+2  A: 

This depends entirely on how your input looks.

Update: At first I wrote a few solutions for special cases, but after the OP explained more of the details, I had to generalize them. Here is the most general code:

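function ExtractTextInsideGivenTagEx(const Tag, Text: string): string;
var
  StartPos1, StartPos2, EndPos: integer;
  i: Integer;
begin
  result := '';
  StartPos1 := Pos('<' + Tag, Text);
  EndPos := Pos('</' + Tag + '>', Text);
  // Without this guard, a missing opening tag (StartPos1 = 0) would let the
  // loop below pick up the '>' of some unrelated tag.
  if (StartPos1 = 0) or (EndPos < StartPos1) then
    Exit;
  // Skip any attributes: find the '>' that ends the opening tag.
  StartPos2 := 0;
  for i := StartPos1 + length(Tag) + 1 to EndPos do
    if Text[i] = '>' then
    begin
      StartPos2 := i + 1;
      break;
    end;

  if (StartPos2 > 0) and (EndPos > StartPos2) then
    result := Copy(Text, StartPos2, EndPos - StartPos2);
end;
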


function ExtractTagAndTextInsideGivenTagEx(const Tag, Text: string): string;
var
  StartPos, EndPos: integer;
begin
  result := '';
  StartPos := Pos('<' + Tag, Text);
  EndPos := Pos('</' + Tag + '>', Text);
  if (StartPos > 0) and (EndPos > StartPos) then
    result := Copy(Text, StartPos, EndPos - StartPos + length(Tag) + 3);
end;

Sample usage

ExtractTextInsideGivenTagEx('tag',
    'blah <i>blah</i> <b>blah<tag a="2" b="4">text I want to keep</tag>blah blah </b>blah')

returns

text I want to keep

whereas

ExtractTagAndTextInsideGivenTagEx('tag',
    'blah <i>blah</i> <b>blah<tag a="2" b="4">text I want to keep</tag>blah blah </b>blah')

returns

<tag a="2" b="4">text I want to keep</tag>
Andreas Rejbrand
fubar
@fubar: My third and fourth functions can handle tags before/after `<tag>` and even malformed HTML code.
Andreas Rejbrand
Sorry about the reply; I only saw your first example when I wrote it. I will test the others out now. Thanks again, all.
fubar
The last code you provided worked wonderfully. I did not even think about the tag parameters, which I needed to account for in my case, and your example handled them perfectly. Thank you very much, Fubar
fubar
The `for` loop can be replaced by `PosEx` in D7+. But I assume this has been done on purpose.
Marco van de Voort
@Marco van de Voort: Yes, that is possible (and would make the code slightly smaller). I parse strings very often, and in most cases the task is far more advanced than the one in this problem; in such cases the `for` loop is the best and most versatile tool. So it is a reflex of mine always to do things manually, simply because the manual approach is the one that works all the time.
Andreas Rejbrand
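For reference, the `PosEx` variant mentioned in the comments above might look roughly like the sketch below. It assumes the `StrUtils` unit is available (Delphi 7 or later); the function name `ExtractTextInsideGivenTagPosEx` and the program wrapper are made up for illustration and are not part of the original answer. Like the `for` loop version, it treats the first `>` after the tag name as the end of the opening tag, so an attribute value containing `>` would confuse it.

```pascal
program PosExSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils; // PosEx lives in StrUtils

// Hypothetical name; same idea as ExtractTextInsideGivenTagEx,
// with PosEx doing the searches instead of the for loop.
function ExtractTextInsideGivenTagPosEx(const Tag, Text: string): string;
var
  OpenPos, ContentPos, ClosePos: Integer;
begin
  Result := '';
  OpenPos := Pos('<' + Tag, Text);
  if OpenPos = 0 then
    Exit;
  // The first '>' after the tag name ends the opening tag (attributes skipped).
  ContentPos := PosEx('>', Text, OpenPos + Length(Tag) + 1);
  if ContentPos = 0 then
    Exit;
  Inc(ContentPos);
  // Look for the closing tag only after the opening tag.
  ClosePos := PosEx('</' + Tag + '>', Text, ContentPos);
  if ClosePos > 0 then
    Result := Copy(Text, ContentPos, ClosePos - ContentPos);
end;

begin
  Writeln(ExtractTextInsideGivenTagPosEx('tag',
    'blah <b>blah<tag a="2">text I want to keep</tag>blah</b>'));
end.
```

Because the closing tag is searched for only from `ContentPos` onward, a stray `</tag>` that appears before the opening tag is ignored rather than causing an empty result.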
+2  A: 

You can build a function using the Pos and Copy functions.

See this sample.

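Function ExtractBetweenTags(Const Value,TagI,TagF:string):string;
var
i,f : integer;
begin
 Result:='';//empty string when the tags are not found
 i:=Pos(TagI,Value);
 f:=Pos(TagF,Value);
 if (i>0) and (f>i) then
 //the text starts right after TagI, so its length is f-i-length(TagI)
 Result:=Copy(Value,i+length(TagI),f-i-length(TagI));
end;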


Function ExtractWithTags(Const Value,TagI,TagF:string):string;
var
i,f : integer;
begin
 Result:='';//empty string when the tags are not found
 i:=Pos(TagI,Value);
 f:=Pos(TagF,Value);
 if (i>0) and (f>i) then
 Result:=Copy(Value,i,f-i+length(TagF));
end;

and call like this

StrValue:='blah blah blah<tag>text I want to keep</tag>blah blah blah';
NewValue:=ExtractBetweenTags(StrValue,'<tag>','</tag>');//returns 'text I want to keep'
NewValue:=ExtractWithTags(StrValue,'<tag>','</tag>');//returns '<tag>text I want to keep</tag>'
RRUZ
Notice that `(i > 0) and (f > 0) and (f > i)` is equivalent to `(i > 0) and (f > i)`, because `f > 0` follows from this latter conjunction.
Andreas Rejbrand
@Andreas, you are right. Updated.
RRUZ
I tried the code above, but it returned some unexpected results: for example, text after the closing tag was still being displayed. Thanks for the reply and your time, though; much appreciated. Fubar.
fubar
A: 

If you have Delphi XE, you can use the new RegularExpressions unit:

ResultString := TRegEx.Match(SubjectString, '(?si)<tag>.*?</tag>').Value;

If you have an older version of Delphi, you can use a third-party regex component such as TPerlRegEx:

Regex := TPerlRegEx.Create(nil);
try
  Regex.RegEx := '(?si)<tag>.*?</tag>';
  Regex.Subject := SubjectString;
  if Regex.Match then ResultString := Regex.MatchedExpression;
finally
  Regex.Free; // release the component once the match has been read
end;
Jan Goyvaerts
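Since the input in this thread turned out to carry attributes on the opening tag, the pattern can be widened to allow for them. The following is only a sketch, assuming Delphi XE's RegularExpressions unit (named System.RegularExpressions from XE2 on); the pattern `<tag\b[^>]*>(.*?)</tag>` and the program wrapper are illustrative and not part of the original answer. The capture group holds the text between the tags, while the whole match keeps the tags.

```pascal
program RegexAttrSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, RegularExpressions; // System.RegularExpressions in XE2 and later

var
  M: TMatch;
begin
  // [^>]* skips any attributes; (.*?) lazily captures the inner text.
  M := TRegEx.Match(
    'blah <b>blah<tag a="2" b="4">text I want to keep</tag>blah</b>',
    '(?si)<tag\b[^>]*>(.*?)</tag>');
  if M.Success then
  begin
    Writeln(M.Value);           // whole match, tags included
    Writeln(M.Groups[1].Value); // text between the tags only
  end;
end.
```

As with the string-based functions, an attribute value that contains `>` would end the opening tag early, so this is suited to simple, predictable markup rather than arbitrary HTML.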