I want to find the second <BR> tag and start the search from there. How can I do this using regular expressions?

<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>

A: 

The usual solution to this sort of problem is to use a "capturing group". Most regular expression systems allow you to extract not only the entire matching sequence, but also sub-matches within it. This is done by grouping a part of the expression within ( and ). For instance, if I use the following expression (this is in JavaScript; I'm not sure what language you want to be working in, but the basic idea works in most languages):

var string = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>";
var match = string.match(/<BR>.*?<BR>([a-zA-Z]*)/);

Then I can get either everything that matched using match[0], which is "<BR>like <BR>Abdurrahman", or I can get only the part inside the parentheses using match[1], which gives me "Abdurrahman".

Brian Campbell
Are you sure this is working properly?
uzay95
I'm not sure exactly what you are looking for. You might want to clarify your question. This shows you how to find two `<BR>` tags, followed by whatever else you put in the parentheses. For instance, if you are looking for "Father", the search would be `<BR>.*?<BR>.*(Father)`, and the first substring match would refer to where it found `Father`. http://rubular.com/regexes/12836
Brian Campbell
A: 

Assuming you are using PHP, you can split your string on <BR> using explode():

$str='<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>';
$s = explode("<BR>",$str,3);
$string = end($s);
print $string;

Output:

$  php test.php
Abdurrahman<BR><SMALL>Fathers Name</SMALL>

You can then use the $string variable and do whatever you want with it.

The same steps can be done in other languages by using whatever string-splitting methods your programming language provides.
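
For instance, a JavaScript version might look like the sketch below (the function name afterSecondBr is hypothetical). One difference worth flagging: unlike PHP's explode() with a limit, JavaScript's split(separator, limit) drops everything past the limit instead of keeping the remainder in the last element, so this sketch splits fully and re-joins the tail:

```javascript
// Sketch of the explode("<BR>", $str, 3) approach in JavaScript.
// split() with a limit would truncate, so split fully and re-join the tail.
function afterSecondBr(str) {
  const parts = str.split("<BR>");
  // parts[0]: text before the first <BR>
  // parts[1]: text between the first and second <BR>
  // parts[2..]: everything after the second <BR>
  if (parts.length < 3) {
    return null; // fewer than two <BR> tags
  }
  return parts.slice(2).join("<BR>");
}

const str = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>";
console.log(afterSecondBr(str)); // Abdurrahman<BR><SMALL>Fathers Name</SMALL>
```

Returning null when there are fewer than two tags is a design choice of this sketch; the PHP version would instead return whatever the last element happens to be.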

ghostdog74
A: 

Prepend <BR>[^<]*(?=<BR>) to your regex, or remove the lookahead part if you want to start after the second <BR>, i.e. <BR>[^<]*<BR>.

Find text after the second <BR> but before the third: <BR>[^<]*<BR>([^<]*)<BR>

This finds "waldo" in <BR>404<BR>waldo<BR>.

Note: I deliberately used the above instead of the non-greedy .*?, because once the above stops working for you, you should stop parsing HTML with regex, and .*? will hide the moment that happens. The non-greedy quantifier is also less widely supported, though you can always switch to it if you want.
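
A JavaScript sketch of the patterns above (JavaScript is assumed here, and the variable names are illustrative). The lookahead version stops just before the second <BR>, while the plain version consumes it, so index plus length gives either the position of the second <BR> or the position right after it. Slicing from there is one simple way to "start the search" at that point:

```javascript
const str = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>";

// Lookahead: the second <BR> is checked but not consumed.
const upTo = str.match(/<BR>[^<]*(?=<BR>)/);
const secondBrAt = upTo.index + upTo[0].length; // position of the second <BR>

// No lookahead: the second <BR> is consumed as part of the match.
const through = str.match(/<BR>[^<]*<BR>/);
const searchFrom = through.index + through[0].length; // just after it

// Continue searching in the remainder only.
const rest = str.slice(searchFrom);
console.log(rest); // Abdurrahman<BR><SMALL>Fathers Name</SMALL>

// Text between the second and third <BR>, captured in group 1.
console.log("<BR>404<BR>waldo<BR>".match(/<BR>[^<]*<BR>([^<]*)<BR>/)[1]); // waldo
```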

Roger Pate
Note that `<BR>[^<]*<BR>` is not the same as `<BR>.*?<BR>`.
Gumbo
Very good answer, thank you. But I want to ask one more question. >[^<]* works well, but it produces '>like'. I want to remove the '>' character from the result so that I get just 'like'. How can I do this?
uzay95
@Gumbo, but they give the same result.
uzay95
uzay95: I don't understand what you mean.
Roger Pate
uzay95: No, they are different, and I believe you should use what I answered, for the stated reason.
Roger Pate
@Roger Pate, first, I've edited my first comment to express myself better: I want to get the word "like". And could you please tell me why they are different?
uzay95
uzay95: I still don't understand what you mean. Could you give example input, actual behavior, and desired behavior? --- They are different when you try to parse HTML, such as this input: `<BR>abc<P>def<BR>ghi</P>jkl<BR>`.
Roger Pate
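
Roger's example can be checked with a quick JavaScript sketch (runnable in Node or a browser console; the variable names are illustrative). On the question's input both patterns find the same text, which is likely why they look equivalent; on the HTML example they diverge:

```javascript
const question = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>";
const html = "<BR>abc<P>def<BR>ghi</P>jkl<BR>";

// On the question's input both patterns match the same text.
console.log(question.match(/<BR>[^<]*<BR>/)[0]); // <BR>like <BR>
console.log(question.match(/<BR>.*?<BR>/)[0]);   // <BR>like <BR>

// [^<]* cannot step over another tag, so no "<BR>text<BR>" pair exists here.
console.log(html.match(/<BR>[^<]*<BR>/)); // null

// .*? skips over the <P> tag and reports a match that spans markup.
console.log(html.match(/<BR>.*?<BR>/)[0]); // <BR>abc<P>def<BR>
```

The null result is the "stops working" signal Roger describes: the stricter pattern fails visibly instead of silently matching across other tags.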
Look, this is my target string: <BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL> and when I write " >[^<]* " the result is '>like'. As you can see, it includes an undesired character, " > ". I don't want that. All I am asking is: where am I making a mistake? How can I get my code to match just the word "like" and nothing else?
uzay95
To get "like " and nothing else from `<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>`, use: `<BR>([^<]*)<BR>`.
Roger Pate
To get "Abdurrahman" from `<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>`, use: `<BR>[^<]*<BR>([^<]*)<BR>`.
Roger Pate
Roger, I really thank you for your patient comments. I've just tried the code you suggested and it seems to return/highlight/include the <BR> and </BR> tags as well. So I was trying to get rid of the ">" character, but now I have even more to get rid of. Unfortunately it didn't do what I wanted. I apologize for repeating this again and again, but isn't there a way to highlight just the word "like"?
uzay95
There are multiple levels of matching: your program is showing the complete matched text, while you're interested in the first group here (the part between the parentheses). Get your program to show you the difference between those.
Roger Pate
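
To illustrate the two levels Roger describes, here is a small JavaScript sketch (JavaScript is assumed only because the first answer uses it; the question does not name a language). Many regex testers highlight the whole match, which is match[0] below, while the word you want is in group 1, match[1]:

```javascript
const str = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>";

// match[0] is everything the pattern matched; match[1] is group 1 only.
const first = str.match(/<BR>([^<]*)<BR>/);
console.log(first[0]);        // whole match: <BR>like <BR>
console.log(first[1]);        // group 1: "like " (with a trailing space)
console.log(first[1].trim()); // like

// The same idea for the text after the second <BR>.
const second = str.match(/<BR>[^<]*<BR>([^<]*)<BR>/);
console.log(second[1]);       // Abdurrahman
```

The trailing space comes from the input itself ("like " is followed by a space before the next <BR>), so trim() is used here to get exactly "like".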
A: 

This regular expression should match the first two <br />s:

/(\s*<br\s*\/?>\s*){2}/i

So you can either replace them with nothing, or use preg_match or RegExp.prototype.match to extract the parts you need.

In JavaScript:

var afterReplace = str.replace( /(\s*<br\s*\/?>\s*){2}/i, '' );

In PHP:

$afterReplace = preg_replace( '/(\s*<br\s*\/?>\s*){2}/i', '', $str );

I'm only sure it works in PHP and JavaScript, but it should work in most other regex flavors too.

Dan Beam
Would you please tell me what this regex '/(\s*<br\s*/?>\s*){2}/i' means? I just want to learn.
uzay95
Dan: That won't match given input text of `<br>anything here<br>`, because you don't allow for anything but `\s` between the tags.
Roger Pate
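
Roger's objection can be checked directly. The following JavaScript sketch (the sample strings are made up for illustration) applies Dan's pattern, with the slash escaped, first to two adjacent tags and then to the question's input:

```javascript
// Dan's pattern: two <br>, <br/> or <br /> tags with only whitespace around them.
const brPair = /(\s*<br\s*\/?>\s*){2}/i;

// Two tags separated only by whitespace: matched and removed.
const adjacent = "text<br />\n<BR>more";
console.log(adjacent.replace(brPair, "")); // textmore

// The question's input has text between every pair of tags, so nothing matches.
const question = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>";
console.log(brPair.test(question));                     // false
console.log(question.replace(brPair, "") === question); // true
```

The pattern has no g flag, so test() does not carry a lastIndex over between calls.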
To explain /(\s*<br\s*/?>\s*){2}/i:

/      # start regex
(      # start group
\s     # whitespace
*      # any number of the previous (incl. zero)
<br    # literally this text
\s*    # zero or more whitespace
/      # literal slash (should really be escaped as \/)
?      # zero or one of the previous
>      # literal
\s     # whitespace
*      # zero or more of the previous
)      # end group
{2}    # two of the group
/      # end regex
i      # match case-insensitively
ternaryOperator