I have a page source and I want to get the anchor text of all its anchor tags.
Could someone please help me out with the pattern for it?
Thanks in advance.
karim79 is right that regex might be the wrong tool for this, but here is one simple way it could be done in Java. Note that this will not work if the anchors have additional attributes before the href, or if the anchor text contains anything other than word characters and whitespace. However, it might be a good start or help you understand how you could do it.
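import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorTextExample {
    public static void main(String[] args) {
        String html = "<body>" +
                "<a href=\"#first\">go to first</a>" +
                "<span>something else</span>" +
                "<a href=\"#second\">go to second</a>" +
                "</body>";
        // Group 1 is the fragment after '#', group 2 is the anchor text.
        Pattern pattern = Pattern.compile("<a href=\"#(\\w+)\">([\\w\\s]+)</a>");
        Matcher matcher = pattern.matcher(html);
        while (matcher.find()) {
            System.out.println(matcher.group(2)); // prints the anchor text
        }
    }
}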
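If the anchors carry extra attributes before or after the href, a more tolerant pattern may help. The following is only a minimal sketch: the class name AnchorText and the method extract are made up for illustration. It assumes that attribute values never contain a '>' character and that anchors are not nested, so a real HTML parser is still the safer choice for arbitrary pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorText {
    // "<a", a word boundary (so <abbr> is skipped), any attributes,
    // then the shortest run of characters up to the next closing </a>.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*>(.*?)</a\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: returns the trimmed inner text of every anchor found.
    public static List<String> extract(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = ANCHOR.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1).trim());
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<body>"
                + "<a class=\"nav\" href=\"#first\">go to first</a>"
                + "<span>something else</span>"
                + "<A href=\"#second\" title=\"x\">go to second</A>"
                + "</body>";
        for (String text : extract(html)) {
            System.out.println(text);
        }
    }
}
```

Any markup nested inside an anchor (for example a <b> tag) is returned as part of the text, so you may need a second pass to strip it.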
Try this regex pattern; it should give you what you are looking for:
(?<=<\s*a[^>]*>)(?<anchorContent>[\s\S]*?)(?=<\s*/a>)
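This will give you a named group called "anchorContent" that holds everything between each opening anchor tag and its closing tag. Note that the lookbehind is unbounded (it contains [^>]*), which the .NET regex engine accepts but some other engines reject.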
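Since the question was also answered in Java, here is a rough adaptation of the same idea that avoids lookarounds and simply matches the whole anchor, keeping the named group. It is a sketch, not a drop-in equivalent: the class NamedAnchorGroup and the method firstAnchorContent are hypothetical names, and named groups require Java 7 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedAnchorGroup {
    // Same shape as the pattern above, but the opening and closing tags are
    // matched directly instead of through lookbehind and lookahead.
    private static final Pattern ANCHOR = Pattern.compile(
            "<\\s*a\\b[^>]*>(?<anchorContent>[\\s\\S]*?)<\\s*/a\\s*>",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the content of the first anchor, or null if none is found.
    public static String firstAnchorContent(String html) {
        Matcher matcher = ANCHOR.matcher(html);
        return matcher.find() ? matcher.group("anchorContent") : null;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"#first\" class=\"nav\">go to first</a></p>";
        System.out.println(firstAnchorContent(html));
    }
}
```

To collect every anchor instead of just the first, call matcher.find() in a loop and read the group on each iteration.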
Hope that helps.