If I have the string below, how can I extract the EDITORS PREFACE text in Java? Thanks.
<div class='chapter'><a href='page.php?page=1&filename=SomeFile&chapter=EDITORS PREFACE'>EDITORS PREFACE</a></div>
Since you mentioned in a comment on your question that you want the contents of the href attribute, here is a regex for it:
<a[^>]*? href=\"(?<url>[^\"]+)\"[^>]*?>
This regex works in the Microsoft .NET Framework. It captures the contents of href in a group named url.
I just noticed that this question is tagged Java. Java has no named groups as of JDK 6 (they were added in Java 7), so here is a version for Java that uses a numbered group instead:
<a[^>]*? href="([^"]+)"[^>]*?>
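The regex above captures the contents of href in group 1. Note that it expects double quotes around the attribute value; the sample string uses single quotes, so the program below uses ' in the pattern instead.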
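If you are on Java 7 or later, the named-group approach carries over from the .NET version. The following is only a sketch: the class name NamedGroupHref, the method extractHref, and the group name q are made up for illustration, and the pattern is loosened to accept either quote style, which the regexes above do not do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class NamedGroupHref {
    // (?<q>["']) remembers which quote opened the attribute value;
    // \k<q> requires the same quote to close it.
    // (?<url>.*?) lazily captures everything in between.
    private static final Pattern HREF =
            Pattern.compile("<a[^>]*?\\shref=(?<q>[\"'])(?<url>.*?)\\k<q>[^>]*>");

    // Returns the href value of the first <a> tag, or null if none is found.
    static String extractHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group("url") : null;
    }

    public static void main(String[] args) {
        String line = "<a href='page.php?page=1&filename=SomeFile&chapter=EDITORS PREFACE'>EDITORS PREFACE</a>";
        System.out.println(extractHref(line));
    }
}
```

Like any regex over HTML, this assumes reasonably regular markup; for arbitrary pages an HTML parser is likely the safer choice.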
Test it here: http://www.regexplanet.com/simple/index.html
Run this program:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main(String[] args)
    {
        // String to be scanned to find the pattern.
        String line = "<a href='page.php?page=1&filename=SomeFile&chapter=EDITORS PREFACE'>EDITORS PREFACE</a>";
        String pattern = "<a[^>]*? href='([^']+)'[^>]*?>";

        // Create a Pattern object.
        Pattern r = Pattern.compile(pattern);

        // Now create a Matcher object.
        Matcher m = r.matcher(line);
        if (m.find())
        {
            // Found value: <a href='page.php?page=1&filename=SomeFile&chapter=EDITORS PREFACE'>
            System.out.println("Found value: " + m.group(0));
            // Found value: page.php?page=1&filename=SomeFile&chapter=EDITORS PREFACE
            System.out.println("Found value: " + m.group(1));
        }
        else
        {
            System.out.println("NO MATCH");
        }
    }
}
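The question as originally written asks for the link text (EDITORS PREFACE) rather than the href, so here is a rough sketch for that case too. The class name LinkTextExtractor and the method extractLinkText are invented for this example, and it assumes the anchor contains plain text with no nested tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class LinkTextExtractor {
    // <a\b[^>]*> matches the opening tag with any attributes;
    // (.*?) lazily captures the text up to the first closing </a>.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\b[^>]*>(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the text of the first <a> element, or null if none is found.
    static String extractLinkText(String html) {
        Matcher m = ANCHOR.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String line = "<div class='chapter'><a href='page.php?page=1&filename=SomeFile&chapter=EDITORS PREFACE'>EDITORS PREFACE</a></div>";
        // Prints: EDITORS PREFACE
        System.out.println(extractLinkText(line));
    }
}
```

Here group 1 is the text between the tags, so the quoting style of the href attribute does not matter.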