Hello everybody.
I'm trying to parse links with a regex in Java,
but I think it's getting too slow. For example, extracting all links from:
takes 34642 milliseconds (about 34 seconds!).
Here is the regexp:
private final String regexp = "<a.*?\\shref\\s*=\\s*([\\\"\\']*)(.*?)([\\\"\\'\\s].*?>|>)";
The flags for the pattern:
private static final int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE | Pattern.UNICODE_CASE | Pattern.CANON_EQ;
And the code may be something like this:
private void processURL(URL url) {
    URLConnection connection;
    Pattern pattern = Pattern.compile(regexp, flags);
    try {
        connection = url.openConnection();
        InputStream in = connection.getInputStream();
        BufferedReader bf = new BufferedReader(new InputStreamReader(in));
        String html = new String();
        String line = bf.readLine();
        while (line != null) {
            html += line;
            line = bf.readLine();
        }
        bf.close();
        Matcher matcher = pattern.matcher(html);
        while (matcher.find()) {
            System.out.println(matcher.group(2));
        }
    } catch (Exception e) {
    }
}
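For comparison, here is a minimal sketch of a variant that might be worth timing against the code above. It is only a guess at where the time goes, not a confirmed fix: it assumes the cost comes from the ".*?" groups under DOTALL (which can scan far past the end of a tag), from the CANON_EQ flag, and from building the page with String "+=" in a loop. The class name LinkExtractSketch and the methods extractLinks and readAll are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkExtractSketch {
    // Assumption: [^>]* keeps every match attempt inside a single tag,
    // and only CASE_INSENSITIVE is needed for plain HTML attribute names.
    private static final Pattern LINK = Pattern.compile(
        "<a\\b[^>]*?\\shref\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))",
        Pattern.CASE_INSENSITIVE);

    // Returns the href values in document order.
    // Group 1: double-quoted, group 2: single-quoted, group 3: unquoted.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String href = m.group(1) != null ? m.group(1)
                        : m.group(2) != null ? m.group(2)
                        : m.group(3);
            links.add(href);
        }
        return links;
    }

    // Reads the whole stream into a StringBuilder instead of
    // concatenating Strings, which copies the page on every line.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

Wrapping the download, the readAll call, and the extractLinks call each in System.nanoTime() measurements should show which of the three steps actually dominates the 34 seconds.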
Can you give me a hint?
Extra data: 1 Mbit, Core 2 Duo, 1 GB RAM, single-threaded.