I have the following JavaScript program saved in a file pre.js:

var pre = readFile("method-help.html");
RegExp.multiline = true;
print(/<pre>((?:.|\s)+)<\/pre>/.exec(pre)[1]);

The contents of method-help.html are simply the page at http://api.stackoverflow.com/1.0/help/method?method=answers/%7bid%7d. What I'm trying to do is get the JSON code between the pre tags. However, when I run the program in Rhino, nothing is printed and the program does not terminate. The command I use is:

java -jar js.jar pre.js

My Rhino version is 1_7R2.

+2  A: 

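The reason it doesn't seem to terminate is probably catastrophic backtracking. In (?:.|\s)+ both . and \s match spaces and tabs, so each such character can be matched in two ways, and the engine may try an exponential number of combinations while backtracking (it would end eventually, but it could take a very long time). Here's a correct, fast version: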

var pre = readFile("method-help.html");
print(/<pre>([\s\S]*?)<\/pre>/.exec(pre)[1]);

You don't need multiline. It only affects the meaning of ^ and $, which you're not using. Instead, the character class [\s\S] matches any character at all, including newlines, which . does not match. The *? means zero or more characters, non-greedy. The question mark (non-greedy) doesn't change the result here, but it would if there were multiple pre blocks: a greedy * would run on to the last </pre>.
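
To make the greedy versus non-greedy difference concrete, here is a small sketch. The sample string and the extractPreBlocks helper are made up for illustration and are not part of the original program. It also guards against exec returning null, which happens when there is no match:

```javascript
// Hypothetical sample input: two pre blocks, the first spanning several lines.
var html = "<pre>{\n  \"a\": 1\n}</pre><p>text</p><pre>{\"b\": 2}</pre>";

// Non-greedy: the capture stops at the first closing tag.
var lazy = /<pre>([\s\S]*?)<\/pre>/.exec(html);
// Greedy: the capture runs on to the last closing tag.
var greedy = /<pre>([\s\S]*)<\/pre>/.exec(html);

// exec returns null when nothing matches, so check before indexing.
var first = lazy ? lazy[1] : null;
var everything = greedy ? greedy[1] : null;

// Hypothetical helper: collect the contents of every pre block.
// With the g flag, each exec call resumes where the previous match ended.
function extractPreBlocks(text) {
  var re = /<pre>([\s\S]*?)<\/pre>/g;
  var blocks = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    blocks.push(m[1]);
  }
  return blocks;
}

var all = extractPreBlocks(html);
```

Here first holds only the first JSON snippet, while everything also swallows the markup between the two blocks. all holds both snippets as separate array entries.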

Matthew Flaschen