I use webBrowser.DocumentText to get the HTML of a page. Using Regex, I manage to get the script tag part:
<script type="text/javascript">functions here..</script>

I need to get the functions inside those tags, e.g.:

<script type="text/javascript">
 function function1 () { code here;}
 function function2 () { code here;} 
</script>

I need a regex pattern to get the 2 functions,
or to list them like this:
1. function function1() { code here; }
2. function function2() { code here; }

The purpose of the program is to identify whether there are duplicate JavaScript functions between 2 pages.
It's for WinForms and the language is C#.

A: 
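e = ".*?(function.+?{.*?}|\\z)";  // use with RegexOptions.Singleline so '.' also spans newlines
repl = "$1";                      // .NET replacement strings reference groups as $1, not \1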

I believe that's it.

Mark
Never mind, I did not consider nested "}".
Mark
But is it possible to have nested "}"?
Jepe d Hepe
In JavaScript, yes. I was assuming the standard syntax.
Mark
But you can't have nested <script> tags, so you could just return the content of <script>...</script>.
Mark
e = ".*?(<script.*?>(.*?)</script>|\\z)"; repl = "$2"; That will leave in any var statements, though, so you could make a second pass to remove anything starting with var and ending with \n. For readability, I also didn't account for whitespace between the opening '<' and script (etc.).
Mark
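
The idea in the comment above (return only what sits between <script> and </script>) might look roughly like this in C#. This is a minimal sketch, not a definitive implementation: the class and method names (ScriptExtractor, ScriptBodies) are hypothetical, and it assumes the page contains no "</script>" text inside a JavaScript string literal.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ScriptExtractor
{
    // Hypothetical helper: returns the text between each <script ...> and </script>.
    // Assumes "</script>" never appears inside a JavaScript string in the page.
    public static List<string> ScriptBodies(string html)
    {
        var bodies = new List<string>();
        // Singleline lets '.' match newlines; IgnoreCase also accepts <SCRIPT>.
        // \s* tolerates whitespace after '<' and around '/'.
        var pattern = new Regex(@"<\s*script\b[^>]*>(.*?)<\s*/\s*script\s*>",
                                RegexOptions.Singleline | RegexOptions.IgnoreCase);
        foreach (Match m in pattern.Matches(html))
        {
            bodies.Add(m.Groups[1].Value);   // group 1 is the script content
        }
        return bodies;
    }
}
```

Calling ScriptExtractor.ScriptBodies(webBrowser.DocumentText) would then give one string per script block, which can be scanned for functions in a second step.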
+1  A: 

You cannot do this in any general way with regexes alone (especially not with the .NET flavour), since JavaScript scopes can be nested arbitrarily deeply and the language is therefore not regular. If you only need this for a few particular pages, you might be able to craft a regex that handles the common cases, but not all of them.
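
One way around the nesting problem is to use a regex only to find each function header and then count braces by hand to find the end of the body. The following is a minimal sketch under stated assumptions, not a full JavaScript parser: the names FunctionScanner, ExtractFunctions and DuplicateNames are hypothetical, only named "function name(...) { ... }" declarations are recognised, and braces inside strings, comments, or regex literals would confuse it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class FunctionScanner
{
    // Hypothetical helper: maps each top-level function name to its full source text.
    // Assumes '{' and '}' never occur inside strings, comments, or regex literals.
    public static Dictionary<string, string> ExtractFunctions(string js)
    {
        var found = new Dictionary<string, string>();
        // The regex only locates the header, up to and including the opening '{'.
        var header = new Regex(@"\bfunction\s+([A-Za-z_$][\w$]*)\s*\([^)]*\)\s*\{");
        int pos = 0;
        while (pos < js.Length)
        {
            Match m = header.Match(js, pos);
            if (!m.Success) break;

            int depth = 1;                    // we are just past the opening '{'
            int i = m.Index + m.Length;
            while (i < js.Length && depth > 0)
            {
                if (js[i] == '{') depth++;
                else if (js[i] == '}') depth--;
                i++;
            }
            if (depth != 0) break;            // unbalanced braces: stop scanning

            found[m.Groups[1].Value] = js.Substring(m.Index, i - m.Index);
            pos = i;                          // resume after the closing '}', so nested functions are skipped
        }
        return found;
    }

    // Names of functions declared in both sources (compares names, not bodies).
    public static List<string> DuplicateNames(string jsA, string jsB)
    {
        return ExtractFunctions(jsA).Keys.Intersect(ExtractFunctions(jsB).Keys).ToList();
    }
}
```

For the duplicate check described in the question, DuplicateNames could be called with the script text of each page; comparing the dictionary values instead of the keys would detect identical bodies rather than identical names.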

Max Shawabkeh