I have the following very simple JavaScript-compatible regular expression:

<script type="text/javascript" id="(.+)" src="([^"]+)">

I am trying to match on script tags and gather both the ID and src attributes. I'd like to make the order of the attributes irrelevant, so that the following will still match:

<script id="..." type="text/javascript" src="...">
<script src="..." id="..." type="text/javascript">
<script id="..." src="..." type="text/javascript">

Is it possible to allow the attributes to appear in any order without compromising the expression's ability to capture the matching ID and src?

Edit: The string to match on is coming from innerHTML, making DOM traversal impossible. Also, I cannot use any third-party libraries for this specific application.

A: 

Try the following:

<script\s*\S*\s*(id="([^"]+)")?\s*\S*\s*(src="([^"]+)")\s*\S*\s*(id="([^"]+)")?[^>]*>

Since you don't care about the type, just remove it, because it makes things more complicated. Then brute-force the rest by adding two optional IDs on either side of the src.
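
As a sketch of a different technique from the brute-force pattern above (not something this answer proposes): each required attribute can be given its own lookahead, so the order of the attributes stops mattering. The helper name findScriptAttrs is hypothetical, and the pattern assumes double-quoted attribute values with no ">" inside them.

```javascript
// Hypothetical helper illustrating order-independent matching with lookaheads.
// Assumes double-quoted values and no ">" inside any attribute value.
function findScriptAttrs(html) {
  // Each (?=...) scans ahead inside the same tag without consuming input,
  // so id and src are captured regardless of where they appear.
  var re = /<script\b(?=[^>]*\sid="([^"]*)")(?=[^>]*\ssrc="([^"]*)")[^>]*>/gi;
  var found = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    found.push({ id: m[1], src: m[2] });
  }
  return found;
}
```

Script tags that lack either attribute are skipped, because both lookaheads must succeed for the tag to match.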

Alternatively, you can use

<script\s*(([^=]*)="([^"]*)")+\s*>

to get all the attributes and then pick out the ones you want in code.
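
One caveat with the pattern above: in JavaScript, a repeated capturing group keeps only its last iteration, so a single match will not hand back every attribute. A minimal sketch of the "pick them out in code" step, assuming double-quoted values and a hypothetical helper name parseScriptTags, is to match each tag first and then run a second, global attribute pattern over it:

```javascript
// Hypothetical helper: returns one object of attribute name -> value per <script> tag.
// Assumes double-quoted values and no ">" inside any attribute value.
function parseScriptTags(html) {
  var tagRe = /<script\b([^>]*)>/gi; // group 1: everything between "<script" and ">"
  var results = [];
  var tag;
  while ((tag = tagRe.exec(html)) !== null) {
    var attrRe = /([^\s=]+)\s*=\s*"([^"]*)"/g; // one name="value" pair per match
    var attrs = {};
    var attr;
    while ((attr = attrRe.exec(tag[1])) !== null) {
      attrs[attr[1].toLowerCase()] = attr[2];
    }
    results.push(attrs);
  }
  return results;
}
```

The caller can then read attrs.id and attrs.src from each result and ignore the rest.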

Nick Berardi
This solution seems the most promising, but I can't seem to get the javascript regexp parser to match on it. Still trying...
giltotherescue
+1  A: 

That sounds like a nasty regex. IMO, you might be better off using XPath to query the DOM. Or you could use the jQuery JavaScript library to select the elements you need.

Kevin Tighe
+1  A: 

You can also try the following with jQuery:

$("script").each(function() {
    var src = $(this).attr("src");
    var id = $(this).attr("id");

    alert(id + ": " + src);
});

This will work much better than my script parsing Regex.

Nick Berardi
+1  A: 

If you need to get the script tags of a file, could you not just use document.getElementsByTagName() and then check (possibly using regex) that the attributes you need are there?

Regex is not a good tool for building parsers (at least not for syntaxes as forgiving as HTML).

Stein G. Strindhaug
+1  A: 

Disclaimer: Be careful with regular expressions and HTML source code. The approach is brittle and therefore easily broken or circumvented; you should not even think of using it to validate user input.

If you are sure of the source data and know it conforms to the rules of well-formed HTML, you can use this:

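var html = "variable/property holding your html source";
var re_script = /<script\s.+?>/ig;   // each opening <script ...> tag
var re_id     = /id="(.*?)"/i;       // value of the id attribute
var re_src    = /src="(.*?)"/i;      // value of the src attribute

var scriptTag = null;
// With the "g" flag, exec() resumes after the previous match on each call.
while ((scriptTag = re_script.exec(html)) !== null)
{
  // scriptTag[0] is the text of the matched tag.
  var matchId  = re_id.exec(scriptTag[0]);
  var matchSrc = re_src.exec(scriptTag[0]);

  // Report only tags that carry both attributes.
  if (matchId && matchSrc)
  {
    var scriptId  = matchId[1];
    var scriptSrc = matchSrc[1];
    alert('Found script ID="' + scriptId + '", SRC="' + scriptSrc + '"');
  }
}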

Basically, this is what jQuery's $("script").each() would do, just without the jQuery and without needing the DOM.

Tomalak
This is exactly what I was looking for, and yes I can trust the input in this specific application. Thanks Tomalak.
giltotherescue