I am writing a PHP script that combines the local JavaScript files referenced by an HTML page into a single file. Now I want to delete the references to local JavaScript files from the page while keeping the references to external ones.

For example, I have the following variables:

$my_domain = "mysite";
$base_url = "http://www.mysite.com";

I want to delete all local JavaScript references but keep the external ones. Local JavaScript includes files on subdomains too. For example, http://www.mysite.com/script/jquery.js and http://dev.mysite.com/scripts/test.js are both local JavaScript files.

I want to use a regular expression for this.

EDIT: The format is:

 <script src="http://www.mysite.com/jsfile.js"></script>

EDIT 2: The scripts in the page look like this:

<script type="text/javascript" src="http://localhost/test/example/Scripts/superfish/js/hoverIntent.js"></script>
<script type="text/javascript" src="http://localhost/test/example/Scripts/superfish/js/superfish.js"></script>

where $baseURL = "http://localhost/test/example";

The regex is currently not replacing them.

A: 

Assuming your base url is mysite.com, this regex:

Search for: (<script\b[^><]*\ssrc\s*=\s*["'])(?:http:\/\/)?(?:www\.)?(?:\w+\.)?\bmysite\.com/?\b([^><"']*["'])

Replace with: $1$2

will strip the scheme and host from your local URL references (an optional www. and one optional subdomain are matched), so that, for example, http://www.mysite.com/jsfile.js becomes jsfile.js
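As a hedged illustration of this search-and-replace in PHP, the sketch below wraps a variant of the regex in a helper. The function name `strip_local_host` is hypothetical, and the pattern is slightly adapted rather than copied: it accepts `https` and any number of subdomains, and it assumes the `src` value is quoted.

```php
<?php
// Hypothetical helper (an illustration, not the original regex verbatim):
// strips the scheme and host from src attributes that point at $host
// or at one of its subdomains. Assumes the src value is quoted.
function strip_local_host(string $html, string $host): string
{
    // Group 1: the tag up to and including the opening quote of src.
    // Group 2: the remaining path plus the closing quote.
    // The lookahead makes sure the host name really ends after $host.
    $pattern = '%(<script\b[^><]*\ssrc\s*=\s*["\'])(?:https?://)?(?:\w+\.)*'
             . preg_quote($host, '%')
             . '(?=[/"\'])/?([^><"\']*["\'])%i';
    return preg_replace($pattern, '$1$2', $html);
}

echo strip_local_host('<script src="http://www.mysite.com/jsfile.js"></script>', 'mysite.com'), "\n";
// prints: <script src="jsfile.js"></script>
```

The pattern is written in a single-quoted string with `%` as the delimiter, so the slashes in `://` need no escaping and PHP leaves the regex backslashes untouched.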

If you want to remove every SCRIPT tag (and its contents) that contains such a URL, use this regex:

Search for: <script\b[^><]*\ssrc\s*=\s*[^><]*\bmysite\.com\b[^><]*>\s*<\/script>

Replace with: nothing

If there may be any text between <script> and </script>, use this regex instead:

Search for: <script\b[^><]*\ssrc\s*=\s*[^><]*\bmysite\.com\b[^><]*>.*?<\/script>

Replace with: nothing

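So, in PHP:

<?php
// Single-quoted strings keep the regex backslashes and the HTML double quotes intact.
// The "s" modifier lets .*? match across newlines inside the script element.
$ptn = '/<script\b[^><]*\ssrc\s*=\s*[^><]*\bmysite\.com\b[^><]*>.*?<\/script>/s';
$str = '<script src="http://www.mysite.com/jsfile.js"></script>';
$rpltxt = '';
echo preg_replace($ptn, $rpltxt, $str); // the whole tag is removed
?>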
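
Since the question keeps the domain in a variable, the pattern can also be built at run time. The following is a minimal sketch under stated assumptions: the helper name `remove_local_scripts` is hypothetical, the `src` value is assumed to be quoted and to start with `http://`, `https://`, or `//`, and type declarations require PHP 7 or later. `preg_quote` escapes the dot in the domain and the chosen delimiter.

```php
<?php
// Hypothetical helper: removes <script> elements whose src host is $domain
// or any subdomain of it. Other script elements are left untouched.
function remove_local_scripts(string $html, string $domain): string
{
    // preg_quote escapes regex metacharacters (such as ".") and the "%" delimiter.
    $host = preg_quote($domain, '%');
    // The lookahead requires the host name to end right after $domain
    // (followed by a port, a path, or the closing quote).
    $pattern = '%<script\b[^><]*\ssrc\s*=\s*["\'](?:https?:)?//(?:[\w-]+\.)*' . $host
             . '(?=[:/"\'])[^><]*>.*?</script>\s*%is';
    return preg_replace($pattern, '', $html);
}

$html = <<<'HTML'
<script type="text/javascript" src="http://localhost/test/example/Scripts/superfish/js/hoverIntent.js"></script>
<script src="http://dev.mysite.com/scripts/test.js"></script>
<script src="http://cdn.example.org/lib.js"></script>
HTML;

$html = remove_local_scripts($html, 'localhost');
$html = remove_local_scripts($html, 'mysite.com');
echo $html, "\n";
```

With this sample input, only the `cdn.example.org` script element should remain. Note that the helper takes a bare host name such as `localhost` or `mysite.com`, not a full base URL with a path.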
Vantomex
I have optimized the second regex and added an additional note.
Vantomex
If you tell me which regex library or application you are currently using, I can optimize it even further.
Vantomex
I have tried this, but it is not working: `$replacePattern = "<script\b[^[><]*\ssrc\s*=\s*[\"'](?:http:\/\/)?(?:www.)?(?:\w+\.)?\bmysite.com/?\b[^><]*>\s*<\/script>"; echo preg_replace($replacePattern, '', $html);`
No, I have edited the answer; please look at it once again.
Vantomex
So you are using PHP. This is the final regex: `result = subject.replace(/<script\b[^><]*\ssrc\s*=\s*[^><]*\bmysite.com\b[^><]*>.*?<\/script>/g, "");`
Vantomex
This also returns nothing, although $html contains the HTML of the whole page.
Sorry, that was a JavaScript regex. For PHP, it is `$result = preg_replace('%<script\b[^><]*\ssrc\s*=\s*[^><]*\bmysite.com\b[^><]*>.*?</script>%', bla bla bla);`
Vantomex
@Vantomex please check EDIT 2 in the question description.
If so, just replace `mysite.com` with `localhost`. Maybe some character in my regex needs to be escaped. I don't know the PHP convention for escaping special characters in a regex.
Vantomex
I tested it here with RegexBuddy, and it works perfectly.
Vantomex
I replaced mysite.com in your regular expression with "localhost", but it is still not removing the tags.
I have edited it a bit, and I think it works for you now. I made the last PHP regex using the tool from this website: `http://www.pagecolumn.com/tool/pregtest.htm` You can also put my other regexes into this website to convert them into PHP regexes. Don't forget to replace mysite.com with localhost, as that is your case.
Vantomex