How would I go about taking input HTML and changing any src or href links that point to a local address (e.g. href="index.html") to their full, specified location (e.g. href="http://www.somesite.com/index.html")? This is for a site that fetches a file from another site and displays it (kind of like a proxy).

+5  A: 

Take a look at the <base> tag. It lets you define where all links are relative to.
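
As a rough sketch of that idea in PHP: the helper below inserts a <base> element at the top of the fetched page's <head>, so the browser resolves relative links against the original site. The function name add_base_tag() and the sample markup are made up for illustration, and it assumes the page actually has a <head>.

```php
<?php
// Hypothetical helper (not from the thread): insert <base href="..."> as the
// first child of <head> so relative links resolve against $baseUrl.
function add_base_tag(string $html, string $baseUrl): string
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings about malformed HTML

    $head = $doc->getElementsByTagName('head')->item(0);
    if ($head === null) {
        return $html; // no <head> to attach to; leave the input unchanged
    }

    $base = $doc->createElement('base');
    $base->setAttribute('href', $baseUrl);
    $head->insertBefore($base, $head->firstChild);

    return $doc->saveHTML();
}

// Example input; the markup and URL are placeholders.
$page = '<html><head><title>Demo</title></head>'
      . '<body><a href="index.html">Home</a></body></html>';
echo add_base_tag($page, 'http://www.somesite.com/');
```

The href and src attributes themselves stay untouched; only the browser's interpretation of them changes.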

nickf
Thank you, can't believe I didn't think of that :P :)
Joe
The <base> tag is useful only if you don't plan on changing file names and paths. Using DOM functions (like in my answer) allows you to manipulate href and src attributes in any way.
Felix
+1  A: 

If you are doing this for random HTML pages that are not necessarily strict, regexps will be a huge headache for you, because you'll have to handle non-standard attributes like:

href="some_url"
href='some_url'
href=some_url

My advice is to use DOM functions for this task. You could do something along these lines (untested):

$doc = new DOMDocument();
@$doc->loadHTMLFile($url); // suppress warnings about HTML errors
$xpath = new DOMXPath($doc);
// select the href attribute of all elements that have an href attribute
$hrefs = $xpath->query("//*[@href]/@href");
foreach ($hrefs as $href) {
    // make_new_url() is a placeholder for your own function that returns the full URL
    $href->nodeValue = make_new_url($href->nodeValue); // this is where the magic happens
}
// now do the same for src attributes, e.g. with the query "//*[@src]/@src"

Again, this code might need some tweaking, especially the XPath query; I'm not very sure about it.

Using the DOM extension might seem overly complex for the task at hand, but it will spare you a lot of headaches and time, on this task and future ones, too.
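
To make the approach above more concrete, here is a minimal sketch that works on an HTML string and fills in the URL-rewriting step. Both function names (make_absolute_url() and absolutize_links()) are invented for this example, with make_absolute_url() standing in for make_new_url(). The resolver is deliberately simplified: it assumes the base URL has a scheme and host, and it does not normalise "../" segments or handle query-only links like "?page=2".

```php
<?php
// Hypothetical stand-in for make_new_url(): resolve $url against $base.
// Simplified on purpose; see the assumptions in the text above.
function make_absolute_url(string $url, string $base): string
{
    // Leave empty values, fragments, and anything with a scheme
    // (http:, https:, mailto:, javascript:, ...) untouched.
    if ($url === '' || $url[0] === '#' || preg_match('/^[a-z][a-z0-9+.-]*:/i', $url)) {
        return $url;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    if (strpos($url, '//') === 0) {   // protocol-relative: //cdn.example/x.js
        return $parts['scheme'] . ':' . $url;
    }
    if ($url[0] === '/') {            // root-relative: /img/a.png
        return $origin . $url;
    }

    // Plain relative path: append to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $url;
}

// Rewrite every href and src attribute in $html to an absolute URL.
function absolutize_links(string $html, string $base): string
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings about malformed HTML
    $xpath = new DOMXPath($doc);

    foreach ($xpath->query('//@href | //@src') as $attr) {
        // setAttribute() takes care of escaping characters such as "&".
        $attr->ownerElement->setAttribute(
            $attr->nodeName,
            make_absolute_url($attr->value, $base)
        );
    }
    return $doc->saveHTML();
}

// Example input; the markup and URLs are placeholders.
$html = '<html><body><a href="index.html">Home</a>'
      . '<img src="/img/logo.png"></body></html>';
echo absolutize_links($html, 'http://www.somesite.com/docs/page.html');
```

A single XPath union query covers both attributes, so there is no need for a second loop for src.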

Felix
Parsing the HTML, and acting on it is definitely the best option for manipulating HTML. Even better, there's a great library that does exactly that: htmlpurifier. HTMLPurifier even already has a config option that does exactly what the OP asked for: http://htmlpurifier.org/live/configdoc/plain.html#URI.MakeAbsolute
Frank Farmer
Yes, but why use a third-party tool when you can use a first-party one just as easily? On top of that, DOM can prove very useful in a lot of other situations, too.
Felix
A: 

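**You don't need any regular expression for this problem;** use $_SERVER['HTTP_HOST']:

$cur_dir = basename(dirname($_SERVER['PHP_SELF'])); // name of the directory containing the running script
$host = $_SERVER['HTTP_HOST'];
echo "http://".$host."/".$cur_dir."/".$filename;

For a script in the images directory and $filename set to index.html, this will print http://www.yourdomain.blabla/images/index.html (basename() keeps only the last directory of the path).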

streetparade
Could whoever voted this down please comment on why? It would be very helpful, so that I can learn from my mistakes ;-)
streetparade