Relatively new to PHP and looking for some help updating links on a specific page. The page has numerous links, e.g. href=/link/, and I would like the page to identify these links (those that do not already start with http or https) and prepend a URL, e.g. www.domain.com, to each, basically ending up with href=www.domain.com/link/. Any help would be greatly appreciated.

A: 

I think you want to parse a list of URLs and prepend "http://" to the ones that don't have it.

<?php
$links = array('http://www.redditmirror.cc/', 'phpexperts.pro', 'https://www.paypal.com/', 'www.example.com');

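foreach ($links as &$link)
{
    // Prepend "http://" to any link missing the HTTP protocol text.
    // "https?" matches exactly "http" or "https" ("https*" would also accept "httpss").
    if (preg_match('|^https?://|i', $link) === 0)
    {
        $link = 'http://' . $link . '/';
    }
}
// Break the reference so later writes to $link cannot alter the last array element.
unset($link);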

print_r($links);

/* Output:

Array
(
    [0] => http://www.redditmirror.cc/
    [1] => http://phpexperts.pro/
    [2] => https://www.paypal.com/
    [3] => http://www.example.com/
)
*/
hopeseekr
Thanks hopeseekr, but doesn't that just match what is in the $links array (e.g. http://www.redditmirror.cc/, etc.) instead of searching the page for any links that don't have http or https?
A: 

You could always use output buffering at the top of your page with a callback that reformats your hrefs to how you'd like them:

function callback($buffer)
{
    return (str_replace(' href="/', ' href="http://domain.com/', $buffer));
}

ob_start('callback');

// rest of your page goes here

ob_end_flush();
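
A slightly more tolerant variant of the same idea, as a rough sketch only: the function name prefixRootRelativeHrefs is made up for illustration, and http://domain.com is the placeholder host from the snippet above. It assumes hrefs are quoted, accepts single or double quotes, and leaves protocol-relative links (href="//...") alone. It is still plain text matching, so it can also touch matching text that is not inside a tag.

```php
<?php
// Hypothetical callback name; the host is the placeholder used in this thread.
function prefixRootRelativeHrefs(string $buffer): string
{
    // Match href="/..." or href='/...' but not protocol-relative href="//...".
    $result = preg_replace(
        '~(\shref\s*=\s*["\'])/(?!/)~i',
        '${1}http://domain.com/',
        $buffer
    );

    // preg_replace() returns null on error; fall back to the untouched buffer.
    return $result ?? $buffer;
}

ob_start('prefixRootRelativeHrefs');

// rest of your page goes here
echo "<a href='/link/'>Link</a>\n";

ob_end_flush();
```

Links that already start with http:// or https:// do not match the pattern, so they pass through unchanged.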
Nev Stokes
A: 

Because you left out critical details in your first question, here is the second answer.

Doing what @Nev Stokes says may work, but it will also rewrite matching text outside of tags. You should never use regular expressions (or, worse, str_replace) on HTML.

Instead, use the Simple HTML DOM library (the one that provides file_get_html() and str_get_html()) and do this:

<?php
require 'simplehtmldom/simple_html_dom.php';

ob_start();
?>
<html>
    <body>
      <a id="id" href="/my_file.txt">My File</a>
      <a name="anchor_link" id="boo" href="mydoc2.txt">My Doc 2</a>
      <a href="http://www.phpexperts.pro/"&gt;PHP Experts</a>
    </body>
</html>
<?php
$output = ob_get_clean();
$html = str_get_html($output);

$anchors = $html->find('a');
foreach ($anchors as $a)
{
    // Only touch links that do not already start with http:// or https://.
    if (preg_match('|^https?://|i', $a->href) === 0)
    {
        // Make sure first char is /.
        if (strncmp($a->href, '/', 1) !== 0)
        {
            $a->href = '/' . $a->href;
        }

        $a->href = 'http://www.domain.com' . $a->href;
    }
}

echo $html->save();

Output:

<html>
    <body>
      <a id="id" href="http://www.domain.com/my_file.txt"&gt;My File</a>
      <a name="anchor_link" id="boo" href="http://www.domain.com/mydoc2.txt"&gt;My Doc 2</a>
      <a href="http://www.phpexperts.pro/"&gt;PHP Experts</a>
    </body>
</html>
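
If pulling in a third-party library is not an option, roughly the same thing can be sketched with PHP's bundled DOM extension. This is an illustration under assumptions, not a drop-in replacement: absolutizeAnchors is a hypothetical helper name, http://www.domain.com is the placeholder host used above, and DOMDocument::saveHTML() may normalize the markup (for example by adding a doctype).

```php
<?php
// Hypothetical helper; assumes the DOM extension (ext-dom) is enabled.
function absolutizeAnchors(string $html, string $origin): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');

        // Skip anchors with no href, links that already carry a scheme
        // (http:, https:, mailto:, ...), protocol-relative links and fragments.
        if ($href === '' || preg_match('~^(?:[a-z][a-z0-9+.-]*:|//|#)~i', $href) === 1) {
            continue;
        }

        // Normalise to exactly one slash between the origin and the path.
        $a->setAttribute('href', $origin . '/' . ltrim($href, '/'));
    }

    return $doc->saveHTML();
}

$page = '<html><body>'
      . '<a href="/my_file.txt">My File</a>'
      . '<a href="mydoc2.txt">My Doc 2</a>'
      . '<a href="http://www.phpexperts.pro/">PHP Experts</a>'
      . '</body></html>';

echo absolutizeAnchors($page, 'http://www.domain.com');
```

As in the code above, a path without a leading slash is treated as relative to the site root, which is only right if the page itself lives at the root.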
hopeseekr
A: 

Maybe it suffices to just change the base URI of the document with the BASE element:

<base href="http://example.com/link/">

With this, the base URI becomes http://example.com/link/ instead of the URI of the document. That means every relative URI is resolved against http://example.com/link/ rather than against the document’s URI.
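
To make the effect concrete, here is a deliberately simplified sketch of how a browser resolves a link against that base. The helper name resolveAgainstBase is hypothetical, and it only covers the common cases: it ignores ports, query strings, fragments and ../ segments, so it is not a full RFC 3986 resolver.

```php
<?php
// Hypothetical helper, simplified on purpose (no ports, queries or "../" handling).
function resolveAgainstBase(string $base, string $ref): string
{
    // A reference that already has a scheme is absolute: leave it alone.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $ref) === 1) {
        return $ref;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    // Protocol-relative reference: only the scheme comes from the base.
    if (strncmp($ref, '//', 2) === 0) {
        return $parts['scheme'] . ':' . $ref;
    }

    // Root-relative reference: the path of the base is ignored.
    if (strncmp($ref, '/', 1) === 0) {
        return $origin . $ref;
    }

    // Path-relative reference: replaces whatever follows the last "/" of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $origin . $dir . $ref;
}

$base = 'http://example.com/link/';
echo resolveAgainstBase($base, 'mydoc2.txt'), "\n";   // http://example.com/link/mydoc2.txt
echo resolveAgainstBase($base, '/my_file.txt'), "\n"; // http://example.com/my_file.txt
```

Note the second case: a link that starts with / picks up only the scheme and host of the base, not its /link/ path.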

Gumbo
I was just about to put this up when I saw it pop up.
Joel Etherton