Hi. I'm using file_get_contents to get a certain file's contents. So far that is working. Before showing the file, though, I want to search it and replace every <a href=" with <a href="site.php?url=. How can I do this? I know I should use some kind of str_replace or even preg_replace, but I don't know how to actually run the search and replace on the content I'm getting with file_get_contents.

Thank you for your help!

+2  A: 
$text = file_get_contents('some_file');
$text = str_replace('<a href="', '<a href="site.php?url=', $text);
chaos
Watch out for case sensitivity and more complex cases like '<a  href="' (two spaces). If these are relevant, you may want to consider using str_ireplace, regular expressions, or more complex parsing solutions.
luiscubal
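To illustrate the case-sensitivity point, here is a minimal sketch comparing str_replace and str_ireplace. The sample markup is made up for the example; in practice the string would come from file_get_contents.

```php
<?php
// Hypothetical sample input; in practice this would come from file_get_contents().
$text = '<A HREF="page.html">One</A> <a href="other.html">Two</a>';

// str_replace is case-sensitive, so the upper-case tag is left untouched.
$sensitive = str_replace('<a href="', '<a href="site.php?url=', $text);

// str_ireplace matches regardless of case and writes the replacement as given.
$insensitive = str_ireplace('<a href="', '<a href="site.php?url=', $text);

echo $sensitive, "\n", $insensitive, "\n";
```

Neither call handles extra whitespace between "<a" and "href", so that case still needs a regular expression or a parser.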
Also, you'll need to urlencode() the URL in the new href; otherwise you'll probably end up with other problems.
Darryl Hein
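One possible way to combine the replacement with urlencode() is preg_replace_callback, since plain str_replace cannot touch the URL itself. This is only a sketch: the sample link is hypothetical, and the pattern deliberately handles just the simple '<a href="..."' form from the question.

```php
<?php
// Hypothetical sample input; in practice this would come from file_get_contents().
$text = '<a href="http://example.com/page?a=1&b=2">Link</a>';

// Match only the simple form '<a href="..."' and encode the captured URL.
$result = preg_replace_callback(
    '/<a href="([^"]*)"/',
    function (array $m): string {
        return '<a href="site.php?url=' . urlencode($m[1]) . '"';
    },
    $text
);

echo $result, "\n";
```

Without the encoding, the "&b=2" part would be read as a separate parameter of site.php rather than as part of the target URL.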
+1  A: 

file_get_contents returns a string containing the file's content.

So you can work on this string with whichever string manipulation functions you want, like the ones you mentioned.

Something like this, using str_replace, would probably do:

$content = file_get_contents('http://www.google.com');

$new_content = str_replace('<a href="', '<a href="site.php?url=', $content);

echo $new_content;

But note that it will only replace the URL in the href attribute when that attribute is the first one in the <a tag...

Using a regex might help you a bit more, but it probably won't be perfect either, I'm afraid...

If you are working with an HTML document and want a "full" solution, using DOMDocument::loadHTML and working with DOM manipulation methods might be another (a bit more complex, but probably more powerful) solution.
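A rough sketch of the DOM approach could look like the following. The sample markup is hypothetical, and the urlencode() call is an addition taken from the comments on the other answer, not something the question asked for.

```php
<?php
// Hypothetical sample input; in practice this would come from file_get_contents().
$html = '<html><body><a class="x" href="http://example.com/?a=1">Link</a></body></html>';

$doc = new DOMDocument();
// Real-world HTML is often invalid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Rewrite every href, wherever it sits among the tag's attributes.
foreach ($doc->getElementsByTagName('a') as $a) {
    if ($a->hasAttribute('href')) {
        $a->setAttribute('href', 'site.php?url=' . urlencode($a->getAttribute('href')));
    }
}

echo $doc->saveHTML();
```

Because the parser deals with quoting, attribute order, and case, this avoids most of the edge cases that trip up string and regex replacements.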


The answers given to those two questions might also help you, depending on what you want to do:


EDIT after seeing the comment :

If you want to replace two strings, you can pass arrays as the first two parameters of str_replace. For instance:

$new_content = str_replace(
    array('<a href="', 'Pages'), 
    array('<a href="site.php?url=', 'TEST'), 
    $content);

With that :

  • '<a href="' will be replaced by '<a href="site.php?url='
  • and 'Pages' will get replaced by 'TEST'

And, quoting the manual :

If search and replace are arrays, then str_replace() takes a value from each array and uses them to do search and replace on subject. If replace has fewer values than search, then an empty string is used for the rest of replacement values. If search is an array and replace is a string, then this replacement string is used for every value of search.

If you want to replace all instances of '<a href="', well, that's what str_replace does by default :-)
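To make the rules quoted from the manual concrete, here is a small sketch with a made-up $content string. It shows the shorter-replace-array case, the single-string case, and the optional fourth argument that reports how many replacements were made.

```php
<?php
// Hypothetical sample content.
$content = '<a href="a.html">Pages</a> and more Pages';

// Fewer replacements than searches: 'Pages' is replaced by an empty string.
$fewer = str_replace(array('<a href="', 'Pages'), array('<a href="site.php?url='), $content);

// A single replacement string is used for every search value.
$same = str_replace(array('a.html', 'Pages'), 'X', $content);

// All occurrences are replaced; $count receives the number of replacements.
$all = str_replace('Pages', 'TEST', $content, $count);

echo $fewer, "\n", $same, "\n", $all, " ($count)\n";
```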

Pascal MARTIN
I have one more question, though this was a very good answer that helped me on the way. If I wanted to do 2 replacements at one time, how would that work?
I've edited my answer with some more information that might help you with that :-) Have fun!
Pascal MARTIN
A: 

If you want to use the remote document on your website but keep the links of that document intact, better use the BASE element to declare the base URI:

<base href="http://example.com/path/to/remote/document">
Gumbo
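If the remote document is fetched with file_get_contents, the BASE element could be injected before output along these lines. This is a sketch under assumptions: the URL and markup are placeholders, and it expects a plain <head> tag without attributes.

```php
<?php
// Hypothetical remote document URL and content.
$baseUrl = 'http://example.com/path/to/remote/document';
$html = '<html><head><title>Remote</title></head><body><a href="other.html">Other</a></body></html>';

$baseTag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';

// Insert the tag right after the first <head>; leave the document alone if none is found.
$pos = stripos($html, '<head>');
if ($pos !== false) {
    $html = substr_replace($html, '<head>' . $baseTag, $pos, strlen('<head>'));
}

echo $html, "\n";
```

Relative links such as other.html are then resolved by the browser against the remote location, so the links themselves do not need rewriting.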
A: 
$new_content = preg_replace('!(<a\s+[^>]*)href="([^"]+)"!', '\1href="site.php?url=\2"', $content);

I think this should do the trick:

  • it replaces the href of a link, no matter where it is located
  • e.g. works on <a href=".." , <a style="" href="..."
bisko
HTML attribute values may contain plain `>` characters.
Gumbo
I've parsed quite a lot of websites with regular expressions and never stumbled upon a < or > in an attribute value. Could you show me an example, if you can recall one?
bisko
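For what it's worth, a constructed example of the case Gumbo describes is easy to try. The sketch below uses a pattern built the same way as the one in the answer (it relies on [^>]* to skip earlier attributes) and two made-up links, one of which has a ">" inside a title attribute.

```php
<?php
// Same approach as the answer's pattern: skip earlier attributes with [^>]*.
$pattern = '!(<a\s+[^>]*)href="([^"]+)"!';
$replacement = '\1href="site.php?url=\2"';

// Hypothetical links: the first has a '>' inside an attribute value.
$withGt = '<a title="a > b" href="page.html">Link</a>';
$plain  = '<a title="x" href="page.html">Link</a>';

$resultWithGt = preg_replace($pattern, $replacement, $withGt);
$resultPlain  = preg_replace($pattern, $replacement, $plain);

// The first link is left unchanged because [^>]* cannot cross the '>' in the title.
var_dump($resultWithGt === $withGt);
echo $resultPlain, "\n";
```

Whether such markup shows up in practice depends on the pages being fetched; the point is only that the pattern silently skips those links.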
A: 

Like the code posted by bisko, but this works whether the href value is enclosed in ', in ", or in nothing at all:

$text = '<a href="http://www.europanet.com.br">Europanet</a>     <a target="_blank" href=\'http://www.webjump.com.br\'>Webjump</a>
<a id="link" href=http://www.euforia.com.br target="_top">Euforia</a>';
$text = preg_replace('|(<a\s*[^>]*href=[\'"]?)|','\1site.php?url=', $text);
Yes, that's awesome! But how about doing the same with img tags? Is it just a matter of changing the a and href in the preg_replace? And how about doing it all in one go, for img tags AND a tags? Thank you for your answer.
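One way this might be done in a single call is with an alternation in the pattern, one branch for a/href and one for img/src. This is an untested-in-the-wild sketch with made-up markup, and it assumes site.php is meant to handle image URLs as well.

```php
<?php
// Hypothetical sample input mixing links and images.
$text = '<a href="page.html"><img src="pic.png" alt=""></a> <img alt="x" src=\'b.png\'>';

// Two branches: <a ... href=  or  <img ... src=, each with an optional opening quote.
$pattern = '/(<a\s[^>]*?href=[\'"]?|<img\s[^>]*?src=[\'"]?)/i';
$text = preg_replace($pattern, '\1site.php?url=', $text);

echo $text, "\n";
```

The same caveats as for the single-tag version apply: attribute values containing ">" and unusual spacing around "=" are not handled.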