I'm looking for a way to find/replace links to images (within user-generated content) without touching links to non-images.

For example, the following text:

<a href="http://domain.com/arbitrary-file.jpg">Text</a>
<a href="http://domain.com/arbitrary-file.jpeg">Text</a>
<a href="http://domain.com/arbitrary-path/arbitrary-file.gif">Text</a>
<a href="http://domain.com/arbitrary-file.png">Text</a>

<a href="http://domain.com/arbitrary-file.html">Text</a>
<a href="http://domain.com/arbitrary-path/">Text</a>
<a href="http://domain.com/arbitrary-file#anchor_to_here">Text</a>

Non-hyperlinked URL: http://domain.com/arbitrary-path/arbitrary-file.gif
Non-hyperlinked URL: http://domain.com/arbitrary-file#anchor_to_here

... should be revised to:

<img src="http://domain.com/arbitrary-file.jpg" alt="Text" />
<img src="http://domain.com/arbitrary-file.jpeg" alt="Text" />
<img src="http://domain.com/arbitrary-path/arbitrary-file.gif" alt="Text" />
<img src="http://domain.com/arbitrary-file.png" alt="Text" />

<a href="http://domain.com/arbitrary-file.html">Text</a>
<a href="http://domain.com/arbitrary-path/">Text</a>
<a href="http://domain.com/arbitrary-file#anchor_to_here">Text</a>

Non-hyperlinked URL: http://domain.com/arbitrary-path/arbitrary-file.gif
Non-hyperlinked URL: http://domain.com/arbitrary-file#anchor_to_here

... securely and reliably in PHP.

+2  A: 

There's no fully reliable way to do this, at least not with regular expressions, but this should do the trick nevertheless:

$str = preg_replace('~<a[^>]*?href="(.*?(gif|jpeg|jpg|png))".*?</a>~', '<img src="$1" />', $str);

To open this up a bit:

  • Find opening <a tags
  • Find the href attribute inside that tag
  • Get the href if it ends with one of the listed file extensions and a " character
  • Include the rest of the link until the closing </a> tag in the replace
  • Replace the whole match with an img element that gets the href as a src attribute
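
A slightly tighter variant of the pattern above, offered as a sketch rather than a drop-in fix: it requires a literal dot before the extension, keeps the match inside a single double-quoted href value, ignores case, and carries the link text over as the alt attribute. The function name `linksToImages` is hypothetical, and the sketch assumes double-quoted attributes as in the question's sample.

```php
<?php
// Hypothetical helper; not part of the original answer.
function linksToImages(string $html): string
{
    // [^"]* keeps the capture inside one href value, \. demands a real
    // extension, and the i flag accepts .JPG, .Png and so on.
    $pattern = '~<a\s[^>]*?\bhref="([^"]*\.(?:gif|jpe?g|png))"[^>]*>(.*?)</a>~is';

    $result = preg_replace_callback($pattern, function (array $m): string {
        // Reduce the link text to plain text, then escape it for use in alt.
        $alt = htmlspecialchars(strip_tags($m[2]), ENT_QUOTES, 'UTF-8');
        return '<img src="' . $m[1] . '" alt="' . $alt . '" />';
    }, $html);

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}
```

Using a callback makes it possible to escape the link text before it lands in an attribute, which a plain replacement string cannot do.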

As Bauer noted, you may be better off using DOM methods. But if you can be sure your links always follow this format, regular expressions will work, and they might be a bit faster too.

Tatu Ulmanen
+1 - Thanks! This also matched `http://domain.com/all-about-png`, so I ended up using `preg_replace('#<a href="([^"]*?\.(?:jpe?g|png|gif|tiff?))".*?</a>#i', '<img src="$1" alt="" />', $str);` based on this answer for now, because the original `<a>` is being (reliably?) generated by another system... based on user-submitted content.
Dolph
+5  A: 

You might want to look at using an HTML parser (rather than regular expressions, as you tagged the submission) such as the PHP Simple HTML DOM Parser. This would provide you with the reliability you speak of.

You'll probably end up with something like this:

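// Assumes $html is a Simple HTML DOM object, e.g. $html = str_get_html($str);
foreach ($html->find('a') as $element)
{
    // Note: this converts every link; filter on $element->href to limit it to images.
    echo '<img src="'.$element->href.'" alt="'.$element->innertext.'" />';
}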
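
For comparison, the same idea can be sketched with PHP's bundled DOM extension instead of Simple HTML DOM. This is an illustration under assumptions, not the library usage described above: the function name `convertImageLinks` is hypothetical, the input is assumed to be a UTF-8 fragment with reasonably balanced markup, and only links whose href ends in an image extension are replaced in place.

```php
<?php
// Hypothetical helper built on PHP's bundled DOM extension (not Simple HTML DOM).
function convertImageLinks(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // Assumption: input is UTF-8. The meta tag hints the charset; the div is a
    // temporary wrapper so the fragment has one known container.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot the live node list before replacing nodes inside it.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        $href = $a->getAttribute('href');
        if (!preg_match('~\.(?:gif|jpe?g|png)$~i', $href)) {
            continue; // not an image link: leave it untouched
        }
        $img = $doc->createElement('img');
        $img->setAttribute('src', $href);
        $img->setAttribute('alt', trim($a->textContent));
        $a->parentNode->replaceChild($img, $a);
    }

    // Serialize only the wrapper's children, dropping the wrapper itself.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Note that `saveHTML()` writes HTML rather than XHTML, so the image comes out as `<img ...>` without the trailing slash shown in the question.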
Bauer
+1 - was typing an answer almost identical to this. Make sure you test for the correct `href` attribute (i.e. ends with .jpg, .gif, etc.) before you blindly convert it to an image. Alternatively you could also change the selector to `find('a[href$=jpg], a[href$=gif]')`.
John Rasch
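
One way to do the href test suggested in the comment above, as a rough sketch: compare the extension of the URL's path rather than the end of the whole string, so query strings, fragments and upper-case extensions are handled, and a URL such as `http://domain.com/all-about-png` is not mistaken for an image. The helper name `isImageUrl` and the default extension list are assumptions for illustration.

```php
<?php
// Hypothetical helper; the name and the extension list are illustrative.
function isImageUrl(string $url, array $extensions = ['gif', 'jpg', 'jpeg', 'png']): bool
{
    // parse_url() separates the path from any query string or fragment.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // malformed URL or no path component
    }

    // pathinfo() yields '' when the last path segment has no extension.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, $extensions, true);
}
```

Inside the loop over links, a check like `if (!isImageUrl($element->href)) { continue; }` would skip everything that is not an image.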
+1 - This is probably the best answer to this question for most people, but I believe the input links I'm working with are consistent enough to avoid this additional complexity. Thank you for the insight!
Dolph