Hey Guys

I am trying to replace all the relative image paths in a chunk of HTML, which also contains several other elements, with absolute URLs.

For example

<img src="../../../../images/upload/1/test.jpg" />

would need to become

<img src="http://s3.amazonaws.com/website/images/upload/1/test.jpg" />

I was thinking of writing this as a Rails helper and just passing the entire block into the method, maybe using Nokogiri or Hpricot to parse the HTML, but I don't really know.

Any help would be great

Cheers Adam

+1  A: 

This chunk might help:

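html = '<img src="../../../../images/upload/1/test.jpg" />'
absolute_uri = "http://s3.amazonaws.com/website/images"
# Replace the run of "../" segments plus "images" with the absolute base
html.gsub(/(\.\.\/)+images/, absolute_uri)
  # ==> '<img src="http://s3.amazonaws.com/website/images/upload/1/test.jpg" />'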
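The substitution above touches any "../images" run in the text, not only image sources. A slightly narrower sketch, assuming every image lives under one known base, anchors the pattern on the src attribute. The method name absolutize_image_srcs is made up for illustration:

```ruby
# Hypothetical helper: rewrite only src="../(...)/images/..." attributes,
# leaving other occurrences of "../images" in the HTML untouched.
def absolutize_image_srcs(html, images_base)
  html.gsub(%r{src="(?:\.\./)+images/}) { "src=\"#{images_base}/" }
end

html = '<p>../images is mentioned here</p>' \
       '<img src="../../../../images/upload/1/test.jpg" />'
images_base = "http://s3.amazonaws.com/website/images"

result = absolutize_image_srcs(html, images_base)
puts result
```

The paragraph text is left alone and only the img tag's src is rewritten.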
Milan Novota
Of course, this only works if all of the images are under the same path and we know this path beforehand.
Arkku
+3  A: 

One way to construct an absolute path given the absolute URL of the page and a relative path found on that page:

pageurl = 'http://s3.amazonaws.com/website/foo/bar/baz/quux/index.html'
relative = '../../../../images/upload/1/test.jpg'
absolute = pageurl.sub(/\/[^\/]*$/, '')
relative.split('/').each do |d|
  if d == '..'
    absolute.sub!(/\/[^\/]*$/, '')
  else
    absolute << "/#{d}"
  end
end
p absolute
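
The same loop can be wrapped in a method so it is reusable from a helper. This is only a sketch: the name resolve_relative is made up, and unlike the loop above it also skips "." and empty segments. It does no bounds checking, so more ".." segments than directories would eat into the host name.

```ruby
# Hypothetical wrapper around the loop above.
def resolve_relative(page_url, relative)
  absolute = page_url.sub(%r{/[^/]*\z}, '') # drop the page's file name
  relative.split('/').each do |segment|
    case segment
    when '', '.'
      next                                  # nothing to do for "." or "//"
    when '..'
      absolute = absolute.sub(%r{/[^/]*\z}, '') # climb one directory
    else
      absolute += "/#{segment}"             # descend into the segment
    end
  end
  absolute
end

pageurl = 'http://s3.amazonaws.com/website/foo/bar/baz/quux/index.html'
resolved = resolve_relative(pageurl, '../../../../images/upload/1/test.jpg')
puts resolved
```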

Alternatively, you could cheat a bit:

'http:/'+File.expand_path(File.dirname(pageurl.sub(/^http:/, ''))+'/'+relative)
Arkku
+3  A: 

No need to reinvent the wheel when the built-in 'uri' library can do that for you:

require 'uri'
main_path = "http://s3.amazonaws.com/website/a/b/c"
relative_path = "../../../../images/upload/1/test.jpg"

URI.join(main_path, relative_path).to_s
  # ==> "http://s3.amazonaws.com/images/upload/1/test.jpg"
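
Note that the result depends on the base: with ".../website/a/b/c" the four "../" segments climb past "website". To get the URL from the question, the base has to be the URL of the page the HTML came from. Below is a rough sketch of applying URI.join to every img src in a chunk of HTML using only the standard library. The method name absolutize_img_srcs and the regex are assumptions for illustration, and a real parser such as Nokogiri would be more robust than a regex:

```ruby
require 'uri'

# Hypothetical helper: resolve each <img ... src="..."> against the page URL.
# Assumes double-quoted src attributes; URI.join raises URI::InvalidURIError
# on malformed values, which this sketch does not rescue.
def absolutize_img_srcs(html, page_url)
  html.gsub(/(<img\b[^>]*?\bsrc=")([^"]+)"/) do
    prefix = Regexp.last_match(1)
    src    = Regexp.last_match(2)
    "#{prefix}#{URI.join(page_url, src)}\""
  end
end

page_url = 'http://s3.amazonaws.com/website/foo/bar/baz/quux/index.html'
html = '<p>hi</p><img src="../../../../images/upload/1/test.jpg" />'

fixed = absolutize_img_srcs(html, page_url)
puts fixed
```

Sources that are already absolute pass through unchanged, since URI.join returns the second argument when it is a full URL.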
Marc-André Lafortune
Handy that. I thought you'd have to use URI.parse(...).path and some File.expand_path to do this.
tadman
URI.join() is how I do it all the time. As an alternative to URI, Addressable::URI is a nice module because it is a bit more full-featured, especially if you have to work with IDNA-type URLs. http://en.wikipedia.org/wiki/Internationalized_domain_name
Greg