I'm trying to parse a URI from user input. I'm assuming some users won't put the scheme in their URIs, and I want to default to "http".

I wrote the following code, which I thought should work, but it doesn't.

require 'uri'

uri_to_check = URI::parse("www.google.com")
uri_to_check.scheme = "http" unless uri_to_check.scheme

puts uri_to_check.to_s

I expect to see "http://www.google.com" but I get "http:www.google.com". Is it even possible to do it this way?

If so, what am I missing?

Is there a better way to do this?

A: 

When the string you want to parse doesn't contain a scheme, URI doesn't recognize it as a hostname; it treats the whole string as the path:

irb(main):001:0> require 'uri'
=> true
irb(main):002:0> uri = URI::parse("www.google.com")
=> #<URI::Generic:0x11cfc88 URL:www.google.com>
irb(main):003:0> uri.path
=> "www.google.com"
irb(main):004:0> uri.host
=> nil

When you set the scheme as you do in your example and then call to_s, the URI is built without a host, so no "//" is emitted.

You can try something like the following (it's a quick hack; I don't know the URI details):

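require 'uri'

uri = URI::parse("www.google.com")
# No scheme and no host: the input ended up in uri.path
if uri.scheme.nil? && uri.host.nil?
  unless uri.path.nil?
    uri.scheme = "http"
    # Move the path over to the host (only works for a bare hostname)
    uri.host = uri.path
    uri.path = ""
  end
end

puts uri.to_s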
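
A variation on the same idea, in case the input can also carry a path or query: prepend the default scheme to the string before parsing, instead of patching the parsed object afterwards. This is only a minimal sketch. The helper name `parse_with_default_scheme` and the regular expression that decides "has a scheme" are my own assumptions, not part of the URI library.

```ruby
require 'uri'

# Hypothetical helper: prepend a default scheme when the input has none,
# then let URI parse the complete string.
def parse_with_default_scheme(input, default_scheme = "http")
  text = input.strip
  # Assumption: input "has a scheme" only if it starts with "<scheme>://".
  has_scheme = text.match?(%r{\A[a-z][a-z0-9+.\-]*://}i)
  text = "#{default_scheme}://#{text}" unless has_scheme
  URI.parse(text)
end

puts parse_with_default_scheme("www.google.com")
puts parse_with_default_scheme("www.google.com/search?q=ruby")
puts parse_with_default_scheme("https://www.google.com")
```

Because the scheme is added before parsing, URI sees the "//" and fills in the host itself, so a trailing path is not mistaken for part of the hostname.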
Scolytus
Thanks! Fixed the example. A little copy/paste issue ;)
maz
+3  A: 

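The leading slashes (//) indicate that the URL uses an IP-based protocol addressed to a specific host, and they are needed to flag the hostname so URI can parse it correctly. Without them, the parser has no way to tell a hostname from a relative path.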
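
To see the effect of the slashes, here is a small sketch using only the standard uri library. The variable names are just for illustration.

```ruby
require 'uri'

# Without "//" the whole string is taken as a path.
without_slashes = URI.parse("www.google.com")
p without_slashes.host   # nil
p without_slashes.path   # "www.google.com"

# With "//" the same text is recognized as the host, even with no scheme.
with_slashes = URI.parse("//www.google.com")
p with_slashes.host      # "www.google.com"

# Now setting the scheme produces the URL the question expected.
with_slashes.scheme = "http"
puts with_slashes.to_s   # http://www.google.com
```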

Wikipedia has some good overviews and examples of use:

http://en.wikipedia.org/wiki/Url , http://en.wikipedia.org/wiki/URI_scheme , http://en.wikipedia.org/wiki/URL_normalization

The best information is in the spec itself: http://www.ietf.org/rfc/rfc1738.txt , particularly section 3.1, "Common Internet Scheme Syntax".

You might want to consider using the Addressable gem. It's smarter and is what I use when I need to do a lot of URI parsing or manipulation.

http://addressable.rubyforge.org/ and http://addressable.rubyforge.org/api/Addressable/URI.html

Greg
+1. For your particular use case, you might want to look at `Addressable::URI.heuristic_parse`, which is *specifically* intended for the case where some information is genuinely missing from the URI rather than intentionally left out.
Jörg W Mittag
Addressable::URI.heuristic_parse is exactly what I'm looking for. Thank you!
maz