There is a form field where users should input domain names in the form of "google.com".

However, to allow for confused users, I want to clean the input down to exactly "google.com" when they type something like the following:

http://www.google.com
http://google.com
google.com/blah
www.google.com
..and other incorrect forms

What is the best way to accomplish this?

Thanks in advance!

A: 

This is hard. Not only will you have to parse many different forms of URIs, but you'll need to know how to get the TLD from the hostname, using something such as the Public Suffix List, like Firefox does.
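To illustrate why the suffix list matters, here is a minimal sketch of suffix matching. `SAMPLE_SUFFIXES` and `registrable_domain` are hypothetical names, and the four suffixes are a tiny hand-picked subset chosen for illustration; the real Public Suffix List has thousands of entries plus wildcard and exception rules that this sketch ignores.

```ruby
# Illustrative subset only; NOT the real Public Suffix List.
SAMPLE_SUFFIXES = %w[com org net co.uk].freeze

# Hypothetical helper: returns the suffix plus one label to its left,
# or nil if no known suffix matches.
def registrable_domain(host)
  labels = host.downcase.split('.')
  # Try the longest candidate suffix first, so "co.uk" wins over "uk".
  (1...labels.length).each do |i|
    suffix = labels[i..-1].join('.')
    return labels[(i - 1)..-1].join('.') if SAMPLE_SUFFIXES.include?(suffix)
  end
  nil
end

registrable_domain('www.google.com')   # => "google.com"
registrable_domain('www.google.co.uk') # => "google.co.uk"
```

Simply stripping a leading "www." would handle both of these hosts, but it would leave something like "mail.google.co.uk" untouched, which is where knowledge of the suffix becomes necessary.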

Ben Alpert
+4  A: 

You can write a simple function that cleans these up with regular expressions:

  # Strip an optional http(s):// scheme and a leading "www.", then drop any path.
  def foo(s)
    s.sub(/\A(https?:\/\/)?(www\.)?/, '').sub(/\/.*\z/m, '')
  end

This works with all the examples you gave. If that is not sufficient, add more test cases:

  require 'minitest/autorun'

  class FooTest < Minitest::Test
    def test_foo
      assert_equal 'google.com', foo('http://www.google.com')
      assert_equal 'google.com', foo('http://google.com')
      assert_equal 'google.com', foo('google.com/blah')
      assert_equal 'google.com', foo('www.google.com')
    end
  end
ndp
+5  A: 

You should build your system on top of addressable/uri. This gem takes care of the URI parts (path, host, port); you just provide the default scheme, which is http.

(gem install addressable).

Example

>> uri = Addressable::URI.parse("http://google.com?q=lolcat")
=> #<Addressable::URI:0x80bcf0e0 URI:http://google.com?q=lolcat>
>> [uri.host,uri.path,uri.scheme]
=> ["google.com", "", "http"]

Basically, you just have to detect whether http:// is present and add it if it is not, because the URI parser will not guess it for you. After that, there is nothing more to handle manually.
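The "add the scheme if it is missing, then read the host" step might look roughly like the sketch below. To keep it free of dependencies it uses Ruby's standard `uri` library rather than addressable, so treat it as an illustration of the idea and not of the gem's API. `clean_domain` is a hypothetical name, and stripping a leading "www." is an extra assumption added here, since the parsed host keeps it.

```ruby
require 'uri'

# Hypothetical helper: reduce user input to a bare host such as "google.com".
# Returns nil when the input cannot be parsed as a URI.
def clean_domain(input)
  s = input.to_s.strip
  # Prepend a default scheme when none is present; the parser will not guess one.
  s = "http://#{s}" unless s =~ %r{\A[a-z][a-z0-9+.-]*://}i
  host = URI.parse(s).host
  return nil if host.nil?
  # Assumption: a leading "www." is noise for this form field.
  host.downcase.sub(/\Awww\./, '')
rescue URI::InvalidURIError
  nil
end

clean_domain('http://www.google.com') # => "google.com"
clean_domain('google.com/blah')       # => "google.com"
```

Because the parser does the splitting, query strings and ports are dropped without any extra regular expressions.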

jhc_
This is a very interesting and flexible solution. Thank you.
jimsung