ansaurus

Question

Answer 1

A:

There is very little information about how you get those URLs; please show more next time. Are there parameters in the URL, etc.? Meanwhile, here is some simple string manipulation for your sample URL.

For example:

$ s="http://example.com/index.php"
$ echo ${s%/*}  # get rid of the last "/" onwards
http://example.com
$ s=${s%/*}
$ echo ${s/#http:\/\//} # get rid of http://
example.com
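For comparison, the same two steps can be sketched in Python 3 (the helper name strip_to_host is hypothetical). rsplit("/", 1) mirrors ${s%/*} by dropping the last "/" and everything after it, and the prefix check mirrors removing a leading "http://". Like the shell version, it assumes URLs shaped like the sample.

```python
def strip_to_host(s: str) -> str:
    """Hypothetical helper mimicking ${s%/*}, then removing "http://"."""
    # ${s%/*}: drop the last "/" and everything after it
    # (rsplit returns [s] unchanged when there is no "/").
    s = s.rsplit("/", 1)[0]
    # ${s/#http:\/\//}: drop a leading "http://" if present.
    prefix = "http://"
    if s.startswith(prefix):
        s = s[len(prefix):]
    return s


print(strip_to_host("http://example.com/index.php"))      # example.com
print(strip_to_host("http://example.com/dir/index.php"))  # example.com/dir
```

Both versions remove only the last path segment, so deeper paths leave part of the path behind, and a URL with no path ("http://example.com") is cut at the "//" and becomes "http:/".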

Other ways, using GNU sed:

$ echo $s | sed 's/http:\/\///;s|\/.*||'
example.com

Using awk:

$ echo $s| awk '{gsub("http://|/.*","")}1'
example.com
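The awk gsub can be reproduced with Python's re.sub, which likewise replaces every non-overlapping match. This is a sketch for comparison only, and host_by_deletion is a made-up name. It also shows why the trailing .* matters: without it, as in the sed -r command quoted in the comment on this answer, each "/" is deleted on its own and the path is glued onto the host.

```python
import re

# Delete a literal "http://", and delete the first remaining "/" together
# with everything after it, as in the awk gsub above.
WITH_PATH = re.compile(r"http://|/.*")
# Variant without ".*": every "/" is deleted individually.
SLASH_ONLY = re.compile(r"http://|/")


def host_by_deletion(url: str) -> str:
    return WITH_PATH.sub("", url)


url = "http://example.com/index.php"
print(host_by_deletion(url))    # example.com
print(SLASH_ONLY.sub("", url))  # example.comindex.php
```

Only a literal "http://" is removed, so an https URL collapses to "https:": "/.*" then matches from the first slash of "//".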

ghostdog74 2010-03-23 02:43:03

Your method doesn't work! echo http://example.com/index.php | sed -r 's/http:\/\/|\///g' gives output example.comindex.php and NOT example.com on Cygwin. Please post a method that works.

Ben Smith 2010-03-23 03:11:40

My method doesn't work because your sample URL is different! And you did not say what types of URLs you want to parse. Next time, please write your question clearly, with example inputs and a description of the output you want!

ghostdog74 2010-03-23 03:31:30

Answer 2

+1 A:

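#!/usr/bin/perl
use strict;
use warnings;

# Optional "scheme://" prefix, then the host: everything up to the
# first "/" that contains at least one dot.
my $url = $ARGV[0];

if ($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/) {
  print "$2\n";
}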

Usage:

./test.pl 'https://example.com'
example.com

./test.pl 'https://www.example.com/'
www.example.com

./test.pl 'example.org/'
example.org

./test.pl 'example.org'
example.org

./test.pl 'example'  -> no output
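A Python 3 port of the first script, for readers without Perl (host is a hypothetical name). re.search scans the string the way Perl's =~ does, and the pattern is the same apart from the slashes no longer needing escapes.

```python
import re
from typing import Optional

# Same pattern as the Perl script: an optional "scheme://" prefix, then
# group 2 = everything up to the first "/" that contains at least one dot.
HOST_RE = re.compile(r"([^:]*://)?([^/]+\.[^/]+)")


def host(url: str) -> Optional[str]:
    m = HOST_RE.search(url)
    return m.group(2) if m else None  # None plays the role of "no output"


for u in ["https://example.com", "https://www.example.com/",
          "example.org/", "example.org", "example"]:
    print(u, "->", host(u))
```

Because [^/]+ also accepts ":", a port stays attached: host("http://example.com:8080/") gives "example.com:8080".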

If you want just the domain rather than the full host name (e.g. example.com instead of www.example.com), use this instead:

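#!/usr/bin/perl
use strict;
use warnings;

# The greedy ([^\/]*\.)* swallows as many leading labels as it can,
# so $3 ends up as the last two dot-separated labels of the host.
my $url = $ARGV[0];
if ($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/) {
  print "$3\n";
}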
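The domain-only variant ports to Python 3 the same way (domain is a hypothetical name). Because the greedy ([^/]*\.)* swallows as many leading labels as it can, group 3 ends up as the last two dot-separated labels of the host.

```python
import re
from typing import Optional

# Same pattern as the second Perl script; group 3 holds the last two
# dot-separated labels, because ([^/]*\.)* greedily eats the ones before.
DOMAIN_RE = re.compile(r"([^:]*://)?([^/]*\.)*([^/.]+\.[^/]+)")


def domain(url: str) -> Optional[str]:
    m = DOMAIN_RE.search(url)
    return m.group(3) if m else None


print(domain("https://www.example.com/"))  # example.com
print(domain("www.example.co.uk"))         # co.uk
```

The second call shows the limitation raised in the comment on this answer: a two-label public suffix such as co.uk is returned in place of the domain.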

2010-03-23 03:47:57

Of course, the last one doesn't handle "www.example.co.uk" (it prints co.uk). See Domain::PublicSuffix: http://search.cpan.org/~nmelnick/Domain-PublicSuffix-0.04/lib/Domain/PublicSuffix.pm

Dennis Williamson 2010-03-23 07:03:41

True, and if there is an API for it, I'd obviously use that anyway. It seems a complete solution would actually have to know all valid country codes and check whether the last dot-separated part is a country code...

2010-03-23 13:56:32
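Rather than checking country codes, Domain::PublicSuffix matches hosts against Mozilla's Public Suffix List, which also covers multi-label suffixes such as co.uk. A minimal Python 3 sketch of that lookup follows; SUFFIXES is a tiny hand-picked stand-in (an assumption, not the real list), registrable_domain is a hypothetical name, and the list's wildcard and exception rules are ignored.

```python
from typing import Optional

# Tiny illustrative stand-in for the Public Suffix List (an assumption;
# real code should load the full list and honour its wildcard rules).
SUFFIXES = {"com", "org", "uk", "co.uk", "th", "ac.th"}


def registrable_domain(host: str) -> Optional[str]:
    """Return the longest known suffix plus the one label in front of it."""
    labels = host.lower().rstrip(".").split(".")
    if ".".join(labels) in SUFFIXES:
        return None  # the host is itself a public suffix
    # labels[i:] gets shorter as i grows, so the longest suffix wins.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            return ".".join(labels[i - 1:])
    return None


print(registrable_domain("www.example.co.uk"))     # example.co.uk
print(registrable_domain("www.champa.kku.ac.th"))  # kku.ac.th
```

Hosts whose suffix is missing from the set, such as example.net, return None.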

Answer 3

A:

Instead of using a regex, you can use Python's urlparse module (in Python 3 it lives in urllib.parse):

URL=http://www.example.com

python -c "from urlparse import urlparse
url = urlparse('$URL')
print url.netloc"
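In Python 3 the one-liner becomes python3 -c "from urllib.parse import urlparse; print(urlparse('$URL').netloc)". As a small script, a sketch might look like this (the sample URL is an assumption):

```python
from urllib.parse import urlparse  # Python 3 location of urlparse

url = urlparse("http://www.example.com/index.html?page=1")
print(url.netloc)  # www.example.com
print(url.path)    # /index.html
```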

You can use it like this or put it in a small script. However, this still expects a valid scheme identifier, and judging by your comment, your input doesn't necessarily include one. You can specify a default scheme, but urlparse expects the netloc to start with '//':

url = urlparse('//www.example.com/index.html','http')

So you will have to prepend the '//' manually, e.g.:

python -c "from urlparse import urlparse
if '$URL'.find('://') == -1:
    url = urlparse('//$URL','http')
else:
    url = urlparse('$URL')
print url.netloc"
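The same fallback as a Python 3 function (netloc_of is a hypothetical name). The first print shows why the fallback is needed: without a scheme or a leading "//", urlparse puts the whole string in path and leaves netloc empty.

```python
from urllib.parse import urlparse


def netloc_of(raw: str) -> str:
    """Return the network location, tolerating a missing scheme."""
    if "://" not in raw:
        # Prepend "//" so urlparse reads the start as the netloc,
        # and pass "http" as the default scheme.
        return urlparse("//" + raw, "http").netloc
    return urlparse(raw).netloc


print(repr(urlparse("www.example.com/index.html").netloc))  # ''
print(netloc_of("www.example.com/index.html"))             # www.example.com
print(netloc_of("http://www.example.com/index.html"))      # www.example.com
```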

Garns 2010-03-23 10:31:20

Answer 4

A:

The following will output "example.com":

URI="http://user@example.com/foo/bar/baz/?lala=foo"
ruby -ruri -e "p URI.parse('$URI').host"

For more on what you can do with Ruby's URI class, consult its documentation.
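For comparison, Python's urllib.parse results have a hostname attribute that, like the host call above, drops the user info, while netloc keeps both the user info and the port. The URL below is an illustrative assumption.

```python
from urllib.parse import urlparse

parts = urlparse("http://user:pass@Example.com:80/foo/bar/baz/?lala=foo")
print(parts.netloc)    # user:pass@Example.com:80
print(parts.hostname)  # example.com (lowercased, no user info or port)
print(parts.port)      # 80
```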

Michael Kohl 2010-03-24 09:26:06

Answer 5

+1 A:

$ URI="http://user:pass@example.com:80/"
$ echo $URI | sed -e "s/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/"
example.com
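The sed expression translates directly to a Python 3 regular expression; this is a sketch, and host is a hypothetical name.

```python
import re
from typing import Optional

# Same pieces as the sed expression: "scheme:" and "//", an optional
# "userinfo@", then the host up to the next ":" or "/".
HOST_RE = re.compile(r"[^/]*//([^@]*@)?([^:/]*)")


def host(uri: str) -> Optional[str]:
    m = HOST_RE.match(uri)
    # sed would print a non-matching line unchanged; this returns None.
    return m.group(2) if m else None


print(host("http://user:pass@example.com:80/"))  # example.com
print(host("https://example.com/index.html"))    # example.com
```

As with the sed version, an "@" later in the path is mistaken for user info: host("http://example.com/a@b") returns "b".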

See http://en.wikipedia.org/wiki/URI_scheme

2010-03-24 09:52:18

Answer 6

A:

basename "http://example.com"

Of course, this won't work with a URI like http://www.example.com/index.html, but you could do the following:

basename "$(dirname "http://www.example.com/index.html")"
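Python's posixpath module applies the same POSIX path rules, so the basename/dirname trick can be reproduced for comparison:

```python
import posixpath

# "/" is the path separator here, so a URL splits like a file path.
print(posixpath.basename("http://example.com"))  # example.com
print(posixpath.basename(posixpath.dirname("http://www.example.com/index.html")))
# www.example.com
```

One difference: the basename command strips trailing slashes first, while posixpath.basename("http://example.com/") returns an empty string.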

Or for more complex URIs:

echo "http://www.example.com/somedir/someotherdir/index.html" | cut -d'/' -f3

-d means "delimiter" and -f means "field"; in the above example, the third field delimited by the forward slash '/' is www.example.com.
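Splitting on "/" in Python shows which fields cut sees; the empty second field comes from the two adjacent slashes in "//".

```python
url = "http://www.example.com/somedir/someotherdir/index.html"
fields = url.split("/")
print(fields[:3])  # ['http:', '', 'www.example.com']
# cut counts fields from 1 and Python indexes from 0, so -f3 is fields[2].
print(fields[2])   # www.example.com
```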

musashiXXX 2010-03-29 19:34:05

Answer 7

A:

With Ruby, you can use the Domainatrix library (gem):

http://www.pauldix.net/2009/12/parse-domains-from-urls-easily-with-domainatrix.html

require 'rubygems'
require 'domainatrix'
s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2'
url = Domainatrix.parse(s)
url.domain
=> "kku"

great tool! :-)

Tilo 2010-04-22 00:28:05


extract domain name from url
