How do I extract the domain name from a URL using bash? For example, http://example.com/ should become example.com. It must work for any TLD, not just .com.
There is very little information on how you get those URLs; please show more next time (are there parameters in the URL, etc.?). Meanwhile, here is some simple string manipulation for your sample URL,
e.g.
$ s="http://example.com/index.php"
$ s=${s%/*} # get rid of last "/" onwards
$ echo "$s"
http://example.com
$ echo "${s#http://}" # get rid of http://
example.com
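The same idea can be stretched a little to cope with other schemes, credentials, and ports. What follows is only a rough sketch using POSIX parameter expansion; the function name url_host is made up for illustration, and it assumes the input has no "://" inside its query string and is not an IPv6 literal.

```shell
# Sketch: strip a URL down to its host using only parameter expansion.
# url_host is a hypothetical helper name, not part of the original answer.
url_host() {
    h=${1#*://}     # drop the scheme, if any ("http://", "https://", ...)
    h=${h%%[/?#]*}  # drop the path, query string, or fragment
    h=${h##*@}      # drop "user:password@" credentials
    h=${h%%:*}      # drop a ":port" suffix
    printf '%s\n' "$h"
}

url_host "http://example.com/index.php"        # example.com
url_host "https://www.example.org:8080/a?b=c"  # www.example.org
```

Because no external command is started, this is cheap to call in a loop, but it does no validation at all: whatever is left after the four cuts is printed as is.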
Other ways, using sed (GNU):
$ echo "$s" | sed 's/http:\/\///;s|\/.*||'
example.com
Or using awk:
$ echo "$s" | awk '{gsub("http://|/.*","")}1'
example.com
#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]+\.[^\/]+)/g) {
print $2;
}
Usage:
./test.pl 'https://example.com'
example.com
./test.pl 'https://www.example.com/'
www.example.com
./test.pl 'example.org/'
example.org
./test.pl 'example.org'
example.org
./test.pl 'example' -> no output
And if you just want the domain and not the full host + domain use this instead:
#!/usr/bin/perl -w
use strict;
my $url = $ARGV[0];
if($url =~ /([^:]*:\/\/)?([^\/]*\.)*([^\/\.]+\.[^\/]+)/g) {
print $3;
}
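For comparison, the "last two labels" idea can be roughly sketched in shell with awk. This assumes the host has already been extracted from the URL, and last_two_labels is a made-up name. Like the regex above, it knows nothing about multi-part suffixes such as .co.uk or .ac.th, so treat it as a guess rather than a real registrable-domain lookup.

```shell
# Hypothetical helper: keep only the last two dot-separated labels of a host.
# It has no knowledge of multi-part suffixes such as .co.uk or .ac.th.
last_two_labels() {
    printf '%s\n' "$1" | awk -F. 'NF >= 2 { print $(NF-1) "." $NF }'
}

last_two_labels "www.example.com"   # example.com
last_two_labels "example.org"       # example.org
last_two_labels "example"           # no output
```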
Instead of using a regex, you can use Python's urlparse (urllib.parse in Python 3):
URL=http://www.example.com
python3 -c "from urllib.parse import urlparse
url = urlparse('$URL')
print(url.netloc)"
You could either use it like this or put it in a small script. However, this still expects a valid scheme identifier, and judging by your comment, your input doesn't necessarily provide one. You can specify a default scheme, but urlparse expects the netloc to start with '//':
url = urlparse('//www.example.com/index.html', 'http')
So you will have to prepend those manually, i.e.:
python3 -c "from urllib.parse import urlparse
if '$URL'.find('://') == -1:
    url = urlparse('//$URL', 'http')
else:
    url = urlparse('$URL')
print(url.netloc)"
The following will output "example.com":
URI="http://user@example.com/foo/bar/baz/?lala=foo"
ruby -ruri -e "p URI.parse('$URI').host"
For more info on what you can do with Ruby's URI class you'd have to consult the docs.
$ URI="http://user:password@example.com:80/"
$ echo $URI | sed -e "s/[^/]*\/\/\([^@]*@\)\?\([^:/]*\).*/\2/"
example.com
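The \? in that expression is a GNU sed extension. If portability matters, one possible sketch is to split the work into several plain substitutions that any POSIX sed should accept. The function name sed_host is hypothetical, and IPv6 literals are not handled.

```shell
# Sketch: the same extraction as a chain of basic sed substitutions.
# sed_host is a hypothetical name used only for this example.
sed_host() {
    printf '%s\n' "$1" | sed \
        -e 's|^[^/]*//||' \
        -e 's|[/?#].*$||' \
        -e 's|^.*@||' \
        -e 's|:.*$||'
}

sed_host "http://user:password@example.com:80/"   # example.com
sed_host "example.org/index.html"                 # example.org
```

The four expressions remove, in order, the scheme up to "//", everything from the first "/", "?" or "#", any credentials ending in "@", and a trailing ":port".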
basename "http://example.com"
Of course, this won't work with a URI like http://www.example.com/index.html,
but you could do the following:
basename "$(dirname "http://www.example.com/index.html")"
Or for more complex URIs:
echo "http://www.example.com/somedir/someotherdir/index.html" | cut -d'/' -f3
-d means "delimiter" and -f means "field"; in the above example, the third field delimited by the forward slash '/' is www.example.com.
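To see why it is the third field, it may help to print the first three fields separately, and to note that a URL without a scheme has the host in field 1 instead. A small sketch, where host_with_cut is a made-up helper name:

```shell
url="http://www.example.com/somedir/someotherdir/index.html"
printf '%s\n' "$url" | cut -d'/' -f1   # http:
printf '%s\n' "$url" | cut -d'/' -f2   # (empty, between the two slashes)
printf '%s\n' "$url" | cut -d'/' -f3   # www.example.com

# Hypothetical wrapper: pick field 3 when a scheme is present, else field 1.
host_with_cut() {
    case $1 in
        *://*) printf '%s\n' "$1" | cut -d'/' -f3 ;;
        *)     printf '%s\n' "$1" | cut -d'/' -f1 ;;
    esac
}

host_with_cut "example.org/index.html"   # example.org
```

Credentials and ports are left in place by this approach, so it only suits plain scheme://host/path input.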
With Ruby you can use the Domainatrix library / gem
http://www.pauldix.net/2009/12/parse-domains-from-urls-easily-with-domainatrix.html
require 'rubygems'
require 'domainatrix'
s = 'http://www.champa.kku.ac.th/dir1/dir2/file?option1&option2'
url = Domainatrix.parse(s)
url.domain
=> "kku"
great tool! :-)