I need to match a host name, but I don't want the TLD:

example.com =~ /regex/ => example

sub.example.com =~ /regex/ => sub.example

sub.sub.example.com =~ /regex/ => sub.sub.example

Any help with the regex? Thanks.

A: 
(.*)\.

This isn't really specific to TLDs; it'll just give you everything before the last period in a line. If you want to be strict about valid TLDs or anything, it'll have to be written differently.
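
To illustrate the greedy behaviour described above, here is a minimal Python sketch. The function name strip_last_label is made up for this example, and Python is only one possible host language for the pattern.

```python
import re

# Greedy capture: (.*) grabs as much as it can, then backtracks just far
# enough for the final literal period to match.
PATTERN = re.compile(r"(.*)\.")

def strip_last_label(host):
    """Return everything before the last period, or None if there is no period."""
    m = PATTERN.match(host)
    return m.group(1) if m else None

for host in ("example.com", "sub.example.com", "sub.sub.example.com"):
    print(host, "=>", strip_last_label(host))
```

A string with no period at all (for example "localhost") does not match, so the sketch returns None in that case.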

Chad Birch
A: 

I'm not clear on how you want the match to work, but with the usual extended regex you should be able to match any TLD with [a-zA-Z]{2,3}. So if you're trying to get the whole name other than the TLD, something like

\(.\)\.[a-zA-Z]{2,3}$

should be close.

Charlie Martin
I think not. First, the backslashes are all wrong. Second, even if they were right it would reject tlds like .name and .info.
Boo
A: 
(?<Domain>.*)\.(?<TLD>.*?)$
Boo
A: 

You could just strip off the tld:

s/\.[^\.]*$//;
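
For comparison, the same substitution can be sketched in Python with re.sub. The function name strip_after_last_dot is hypothetical, and the period inside the character class does not need escaping.

```python
import re

def strip_after_last_dot(host):
    # Delete the final period and everything after it.
    # A host with no period is returned unchanged.
    return re.sub(r"\.[^.]*$", "", host)

print(strip_after_last_dot("sub.example.com"))
```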
Plutor
Although this does strip off the last bit from the host name, that's not what a TLD is.
tadman
+3  A: 

Assuming your string is correctly formatted and doesn't include things like a protocol [e.g. http://], you need all characters up to but not including the final .tld.

So this is the simplest way to do this. The trick with regular expressions is not to overcomplicate things:

.*(?=\.\w+)

This says: match as many characters as possible, as long as they are followed by a period and one or more word characters [for example, .xxx]. Because .* is greedy, this returns everything prior to the last period.

If you don't have lookahead, it would probably be easiest to use:

(\w+\.)+

which will give you everything up to and including the final '.', which you can then trim.
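
Both variants can be tried side by side. This is only a sketch in Python, and the function names with_lookahead and without_lookahead are invented for the example.

```python
import re

LOOKAHEAD = re.compile(r".*(?=\.\w+)")
NO_LOOKAHEAD = re.compile(r"(\w+\.)+")

def with_lookahead(host):
    # The lookahead checks for ".xxx" without consuming it,
    # so the whole match is already the part we want.
    m = LOOKAHEAD.match(host)
    return m.group(0) if m else None

def without_lookahead(host):
    # group(0) is the whole match and ends with the final '.',
    # which is sliced off here.
    m = NO_LOOKAHEAD.match(host)
    return m.group(0)[:-1] if m else None

for host in ("example.com", "sub.example.com", "my-host.example.com"):
    print(host, "=>", with_lookahead(host), "/", without_lookahead(host))
```

One difference worth knowing: \w does not match a hyphen, so the second pattern fails to match from the start of a name like my-host.example.com, while the lookahead version still works because .* accepts any character.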

BenAlabaster
Rats, it took me too long. You were first. Good regex ;) (in case you have positive lookahead)
Norbert Hartl
Watch out with TLDs like ".co.uk" or ".com.au" which are very common.
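
A rough sketch of one way to cope with this in Python, assuming you maintain your own list of multi-label suffixes. The tuple below holds only the two suffixes mentioned here and is purely illustrative; the names MULTI_LABEL_SUFFIXES and strip_suffix are made up.

```python
# Hypothetical, hand-picked suffixes for illustration only;
# a real list would need to be far longer.
MULTI_LABEL_SUFFIXES = (".co.uk", ".com.au")

def strip_suffix(host):
    # Try the known multi-label suffixes first.
    for suffix in MULTI_LABEL_SUFFIXES:
        if host.endswith(suffix):
            return host[: -len(suffix)]
    # Otherwise fall back to dropping only the last label.
    head, sep, _ = host.rpartition(".")
    return head if sep else host

print(strip_suffix("sub.example.co.uk"))
```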
tadman
+1  A: 

Try this

/.+(?=\.\w+$)/

Without support for ?=, it would be

/(.+)\.\w+$/

and then take the content of the first group

Norbert Hartl
would \w+\. not be easier and just trim the trailing '.'?
BenAlabaster
Hmmm, no. Sounds more complicated, so it must be :) I think it doesn't really matter as long as it works. Probably s/\.\w+$// would be very easy. Otherwise split(/\./), pop the last element, then join('.') is also possible. You know, when it comes to regex, it is only beauty that counts (in the eye of the regex lover)
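
The split, pop, join idea needs no regex at all. A minimal Python sketch, with the hypothetical function name drop_last_label:

```python
def drop_last_label(host):
    # Split on periods, discard the last label, and glue the rest back together.
    labels = host.split(".")
    if len(labels) > 1:
        labels.pop()
    return ".".join(labels)

print(drop_last_label("sub.sub.example.com"))
```

The length check is an added assumption: it leaves a name with no period untouched instead of returning an empty string.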
Norbert Hartl