views:

45

answers:

5

Hi guys

I have a table with many URLS like

www.topshop.com
www.shopbop.co.uk
http://www.magickingdom.net
http://www.asos.co.uk

UPDATE

Needs to be able to handle this URL as well

topshop.co.uk

I want to be able to strip out the fluff such that it returns only

topshop
shopbop
magickingdom
asos

Do I have to write a manual exception for each instance? Is there a clever way of formatting the column to return everything between the first two periods of the URL?

+1  A: 

I think it's doable, but involved. You may want to read the URLS out into a program, parse it with any one of a dozen URL parsing routines, then update your database.

Ian Jacobs
+1  A: 

EDIT This has been modified per your comment.

This covers cases with no subdomain, as well as cases without periods (which I imagine are just invalid). It also covers cases where a domain begins with "www." and include a compound TLD (such as ".co.uk"). Beyond that, you'll have to specify which cases you're interested in.

WITH urls (domain) AS (
    SELECT CASE WHEN url LIKE '%.%.%'
                THEN SUBSTRING(url, CHARINDEX('.', url) + 1, 999)
                WHEN url LIKE '%.%'
                THEN LEFT(url, CHARINDEX('.', url) - 1)
                ELSE url
            END
     FROM (SELECT CASE WHEN url LIKE '%www.%'
                       THEN url
                       WHEN url NOT LIKE '%.%'
                       THEN url
                       ELSE 'www.' + url
                   END url
             FROM (SELECT 'www.topshop.com' url
             UNION SELECT 'www.shopbop.co.uk'
             UNION SELECT 'topshop.co.uk'
             UNION SELECT 'nosubdomain.net'
             UNION SELECT 'nothingwhatsoever'
             UNION SELECT 'http://www.magickingdom.net'
             UNION SELECT 'http://www.asos.co.uk') a
     ) b
)
SELECT CASE WHEN domain LIKE '%.%'
                THEN LEFT(domain, CHARINDEX('.', domain) - 1)
                ELSE domain
            END
  FROM urls

This is the output.

asos
magickingdom
nosubdomain
nothingwhatsoever
topshop
shopbop
topshop
harpo
thanks! i could kiss you right now!
Nai
Hey harpo, found a case where it doesnt work. If the URL is "topshop.co.uk"
Nai
@Nai, if that's the case, you'll need to identify specific strings (such as "www") that will be interpreted as subdomains instead of primary domains. Or, you could identify suffixes (such as "co.uk") where they are interpreted together as a TLD.
harpo
+1  A: 

Here's a conventional string-manipulation way of doing it. I assume that the string will always contain two decimal points. (You only need the last line, I've "drawn out" how I constructed it.)

DECLARE @Foo varchar(100)

SET @Foo = 'www.topshop.com '
SET @Foo = 'www.shopbop.co.uk '
SET @Foo = 'http://www.magickingdom.net '
SET @Foo = 'http://www.asos.co.uk '


PRINT @Foo

--  Start of string to extract
PRINT charindex('.', @Foo) + 1

--  Extracted string, part 1
PRINT substring(@foo, charindex('.', @Foo) + 1, 100)

--  In extracted string, where do we want to stop
PRINT charindex('.', substring(@foo, charindex('.', @Foo) + 1, 100)) - 1

--  Extracted string
PRINT left(substring(@foo, charindex('.', @Foo) + 1, 100), charindex('.', substring(@foo, charindex('.', @Foo) + 1, 100)) - 1)

This clearly demonstrates that SQL string manipulations can get very ugly (if not downright stupid), and that @Ian Jacobs is right, you should use a language more suitable to the task.

Philip Kelley
If you're parsing a whole lot of values all at once (hundreds or thousands in a table?) I'd recommend working up a solution using Tally tables.
Philip Kelley
I've chosen harpo's answer because his handles more exceptions
Nai
A: 

Try something like this:

declare @urls table (url varchar(100))

insert into @urls
select 'www.topshop.com'
union
select 'www.shopbop.co.uk'
union
select 'http://www.magickingdom.net'
union
select 'http://www.asos.co.uk'

select 
    left(right(url,len(url)-CHARINDEX('.',url)),CHARINDEX('.',right(url,len(url)-CHARINDEX('.',url)))-1)
from @urls
Jose Chama
A: 

Step 1. Write a method to do this in CLR.
Step 2. Install it in your sql server.
Step 3. ???
Step 4. Profit!

Chris Lively