Hello, I have a large database that contains many URLs. Many domains repeat, and I'm trying to get only the domains, e.g.:

http://example.com/someurl.html
http://example.com/someurl_on_the_same_domain.html
http://example.net/myurl.php
http://example.org/anotherurl.php

and I want to get only the domains, e.g.:

http://example.com
http://example.net
http://example.org

My query is:

SELECT id, site FROM table GROUP BY site ORDER BY id DESC LIMIT 50

I think I need to use a regex, but I'm no MySQL guru.

A: 

You can select the domains with:

select left(site,6+locate('/',substring(site,8)))
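
To see how this expression behaves, it can help to run it against literal values instead of the real table. The following is a minimal sketch, assuming MySQL and URLs that start with "http://"; the derived table `sample` and its rows are made up for illustration.

```sql
-- Hypothetical sample rows; only the LEFT(...) expression comes from the answer.
SELECT site,
       LEFT(site, 6 + LOCATE('/', SUBSTRING(site, 8))) AS domain
FROM (
    SELECT 'http://example.com/someurl.html' AS site
    UNION ALL
    SELECT 'http://example.com'
) AS sample;
-- 'http://example.com/someurl.html' -> 'http://example.com'
-- 'http://example.com'              -> 'http:/'
--   (no slash after the host, so LOCATE returns 0 and LEFT keeps only 6 characters)
```

The second row shows why a URL consisting of the domain alone needs separate handling.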
Ned Batchelder
Thanks, that also worked, but it doesn't handle URLs that consist of the domain only (e.g. http://example.com).
JQman
If you need it to work on 'example.com', you should put an example like that in the question. Looks like you've already solved the problem though.
Ned Batchelder
A: 
SELECT
    SUBSTR(site, 1 , LOCATE('/', site, 8)-1)
        as OnlyDomain
    FROM table
    GROUP BY OnlyDomain
    ORDER BY id DESC LIMIT 50

[EDIT]: At the OP's request, here's an updated answer that returns correct results even when the URL has no slash after the domain name:

SELECT
    SUBSTR(site, 1 , IF(LOCATE('/', site, 8), LOCATE('/', site, 8)-1, LENGTH(site)))
        as OnlyDomain
    FROM tablename
    GROUP BY OnlyDomain
    ORDER BY id DESC LIMIT 50
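
A quick way to check the updated expression is to run it on a few literal URLs. This is only a sketch under the assumption that every URL starts with "http://" (the search for '/' begins at position 8, just past that prefix); the derived table `sample` is hypothetical.

```sql
-- Hypothetical sample rows: with a path, without a path, and with a trailing slash.
SELECT site,
       SUBSTR(site, 1,
              IF(LOCATE('/', site, 8),          -- is there a slash after the host?
                 LOCATE('/', site, 8) - 1,      -- yes: cut just before it
                 LENGTH(site))) AS OnlyDomain   -- no: keep the whole string
FROM (
    SELECT 'http://example.com/someurl.html' AS site
    UNION ALL
    SELECT 'http://example.net'
    UNION ALL
    SELECT 'http://example.org/'
) AS sample;
-- 'http://example.com/someurl.html' -> 'http://example.com'
-- 'http://example.net'              -> 'http://example.net'
-- 'http://example.org/'             -> 'http://example.org'
```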
shamittomar
Thanks, it seems to be working, but when a URL contains only the domain name (http://google.com rather than http://google.com/url.html), it isn't returned.
JQman
@JQman, I have updated the answer.
shamittomar
This is great, thanks a lot!
JQman
A: 

You can use string functions to extract the domain. Assuming that the "site" column contains the URL:

select id,
substr(site,1,locate('/', site ,locate('//',site)+2)-1) as domain
from table
group by domain
order by id desc
limit 50;

Be careful to make sure that multiple slashes are encoded, eg:

http://example.com/somethingelse/someurl.html
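
Because the search starts just after the "//", this approach does not depend on the length of the scheme, so it also covers "https://". As a minimal sketch (assuming MySQL; the `sample` rows are made up, and the IF guard for URLs without a path is an addition, not part of the query above), the expression can be tried like this:

```sql
-- Hypothetical sample rows: an https URL with several path slashes, and a bare domain.
SELECT site,
       SUBSTR(site, 1,
              IF(LOCATE('/', site, LOCATE('//', site) + 2),
                 LOCATE('/', site, LOCATE('//', site) + 2) - 1,  -- cut before the first path slash
                 LENGTH(site))) AS domain                        -- no path: keep the whole URL
FROM (
    SELECT 'https://example.com/somethingelse/someurl.html' AS site
    UNION ALL
    SELECT 'http://example.net'
) AS sample;
-- 'https://example.com/somethingelse/someurl.html' -> 'https://example.com'
-- 'http://example.net'                             -> 'http://example.net'
```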
zevra0
A: 
SELECT id,
       SUBSTRING_INDEX(REPLACE(REPLACE(site,'http://',''),'https://',''),'/',1) as domain 
       FROM table
       GROUP BY domain 
       ORDER BY id DESC 
       LIMIT 50

This worked for me, in case anybody needs it.
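
Note that this query returns the bare host name, without the scheme. If the scheme should be kept, as in the question's expected output, one possible variant is to take everything before the third slash with SUBSTRING_INDEX. The sketch below assumes MySQL and URLs of the form scheme://host/...; the `sample` rows are hypothetical.

```sql
-- Hypothetical sample rows comparing the two expressions side by side.
SELECT site,
       SUBSTRING_INDEX(REPLACE(REPLACE(site, 'http://', ''), 'https://', ''), '/', 1) AS host_only,
       SUBSTRING_INDEX(site, '/', 3) AS scheme_and_host  -- text before the third '/'
FROM (
    SELECT 'http://example.com/someurl.html' AS site
    UNION ALL
    SELECT 'https://example.org'
) AS sample;
-- 'http://example.com/someurl.html' -> host_only 'example.com', scheme_and_host 'http://example.com'
-- 'https://example.org'             -> host_only 'example.org', scheme_and_host 'https://example.org'
```

When the string has fewer than three slashes, SUBSTRING_INDEX returns it unchanged, so domain-only URLs are kept as they are.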

JQman