I have a MySQL table with a column of well-formed URLs. I'd like to get a table of the unique domain names in the column, and the number of times each domain appears. Is there a MySQL query that can do this?

I thought of doing something like...

SELECT COUNT(*)
FROM log
GROUP BY url REGEXP "/* regexp here */"

...but this doesn't work as REGEXP returns 0 or 1 and not what was matched.

+5  A: 

To return the count for one specific domain:

SELECT  SUM(url REGEXP '^http://example.com')
FROM    log

Unfortunately, MySQL's REGEXP operator only reports whether a match occurred; it doesn't return the matched string.

If your log records always look like http://example.com/*, you can issue:

SELECT  SUBSTRING_INDEX(url, '/', 3), COUNT(*)
FROM    log
GROUP BY
        1
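
Note that the query above keeps the scheme in the result, so http://example.com and https://example.com would be counted as separate groups. A possible variant, assuming every URL has the form scheme://host/... and reusing the log table and url column from the question, nests a second SUBSTRING_INDEX to keep only the host. The aliases domain and hits are only illustrative names:

```sql
-- Inner call: 'http://example.com/page' -> 'http://example.com'
-- Outer call: keep what follows the last '/' -> 'example.com'
SELECT  SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '/', -1) AS domain,
        COUNT(*) AS hits
FROM    log
GROUP BY
        domain
ORDER BY
        hits DESC;
```

A port number, if present (example.com:8080), stays part of the host in this sketch.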
Quassnoi
Thanks, that solved it for me.
isani
+1  A: 

Well, if they're fully formed URLs, you could first replace("http://", "") and then remove everything from the first occurrence of a / onwards

so

http://www.domain.com/page.aspx

would become

www.domain.com

I'm not sure of the MySQL syntax for Replace, but in MSSQL it would be:

DECLARE @url nvarchar(50)
SET @url = 'http://www.domain.com/page.aspx'
SELECT LEFT(REPLACE(@url, 'http://', ''), CharIndex('/', REPLACE(@url, 'http://', '')) - 1)

From this you could get a subtable of all domain names and then count/group them.

SELECT
    Count(*),
    DomainOnly
FROM
(
    SELECT 
        LEFT(REPLACE(urlColumn, 'http://', ''), CharIndex('/', REPLACE(urlColumn, 'http://', '')) - 1) as DomainOnly
    FROM 
        TABLE_OF_URLS
) as Domains
GROUP BY 
    DomainOnly
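
For MySQL, a rough equivalent might look like the sketch below. MySQL has REPLACE and LEFT but no CHARINDEX (LOCATE is the closest counterpart); SUBSTRING_INDEX makes the position lookup unnecessary. The names urlColumn and TABLE_OF_URLS are the same placeholders as above, and the hits alias is only illustrative:

```sql
SELECT
    COUNT(*) AS hits,
    DomainOnly
FROM
(
    SELECT
        -- Strip the scheme, then keep everything before the first '/'.
        -- If no '/' remains, SUBSTRING_INDEX returns the whole string.
        SUBSTRING_INDEX(REPLACE(urlColumn, 'http://', ''), '/', 1) AS DomainOnly
    FROM
        TABLE_OF_URLS
) AS Domains
GROUP BY
    DomainOnly;
```

This only handles the http:// scheme, and REPLACE removes every occurrence of that string, not just a leading one, so treat it as a starting point rather than a complete solution.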
Eoin Campbell
A: 

If by domain you mean the registered domain rather than the full hostname (so that www.example.com, corp.example.com, www.local.example.com, and example.com all count as one domain), then the regexp would be:

 '[[:alnum:]-]+\.[[:alnum:]-]+/'

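I'm assuming that these are well-formed URLs of the form scheme://host/[path]. Note that inside a MySQL string literal the backslash usually has to be doubled ('\\.') for the dot to be matched literally.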
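
Since REGEXP cannot return the matched text, one way to get a similar grouping without a regexp is to take the last two dot-separated labels of the host. This is only a sketch: it reuses the log table and url column from the question, the aliases are illustrative, and it assumes the registered domain is always exactly two labels, which is wrong for suffixes such as co.uk:

```sql
-- Step 1 (innermost): 'http://www.local.example.com/x' -> 'http://www.local.example.com'
-- Step 2: drop the scheme                            -> 'www.local.example.com'
-- Step 3: keep the last two labels                   -> 'example.com'
SELECT  SUBSTRING_INDEX(
            SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '/', -1),
            '.', -2) AS domain,
        COUNT(*) AS hits
FROM    log
GROUP BY
        domain;
```

A host with fewer than two dots, such as example.com, is returned unchanged, because SUBSTRING_INDEX returns the whole string when the delimiter occurs fewer times than requested.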

vartec