views:

40

answers:

2

I'm trying to use this script (which extracts the domain from a URL):

CREATE FUNCTION [dbo].[parseURL]  (@strURL varchar(1000))
RETURNS varchar(1000)
AS
BEGIN
IF CHARINDEX('http://',@strURL) > 0 OR CHARINDEX('https://',@strURL) > 0
SELECT @strURL = REPLACE(@strURL,'https://','')
SELECT @strURL = REPLACE(@strURL,'http://','')
SELECT @strURL = REPLACE(@strURL,'www','')
-- Remove everything after "/" if one exists
IF CHARINDEX('/',@strURL) > 0 (SELECT @strURL = LEFT(@strURL,CHARINDEX('/',@strURL)-1))    
RETURN @strURL
END

Using this SQL command:

SELECT COUNT(*) as theCount, dbo.parseURL(url) as url FROM hit
WHERE url IS NOT NULL
GROUP BY dbo.parsedomain(url)
ORDER BY thecount DESC

But I'm receiving this error:

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[dbo].[parseURL] (@strURL varchar(1000)) RETURNS varchar(1000) AS BEGIN IF ' at line 1

I am relatively new to both SQL and MySQL, so not sure if the error is SQL-general or MySQL-specific. What am I missing?

+3  A: 

Here's the equivalent MySQL function:

DELIMITER $$

CREATE FUNCTION parseURL(strURL VARCHAR(200)) 
  RETURNS varchar(200) 
BEGIN

  IF INSTR(strURL, 'http://') > 0 OR INSTR(strURL, 'https://') > 0 THEN
    SET strURL = REPLACE(strURL,'https://','');
    SET strURL = REPLACE(strURL,'http://','');
    SET strURL = REPLACE(strURL,'www','');
    -- Remove everything after "/" if one exists
    IF INSTR(strURL, '/') > 0 THEN 
     SET strURL = LEFT(strURL, INSTR(strURL, '/')-1);
   END IF;
  END IF;

  RETURN strURL;
END

Use this query:

  SELECT COUNT(*) as theCount, 
         parseURL(url) as url 
    FROM hit
   WHERE url IS NOT NULL
GROUP BY parseURL(url)
ORDER BY thecount DESC
OMG Ponies
Many thanks, OMG Ponies--this worked great!
Dan
+3  A: 

Jesus Christ Ponies, effort wasted, but here it is:

DELIMITER |
CREATE FUNCTION DOMAIN (url VARCHAR(255)) 
 RETURNS VARCHAR(255) DETERMINISTIC
  BEGIN
   DECLARE pos INT;
    SET pos := LOCATE('://', url);
    SET url := CASE WHEN pos=0 THEN url ELSE SUBSTR(url, pos+3) END;
    SET url := REPLACE(url,'www.',''); 
    RETURN SUBSTRING_INDEX(url, '/', 1);
   END

Then query:

  SELECT COUNT(*) as theCount,    
         DOMAIN(url) as url    
    FROM hit   
   WHERE url IS NOT NULL   
GROUP BY url   
ORDER BY thecount DESC  

My function has the advantage of stripping any protocol, https, http, ftp, whatever

quantumSoup
OMG Ponies
@OMG Probably. I pulled a bunch of random URLs from some urllist.txt files (about 25000 of them) and ran the functions. Mine got an average of around 0.52s, yours around 0.73s.
quantumSoup