views:

38

answers:

3

I want to return the 10 most common words from a query in SQL server, so running against a set of rows such as this:

quick brown fox
slow yellow fox
slow green fox

would return
fox
slow
quick
brown
yellow
green

+1  A: 

I'd try running a split function (separating each word through a space) over each returned row in order to get all separate words into an auxiliar table. Using the following code, you should be able to split a row by its spaces:

CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))     
returns @temptable TABLE (items varchar(8000))     
as     
begin     
    declare @idx int     
    declare @slice varchar(8000)     

    select @idx = 1     
        if len(@String)<1 or @String is null  return     

    while @idx!= 0     
    begin     
        set @idx = charindex(@Delimiter,@String)     
        if @idx!=0     
            set @slice = left(@String,@idx - 1)     
        else     
            set @slice = @String     

        if(len(@slice)>0)
            insert into @temptable(Items) values(@slice)     

        set @String = right(@String,len(@String) - @idx)     
        if len(@String) = 0 break     
    end 
return     
end

You should call this function from within a cursor or something; inside it, simply use something like:

insert into #tmp (word) select * from dbo.split(' ', @row)

Finally, you would only have to use a simple query like:

select top 10 count(*) as number, word from separated_words_table order by number

Source here

Hal
+2  A: 

To see how to do this declaratively (i.e. without a while loop), see an answer I worked on (for a code golf of all things): http://stackoverflow.com/questions/3169051/code-golf-word-frequency-chart/3173246#3173246

Note that the code in that link is intended to have the fewest characters possible, not to be readable. Please use more descriptive names, at least.

Gabe
great thread btw
Hal
A: 

Another way. ("Borrowed" from here)

WITH Sentences AS
(
SELECT 'quick brown fox' AS Sentence UNION ALL
SELECT 'slow yellow fox' UNION ALL
SELECT 'slow green fox'
),
Xmlified AS
(
    SELECT
       CAST('<M>' + REPLACE(Sentence,' ','</M><M>') + '</M>' AS XML)  AS xSentence
FROM Sentences
),
Words AS
(
SELECT
 Split.a.value('.', 'VARCHAR(100)') AS word
FROM Xmlified
CROSS APPLY xSentence.nodes('/M') Split(a)
)
SELECT COUNT(*) AS C, word FROM Words
GROUP BY word
ORDER BY C DESC
Martin Smith