I want to return the 10 most common words from a query in SQL server, so running against a set of rows such as this:
quick brown fox
slow yellow fox
slow green fox
would return
fox
slow
quick
brown
yellow
green
I want to return the 10 most common words from a query in SQL server, so running against a set of rows such as this:
quick brown fox
slow yellow fox
slow green fox
would return
fox
slow
quick
brown
yellow
green
I'd try running a split function (separating each word through a space) over each returned row in order to get all separate words into an auxiliar table. Using the following code, you should be able to split a row by its spaces:
CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))
returns @temptable TABLE (items varchar(8000))
as
begin
declare @idx int
declare @slice varchar(8000)
select @idx = 1
if len(@String)<1 or @String is null return
while @idx!= 0
begin
set @idx = charindex(@Delimiter,@String)
if @idx!=0
set @slice = left(@String,@idx - 1)
else
set @slice = @String
if(len(@slice)>0)
insert into @temptable(Items) values(@slice)
set @String = right(@String,len(@String) - @idx)
if len(@String) = 0 break
end
return
end
You should call this function from within a cursor or something; inside it, simply use something like:
insert into #tmp (word) select * from dbo.split(' ', @row)
Finally, you would only have to use a simple query like:
select top 10 count(*) as number, word from separated_words_table order by number
To see how to do this declaratively (i.e. without a while
loop), see an answer I worked on (for a code golf of all things): http://stackoverflow.com/questions/3169051/code-golf-word-frequency-chart/3173246#3173246
Note that the code in that link is intended to have the fewest characters possible, not to be readable. Please use more descriptive names, at least.
Another way. ("Borrowed" from here)
WITH Sentences AS
(
SELECT 'quick brown fox' AS Sentence UNION ALL
SELECT 'slow yellow fox' UNION ALL
SELECT 'slow green fox'
),
Xmlified AS
(
SELECT
CAST('<M>' + REPLACE(Sentence,' ','</M><M>') + '</M>' AS XML) AS xSentence
FROM Sentences
),
Words AS
(
SELECT
Split.a.value('.', 'VARCHAR(100)') AS word
FROM Xmlified
CROSS APPLY xSentence.nodes('/M') Split(a)
)
SELECT COUNT(*) AS C, word FROM Words
GROUP BY word
ORDER BY C DESC