tags:

views:

848

answers:

4

If I have a table field named 'description', what would be the SQL (using MS SQL) to get a list of records of all distinct words used in this field.

For example:

If the table contains the following for the 'description' field:

Record1 "The dog jumped over the fence."
Record2 "The giant tripped on the fence."
...

The SQL record output would be:

"The","giant","dog","jumped","tripped","on","over","fence"
A: 

it'd be a messy stored procedure with a temp table and a SELECT DISTINCT at the end.

if you had the words already as records, you would use SELECT DISTINCT [WordsField] from [owner].[tablename]

Jeremy
+3  A: 

I do not think you can do this with a SELECT. The best chance is to write a user defined function that returns a table with all the words and then do SELECT DISTINCT on it.


Disclaimer: Function dbo.Split is from http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648

CREATE TABLE test
(
    id int identity(1, 1) not null,
    description varchar(50) not null
)

INSERT INTO test VALUES('The dog jumped over the fence')
INSERT INTO test VALUES('The giant tripped on the fence')

CREATE FUNCTION dbo.Split
(
    @RowData nvarchar(2000),
    @SplitOn nvarchar(5)
)  
RETURNS @RtnValue table 
(
    Id int identity(1,1),
    Data nvarchar(100)
) 
AS  
BEGIN 
    Declare @Cnt int
    Set @Cnt = 1

    While (Charindex(@SplitOn,@RowData)>0)
    Begin
     Insert Into @RtnValue (data)
     Select 
      Data = ltrim(rtrim(Substring(@RowData,1,Charindex(@SplitOn,@RowData)-1)))

     Set @RowData = Substring(@RowData,Charindex(@SplitOn,@RowData)+1,len(@RowData))
     Set @Cnt = @Cnt + 1
    End

    Insert Into @RtnValue (data)
    Select Data = ltrim(rtrim(@RowData))

    Return
END

CREATE FUNCTION dbo.SplitAll(@SplitOn nvarchar(5))
RETURNS @RtnValue table
(
    Id int identity(1,1),
    Data nvarchar(100)
)
AS
BEGIN
DECLARE My_Cursor CURSOR FOR SELECT Description FROM dbo.test
DECLARE @description varchar(50)

OPEN My_Cursor
FETCH NEXT FROM My_Cursor INTO @description
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO @RtnValue
    SELECT Data FROM dbo.Split(@description, @SplitOn)
   FETCH NEXT FROM My_Cursor INTO @description
END
CLOSE My_Cursor
DEALLOCATE My_Cursor

RETURN

END

SELECT DISTINCT Data FROM dbo.SplitAll(N' ')
smink
+1  A: 

In SQL on it's own it would probably need to be a big stored procedure, but if you read all the records out to the scripting language of your choice, you can easily loop over them and split each out into arrays/hashes.

Pavling
A: 

I just had a similar problem and tried using SQL CLR to solve it. Might be handy to someone

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

using System.Collections;
using System.Collections.Generic;

public partial class UserDefinedFunctions
{
    private class SplitStrings : IEnumerable
    {
     private List<string> splits;

     public SplitStrings(string toSplit, string splitOn)
     {
      splits = new List<string>();

      // nothing, return empty list
      if (string.IsNullOrEmpty(toSplit))
      {
       return;
      }

      // return one word
      if (string.IsNullOrEmpty(splitOn))
      {
       splits.Add(toSplit);

       return;
      }

      splits.AddRange(
       toSplit.Split(new string[] { splitOn }, StringSplitOptions.RemoveEmptyEntries)
      );
     }

     #region IEnumerable Members

     public IEnumerator GetEnumerator()
     {
      return splits.GetEnumerator();
     }

     #endregion
    }

    [Microsoft.SqlServer.Server.SqlFunction(FillRowMethodName = "readRow", TableDefinition = "word nvarchar(255)")]
    public static IEnumerable fnc_clr_split_string(string toSplit, string splitOn)
    {
     return new SplitStrings(toSplit, splitOn);
    }

    public static void readRow(object inWord, out SqlString word)
    {
     string w = (string)inWord;

     if (string.IsNullOrEmpty(w))
     {
      word = string.Empty;
      return;
     }

     if (w.Length > 255)
     {
      w = w.Substring(0, 254);
     }

     word = w;
    }
};
marshall