views:

47

answers:

3

I have a table of common words that are used in sentences (i.e. A, the, and, where, etc...)

What I want to do is loop through all those words and strip them out of the descriptions that people have entered to attempt to generate common keywords or tags. But I can't use replace because replace will remove any instance of the common word regardless of whether it is only a couple of letters that make up a larger word. For instance:

I want to replace A in the description. Now obviously a lot of words contain the letter a. So all those A's will be stripped from the words. I don't want that. I only want it when A is used a a whole word. I can figure this out using regular expressions but was wondering if there was anyway to do this in SQL without having to resort to CLR proc.

Maybe I am missing something but I couldn't seem to find an easy way to do this without having to write some specific scenarios like: word plus space before, word plus space after, word plus period after, etc... I don't think that is the best way.

A: 

For quick and dirty, I used to slosh through the various SQL functions PATINDEX, LEFT, RIGHT and LIKE to do this sort of thing. For one-time data prep, I export to something like Excel and eyeball it.

A good approach also is to create a new StringSubstitutionTable with two columns SOURCESTRING and TARGETSTRING and run a replace function to replace the SOURCESTRING with the TARGETSTRING on the joined table. This is cool because you can just add substitution entries as needed.

Ash Machine
@Ash seems like that could be useful and thanks for the tip. But right now I am just looking to remove all these common words from the description to generate a new one. I'm not so much looking to replace one word for another. But I will keep this approach in mind if this need ever comes up.
spinon
A: 

You can try nesting the replaces for each word you would like to replace. For example:

UPDATE TableName
SET ColumnName = REPLACE(REPLACE(REPLACE(REPLACE(TableName.ColumnName,' a ',' '),' the ',' '),' and ',' '), '  ', ' ')

Let me know if this is what you were looking for.

Jagermeister
A: 

Here is they way I did something similar to what you are trying to do.

During your replace action...
Append a space before and after the common word.
Append a space before and after the description.

Let's suppose you want to remove the CommonWord "A" from the Description.

Description: "A good phrase never starts with A or ends with A"
CommonWord: "A"

Update TableName
Set Description =
LTRIM(RTRIM(Replace(' ' + Description + ' ', ' ' + CommonWord + ' ', ' ')))

This lets you delete all whole words that equal 'A'. Because you are replacing ' A ' with a space you need to LTRIM RTRIM to remove any leading or trailing spaces.

You could also do this in two steps:

--
-- Step 1 Loop through all common words removing them
--
Update TableName
Set Description = Replace(' ' + Description + ' ', ' ' + CommonWord + ' ', ' ')

--
-- Step 2 Unconditionally Trim all Descriptions
--
Update TableName
Set Description = LTRIM(RTIM(Description))

Cape Cod Gunny