I have two columns named "situation" and "entityid".

Entityid    Situation
1234        In the the world of of
3456        Total universe is is a

Can anyone please give me a query to find these kinds of repeated (highlighted) words?

Thanks, Ramesh

A: 

If you are willing to use SQL Server Express, you will be able to create a CLR User Defined Function.

http://msdn.microsoft.com/en-us/library/w2kae45k(VS.80).aspx

You will then have the power of regular expressions at your fingertips.

Then, depending on your proficiency with RegEx, you're either left with zero problems or two problems.

Tormod
MySQL has a REGEXP function.
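For what it's worth, here is a minimal sketch of the regex route. It assumes MySQL 8.0.4 or later, where REGEXP is backed by the ICU library and so supports backreferences and `\b` word boundaries; older versions do not. The table and column names are the ones from the question, and the pattern is only one possible choice.

```sql
-- Sketch only: needs a regex engine with backreferences (assumed: MySQL 8.0.4+).
-- \\b(\\w+)\\s+\\1\\b = a whole word, whitespace, then the same word again.
-- Backslashes are doubled because MySQL treats \ as an escape in string literals.
select EntityID, Situation,
       regexp_substr(Situation, '\\b(\\w+)\\s+\\1\\b') as RepeatedWords
from Entity
where Situation regexp '\\b(\\w+)\\s+\\1\\b';
```

regexp_substr returns only the first repeated pair in each row, so a row such as "In the the world of of" would be listed once, with the first pair shown.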
RedFilter
Oops. I didn't know that.
Tormod
A:

If you want to hard code it:

select EntityID, Situation
from Entity
where Situation like '%the the%'
or Situation like '%of of%'
or Situation like '%is is%'

Update: Here is a slightly less hard-coded approach:

select EntityID, Situation, right(s2, diff * 2 + 1) as RepeatedWords
from (
    -- s1 = first N words, s2 = first N + 1 words, diff = length of word N + 1
    select EntityID, Situation, WordNumber,
     substring_index(Situation, ' ', WordNumber) s1,
     substring_index(Situation, ' ', WordNumber + 1) s2,
     char_length(substring_index(Situation, ' ', WordNumber + 1)) - char_length(substring_index(Situation, ' ', WordNumber)) - 1 diff
    from `Entity` e
    cross join (
     select 1 as WordNumber
     union all
     select 2 
     union all
     select 3 
     union all
     select 4 
     union all
     select 5 
     union all
     select 6 
     union all
     select 7 
     union all
     select 8 
     union all
     select 9 
     union all
     select 10 
    ) n
) a
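-- compare whole words, so that e.g. 'bathe the' is not reported as a repeat
where substring_index(s1, ' ', -1) = substring_index(s2, ' ', -1)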
    and diff > 0
order by EntityID, WordNumber

It only checks roughly the first 10 words, and it doesn't handle case, punctuation, or multiple spaces properly, but it should give you an idea of an approach you can take. To handle longer strings, keep adding SELECTs to the UNION ALL list.
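If your server supports recursive common table expressions (an assumption: MySQL 8.0 or later), the list of word numbers can be generated instead of typed out. The following is a sketch of the same idea under that assumption; the cap of 50 is an arbitrary illustrative value, and the limitations around case, punctuation, and multiple spaces still apply.

```sql
-- Sketch only: requires recursive CTE support (assumed: MySQL 8.0+).
with recursive n (WordNumber) as (
    select 1
    union all
    select WordNumber + 1 from n where WordNumber < 50  -- arbitrary cap
)
select e.EntityID, e.Situation, n.WordNumber,
       -- word N, i.e. the last word of the first N words
       substring_index(substring_index(e.Situation, ' ', n.WordNumber), ' ', -1) as RepeatedWord
from Entity e
cross join n
where
    -- word N + 1 exists and is not empty
    char_length(substring_index(e.Situation, ' ', n.WordNumber + 1))
      - char_length(substring_index(e.Situation, ' ', n.WordNumber)) > 1
    -- word N equals word N + 1
    and substring_index(substring_index(e.Situation, ' ', n.WordNumber), ' ', -1)
      = substring_index(substring_index(e.Situation, ' ', n.WordNumber + 1), ' ', -1)
order by e.EntityID, n.WordNumber;
```

Raising the cap only means changing the one number, although every extra word number adds one more row per entity to the cross join.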

RedFilter
This doesn't really seem to be in the spirit of what the OP is asking. I think they want the general case, where any word could be repeated, and I'd guess regex would be the best choice for that.
faceless1_14
Agreed, this is a last-ditch approach.
RedFilter