I have a column named "situation" and "entityid".
Entityid Situation 1234 In the the world of of 3456 Total universe is is a
Can any one please give me query to find these type of higlihted words.
Thanks Ramesh
I have a column named "situation" and "entityid".
Entityid Situation 1234 In the the world of of 3456 Total universe is is a
Can any one please give me query to find these type of higlihted words.
Thanks Ramesh
If you are willing to use SQL Server Express, you will be able to create a CLR User Defined Function.
http://msdn.microsoft.com/en-us/library/w2kae45k(VS.80).aspx
You will then have the power of Regular Expressions at your finger tips.
Then, depending on your proficiency with RegEx, you're either left with zero problems or two problems.
If you want to hard code it:
select EntityID, Situation
from Entity
where Situation like '%the the%'
or Situation like '%of of%'
or Situation like '%is is%'
Update: Here is a slightly less hard-coded approach:
select EntityID, Situation, right(s2, diff * 2 + 1) as RepeatedWords
from (
select EntityID, Situation, WordNumber,
substring_index(Situation, ' ', WordNumber) s1,
substring_index(Situation, ' ', WordNumber + 1) s2,
length(substring_index(Situation, ' ', WordNumber + 1)) - length(substring_index(Situation, ' ', WordNumber)) -1 diff
from `Entity` e
inner join (
select 1 as WordNumber
union all
select 2
union all
select 3
union all
select 4
union all
select 5
union all
select 6
union all
select 7
union all
select 8
union all
select 9
union all
select 10
) n
) a
where right(s1, diff) = right(s2, diff)
and diff > 0
order by EntityID, WordNumber
It will search up to the first 10 words or so, and doesn't handle case, punctuation or multiple spaces properly, but it should give you an idea of an approach you can take. If you want it to handle longer strings, just keep adding to the UNION ALL statements.