I have a SQL table with 11000 keywords in it.

I want a query that can find fields which contain a certain letter.

So if I include "a" and "b" the query will select all fields which contain the letter "a" and the letter "b" somewhere in the field.

A: 
where somefield like '%a%' or somefield like '%b%'
longneck
+3  A: 
select *
from table
where keyword like '%a%'
and keyword like '%b%'

P.S. This will be very slow, since a leading wildcard forces a scan of every row. You may want to investigate full-text indexing solutions.

Tom Ritter
Full text indexing generally won't help for searching for words shorter than 3-4 characters.
Daren Schwenke
As this will be slow anyways, you may throw in a `lower(keyword)` if there is no requirement to differentiate between upper and lowercase.
IronGoofy
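A minimal sketch of the case-insensitive variant suggested in the comment above. The table name `keywords` is an assumption for illustration; the column name `keyword` comes from the answer.

```sql
-- Hypothetical table name "keywords"; adapt to your schema.
-- LOWER() folds the column to lowercase so 'A' and 'a' both match.
SELECT *
FROM keywords
WHERE LOWER(keyword) LIKE '%a%'
  AND LOWER(keyword) LIKE '%b%';
```

Wrapping the column in a function does not make the scan any more expensive in practice, since the leading wildcard already rules out an index seek.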
A: 
select * from your_table where your_field like '%a%b%'

and be prepared to wait a while...

Edit: note that this pattern looks for an 'a' followed by a 'b' (possibly with other "stuff" in between) -- rereading your question, that may not be what you wanted...

Jerry Coffin
This actually won't work if the content of the field is something like 'bddddatttt'
Maximilian Mayerl
Yes -- I originally mis-read his question as saying "...and then letter 'b'" instead of "...and the letter 'b'"
Jerry Coffin
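For two letters, the single-pattern approach can be made order-independent by listing both orderings. This is only a sketch using the placeholder names from the answer, and it gets unwieldy quickly because the number of orderings grows factorially with the number of letters.

```sql
-- Matches 'bddddatttt' as well as 'abc': either 'a' comes before 'b' or 'b' before 'a'.
SELECT *
FROM your_table
WHERE your_field LIKE '%a%b%'
   OR your_field LIKE '%b%a%';
```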
+2  A: 

Try this:

Select * From Table
Where field like '%' + ltrValue1 + '%'
  And field like '%' + ltrValue2 + '%'
  -- ... and so on, one condition per letter

Be prepared for a table scan, as this kind of search cannot use any existing indices.

Charles Bretana
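One way the placeholders above might look as T-SQL variables. This is a sketch assuming SQL Server 2008 or later (for `DECLARE` with an initializer); the table name `keywords` and column name `keyword` are assumptions, not part of the answer.

```sql
-- Hypothetical variables holding the letters to search for.
DECLARE @ltrValue1 nvarchar(1) = N'a';
DECLARE @ltrValue2 nvarchar(1) = N'b';

-- Each condition must hold, so only rows containing both letters are returned.
SELECT *
FROM keywords
WHERE keyword LIKE '%' + @ltrValue1 + '%'
  AND keyword LIKE '%' + @ltrValue2 + '%';
```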
+1  A: 

All the answers given using LIKE are totally valid but, as all of them noted, will be slow. So if you have a lot of queries and not too many changes to the list of keywords, it pays to build a structure that allows for faster querying.

Here are some ideas:

If all you are looking for is the letters a-z and you don't care about uppercase/lowercase, you can add columns containsA .. containsZ and prefill those columns:

UPDATE your_table
SET containsA = 'X'
WHERE UPPER(your_field) LIKE '%A%';

(and so on for all the columns).

Then index the contains.. columns and your query would be

SELECT *
FROM your_table
WHERE containsA = 'X'
AND containsB = 'X';

This may be normalized into an "index table" iTable with the columns your_table_key and letter. Index the letter column, and your query becomes something like:

SELECT *
FROM your_table
WHERE <key> IN (SELECT a.your_table_key
    FROM iTable a JOIN iTable b ON a.your_table_key = b.your_table_key
    WHERE a.letter = 'a'
    AND b.letter = 'b');

All of these require some preprocessing (maybe in a trigger or so), but the queries should be a lot faster.

IronGoofy
Yep, if this is to be a common query, prefilling may be the best bet. The initial fill will take awhile but the inserts and updates that happen after that should go fairly quickly in a properly written trigger as you are only affecting a very small number of rows generally.
HLGEM
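A rough sketch of how the index table described above might be filled and queried, assuming T-SQL string concatenation with `+`. The helper table `letters` and the key column `id` on `your_table` are hypothetical names introduced here for illustration; they do not appear in the answer.

```sql
-- Hypothetical helper table holding the letters to index.
CREATE TABLE letters (letter CHAR(1) PRIMARY KEY);
INSERT INTO letters (letter) VALUES ('a'), ('b'), ('c');  -- extend through 'z'

CREATE TABLE iTable (
  your_table_key INT NOT NULL,
  letter CHAR(1) NOT NULL,
  PRIMARY KEY (letter, your_table_key)  -- one row per (letter, keyword) pair
);

-- One-time prefill: a row for every letter that occurs in each keyword.
INSERT INTO iTable (your_table_key, letter)
SELECT t.id, l.letter
FROM your_table t
JOIN letters l
  ON LOWER(t.your_field) LIKE '%' + l.letter + '%';

-- Keywords containing both 'a' and 'b': the key must appear once per requested letter.
SELECT t.*
FROM your_table t
WHERE t.id IN (
  SELECT your_table_key
  FROM iTable
  WHERE letter IN ('a', 'b')
  GROUP BY your_table_key
  HAVING COUNT(*) = 2
);
```

The `GROUP BY ... HAVING COUNT(*) = n` form is an alternative to the self-join; it relies on each (letter, key) pair being unique and extends to more letters by changing the list and the count.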
A: 

You can use a cursor and temp table approach so you aren't doing a full table scan each time. This populates the temp table with all of your keywords and then, for each string in the @letters XML, removes any records from the temp table that do not contain that string. At the end, the temp table holds only the records that contain every one of your desired strings.

declare @letters xml

SET @letters = '<letters> <letter>a</letter> <letter>b</letter> </letters>'

-- SELECTING LETTERS FROM THE XML
SELECT Letters.l.value('.', 'nvarchar(50)') AS letter
FROM @letters.nodes('/letters/letter') AS Letters(l)

-- CREATE A TEMP TABLE WE CAN DELETE FROM IF A RECORD DOESN'T HAVE THE LETTER
CREATE TABLE #TempResults (keywordID int not null, keyWord nvarchar(50) not null)

INSERT INTO #TempResults (keywordID, keyWord)
SELECT employeeID, firstName FROM Employee

-- CREATE A CURSOR, SO WE CAN LOOP THROUGH OUR LETTERS AND REMOVE KEYWORDS THAT DON'T MATCH
DECLARE Cursor_Letters CURSOR READ_ONLY FOR
SELECT Letters.l.value('.', 'nvarchar(50)') AS letter
FROM @letters.nodes('/letters/letter') AS Letters(l)

DECLARE @letter varchar(50)
OPEN Cursor_Letters

FETCH NEXT FROM Cursor_Letters INTO @letter
WHILE (@@fetch_status <> -1)
BEGIN
    IF (@@fetch_status <> -2)
    BEGIN
        DELETE FROM #TempResults
        WHERE keywordID NOT IN (SELECT keywordID FROM #TempResults WHERE keyWord LIKE '%' + @letter + '%')
    END
    FETCH NEXT FROM Cursor_Letters INTO @letter
END

CLOSE Cursor_Letters
DEALLOCATE Cursor_Letters

SELECT * FROM #TempResults

DROP Table #TempResults
GO

Greg Andora
A: 

I'd use wildcard searching.

where <field> like '%[ab]%'

It isn't regex, but it does a good job. You can also do variants like <field> like 'sim[oa]ns' -- which will match simons, and simans...

Depending on your collation, you may or may not have to include both cases, like '%[aAbB]%'.

As mentioned elsewhere, be prepared for a wait, since indexes are out of the question when you're doing contains searching.

Kris

KSimons
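Note that a single bracket class such as `[ab]` matches a field containing either letter. To require both letters with this wildcard style, one option is a separate pattern per letter. The sketch below assumes SQL Server's bracket syntax for LIKE and uses hypothetical table and column names (`keywords`, `keyword`).

```sql
-- Each bracket class covers both cases of one letter (only needed under a
-- case-sensitive collation); AND requires both letters to be present.
SELECT *
FROM keywords
WHERE keyword LIKE '%[aA]%'
  AND keyword LIKE '%[bB]%';
```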
A: 

I'll assume you meant more or less what you said, and you want to find keywords in your table that "contain the letter 'a' and the letter 'b'." Some of the solutions here give the answer to a different question.

To get keywords that contain both the letters 'a' and 'b' in them (as opposed to those that contain either letter), you can use 'ab' as the <matchsetstring> in the query below:

select
  keyword
from myTable
where not exists (
  select Nums26.i from Nums26
  where Nums26.i <= len(<matchsetstring>) -- or your dialect's equivalent for LEN()
  and keyword not like '%'+substring(<matchsetstring>,Nums26.i,1)+'%' -- adapt SUBSTRING to your dialect
);

The table named "Nums26" should contain a column "i" (indexed for efficiency) that contains each of the values 1 through 26 (or more if you might try to match more than letters). See below. Advice given by others applies with regard to upper/lower case. If your collation is case-sensitive, however, you can't simply specify 'aAbB' here as your <matchsetstring>, because that would request keywords that contain each of the four characters a, A, b, and B. You might use UPPER and match 'AB', perhaps.

create table nums26 (
  i int primary key
);
insert into nums26 values (1);
insert into nums26 select 1+i from nums26;
insert into nums26 select 2+i from nums26;
insert into nums26 select 4+i from nums26;
insert into nums26 select 8+i from nums26;
insert into nums26 select 16+i from nums26;
Steve Kass
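For illustration, here is the query above with the placeholder filled in as 'AB' and the UPPER suggestion applied. It is a sketch in T-SQL syntax (LEN, SUBSTRING, `+` concatenation) and assumes the myTable and Nums26 tables from the answer already exist.

```sql
-- Returns keywords for which no position in 'AB' names a missing letter,
-- i.e. keywords that contain every character of the match string.
select
  keyword
from myTable
where not exists (
  select Nums26.i from Nums26
  where Nums26.i <= len('AB')
  and upper(keyword) not like '%' + substring('AB', Nums26.i, 1) + '%'
);
```

Adding a letter to the match string (for example 'ABC') extends the requirement without changing the shape of the query.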