Say I have an nvarchar field in my database that looks like this:

1, "abc abccc dabc"
2, "abccc dabc"
3, "abccc abc dabc"

I need a LINQ select query that matches "abc" as a whole word, not as part of a longer string.

In this case only rows 1 and 3 would match.

A: 

Maybe a regular expression like this (nb - not compiled or tested):

var matches = from a in yourCollection
         where Regex.IsMatch(a.field, @"(^|\s)abc(\s|$)")
         select a;
marshall
That should work, although it's going to cause a significant performance hit on a large table. You'd definitely want to compile the regex before you run the query.
Noldorin
How do I compile it?
Would regex work with SQL Server?
@decon: The regex should be able to run on the client side (someone correct me if I'm wrong), so there shouldn't be any problem. I recommend you go with my simpler solution which uses string.Split still...
Noldorin
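On the "how do I compile it?" question, a minimal sketch, assuming the rows are already in memory on the client: pass RegexOptions.Compiled when constructing the Regex and reuse that one instance for every row. The class and method names here (CompiledRegexSketch, MatchWholeWord) are made up for illustration, and the \b word-boundary pattern is one possible way to express "whole word".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CompiledRegexSketch
{
    // Built once with RegexOptions.Compiled and reused for every row.
    private static readonly Regex WholeWordAbc =
        new Regex(@"\babc\b", RegexOptions.Compiled);

    // Hypothetical helper: keeps only the strings that contain "abc" as a whole word.
    public static List<string> MatchWholeWord(IEnumerable<string> values)
    {
        return values.Where(v => WholeWordAbc.IsMatch(v)).ToList();
    }

    public static void Main()
    {
        var sample = new[] { "abc abccc dabc", "abccc dabc", "abccc abc dabc" };
        foreach (var hit in MatchWholeWord(sample))
            Console.WriteLine(hit);
    }
}
```

With the three sample strings from the question, only the first and third are kept.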
+2  A: 
from row in table.AsEnumerable()
where row.Foo.Split(new char[] {' ', '\t'}, StringSplitOptions.None)
    .Contains("abc")
select row

It's important to include the call to AsEnumerable, which means the query is executed on the client side; otherwise (I'm pretty sure) the Where clause won't get converted into SQL successfully.
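For anyone who wants to try the Split approach without a database, here is a small self-contained sketch. The Row type and the IdsContainingWord helper are hypothetical stand-ins for the real table, and it uses StringSplitOptions.RemoveEmptyEntries (an assumption on my part) so that repeated spaces don't produce empty tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitSketch
{
    // Hypothetical row type standing in for the table's entity class.
    public sealed class Row
    {
        public int Id { get; set; }
        public string Foo { get; set; } = "";
    }

    // Returns the ids of rows whose Foo column contains `word` as a separate token.
    public static List<int> IdsContainingWord(IEnumerable<Row> rows, string word)
    {
        var separators = new[] { ' ', '\t' };
        return (from row in rows
                where row.Foo
                         .Split(separators, StringSplitOptions.RemoveEmptyEntries)
                         .Contains(word)
                select row.Id).ToList();
    }

    public static void Main()
    {
        var rows = new List<Row>
        {
            new Row { Id = 1, Foo = "abc abccc dabc" },
            new Row { Id = 2, Foo = "abccc dabc" },
            new Row { Id = 3, Foo = "abccc abc dabc" },
        };
        Console.WriteLine(string.Join(", ", IdsContainingWord(rows, "abc")));
    }
}
```

Note that this only treats spaces and tabs as separators, so "abc," or "abc." would not count as a match.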

Noldorin
My bad, it's not comma separated. The 1 is the primary key; the string is "abc abccc dabc". So a space?
@Noldorin, I edited your answer to replace the split chars. Split is better in this case. +1
bruno conde
@bruno: Ah, thanks. Not sure why I wrote a comma, but yeah it should work well now.
Noldorin
Downvote because?
Noldorin
This will be HORRIBLY inefficient if the table contains lots of rows - because it will fetch every row from SQL Server to the client.
Joe Albahari
That's not a reason, really. The question didn't request an efficient solution, nor did my answer claim to be the *most* efficient solution. Also, some points to note: a) your regex solution is horribly inefficient compared to string.Split, especially when you don't compile it, b) adding the WHERE clause to the SQL would actually be detrimental to performance up to a certain table size! Nonetheless, you are right in suggesting that adding the string.Contains check would improve efficiency for very large databases.
Noldorin
A: 
datacontext.Table.Where(
         e => Regex.IsMatch(e.field, @"(.*?[\s\t]|^)abc([\s\t].*?|$)")
);

or

datacontext.Table.Where(
         e => e.field.Split(' ', '\t').Contains("abc")
);
bruno conde
Does SQL Server support regex? Or does LINQ download the whole table from SQL Server and then process the regex?
@decon: I've just updated my post, which demonstrates how to ensure that the query gets interpreted on the client side rather than converted into SQL and run on the server side.
Noldorin
A: 

For efficiency, you want to do as much of the filtering as possible on the server, and then the rest of the filtering on the client. You can't use Regex on the server (SQL Server doesn't support it) so the solution is to first use a LIKE-type search (by calling .Contains) then use Regex on the client to further refine the results:

db.MyTable
  .Where (t => t.MyField.Contains ("abc"))
  .AsEnumerable()    // Executes locally from this point on
  .Where (t => Regex.IsMatch (t.MyField, @"\babc\b"))

This ensures that you retrieve only the rows from SQL Server that contain the letters 'abc' (regardless of whether they're a word-boundary match or not) and use Regex on the client side to further restrict the result set so that only matches on word boundaries are included.
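To see the shape of this two-stage filter without a database, here is a rough sketch that uses an in-memory list wrapped with AsQueryable() in place of db.MyTable. That substitution is an assumption for illustration only: with a real LINQ to SQL context the first Where would be translated to SQL, whereas here both stages run locally. MyRow and WholeWordIds are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TwoStageFilterSketch
{
    // Hypothetical row type standing in for the MyTable entity.
    public sealed class MyRow
    {
        public int Id { get; set; }
        public string MyField { get; set; } = "";
    }

    public static List<int> WholeWordIds(IQueryable<MyRow> table)
    {
        return table
            .Where(t => t.MyField.Contains("abc"))             // coarse filter (LIKE on a real provider)
            .AsEnumerable()                                    // everything below runs locally
            .Where(t => Regex.IsMatch(t.MyField, @"\babc\b"))  // exact word-boundary check
            .Select(t => t.Id)
            .ToList();
    }

    public static void Main()
    {
        var rows = new List<MyRow>
        {
            new MyRow { Id = 1, MyField = "abc abccc dabc" },
            new MyRow { Id = 2, MyField = "abccc dabc" },
            new MyRow { Id = 3, MyField = "abccc abc dabc" },
            new MyRow { Id = 4, MyField = "xyz" },  // dropped by the coarse filter
        }.AsQueryable();

        Console.WriteLine(string.Join(", ", WholeWordIds(rows)));
    }
}
```

Row 2 passes the coarse Contains check but fails the word-boundary regex, while row 4 never gets as far as the regex.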

Joe Albahari
+1 for only retrieving rows containing "abc".
Daniel Brückner