views:

422

answers:

4

Do you have any suggestions for my code below that you might have to improve the performance? This is .NET 2.0 framework, and the DataTable gets bound to a DataGridview. The data is loaded into the Datatable via .ReadXML() it doesn't come from a database. There can be any where from 80 to 100k of records. The Terms() array is what the user passed it for the search. So "bob taco" would be terms(0) = "bob" , terms(1) = "taco". I have a regex that maintains any quoted terms. So "bob taco" would be on one element of the array. Is there a better way? I tried using Dataview since that has better performance but it didn't look like I could use the LIKE operator. Any suggestions are welcome, I'd really like to speed this up a bit.

Public Function Search(ByVal Terms() As String, ByRef ResidentTBL As DataTable) As DataTable
        'Dim t As Long = Now.Ticks

        Dim j As Integer
        Dim newdt As New DataTable("Users")
        Dim newtable As New DataTable
        newtable = ResidentTBL.Clone

        Dim termsceiling As Integer
        termsceiling = Terms.GetUpperBound(0)
        Dim filterstr As String = String.Empty
        Dim foundrows() As DataRow
        Dim sortOrder As String = "displayname ASC"
        Dim tempstr As String

        For j = 0 To termsceiling
            'remedy any invalid sql characters
            tempstr = Terms(j).Trim.ToUpper
            tempstr = tempstr.Replace("'", "''")
            tempstr = tempstr.Replace("*", "")
            tempstr = tempstr.Replace("%", "")

            'assemble the sql query

            filterstr = filterstr & _
        "((column1 LIKE '" & tempstr & "%') OR " & _
        "(column2 LIKE '" & tempstr & "%') OR " & _
        "(column3 LIKE '" & tempstr & "%') OR " & _
        "(column4 LIKE '" & tempstr & "%') OR " & _
        "(column5 LIKE '" & tempstr & "%') OR " & _
        "(column6 LIKE '" & tempstr & "%') OR " & _
        "(column7 LIKE '" & tempstr & "%') OR " & _
        "(column8 LIKE '" & tempstr & "%') OR " & _
        "(column9 LIKE '" & tempstr & "%') OR " & _
        "(column10 LIKE '" & tempstr & "%'))"


            'if there are further iterations append an AND  (user typed more than one term)
            If termsceiling > 0 And j <> termsceiling Then
                filterstr = filterstr & " AND "
            End If
        Next j

        filterstr = "(" & filterstr & ")"  'wrap the entire query

        foundrows = ResidentTBL.Select(filterstr, sortOrder)

        For i = 0 To foundrows.Length - 1
            newtable.ImportRow(foundrows(i))
        Next i

        newdt = newtable

        'Begin Debugging Code:
        't = Now.Ticks - t
        'MessageBox.Show("Took " & (t / 10000000) & " seconds.")
        'End Debugging Code:

        Return newdt
    End Function
A: 

I would do this differently on two accounts: firstly, I would run the query on the database and fill the datatable only with the correct (filtered) data.

Secondly, this way of using LIKE will go every time through all the records, making it slow if you have lots of users. What I end up doing in this situation is to implement a "poor-man search engine" which basically parses each text from those fields into words and inserts in another table each of the word with the corresponding user ID. That table can be indexed and the search will be directly with '=' and not 'LIKE' making it much faster.

Edit: Not having a database complicates things. Especially since you have a lot of data and I don't know if there is any indexing or optimization when searching like this. If you have a method of caching the datatable between requests you could build another datatable with parsed data. As long as the user is searching for the same kind of tokens it should work, but in order to find "bob taco" as two words one next to the other you need to save the position of the words when you parse the data and search accordingly (it makes it a little more complicated).

For example:

ID, Text
1, Hangs out at Bob Taco joint
2, Hates Bob and his taco

Would give something like this:

ID, Key, Pos
1, Hangs, 1
1, out, 2
1, at, 3
1, Bob, 4
1, Taco, 5
1, joint, 6
2, Hates, 1
2, Bob, 2
2, and, 3
2, his, 4
2, taco, 5

So now you need to search for the IDs that contain both bob and taco and the difference between they Pos values should be 1. For example ID 2 shouldn't be found since the Pos are 2 and 5.

I did this using temporary tables in SQL. If you need to work only in memory it might get harder.

rslite
Thank you for the response. Unfortunately there is no database, this is sucked in from an XML file. I use .readxml to suck it into the datatable. I then filter it accordingly. The reason I'm using LIKE is I have no idea what the user is going to pass to me. It is a free-floating text search. So they could put in a phone number, or a userid, or a firstname. I have no idea what they are going to put in. Would your "poorman search" work still?
Cj Anderson
+3  A: 

I have checked few issues and decided to rewrite my answer to be more accurate. Look at string handling. Every time you assign a string new value, a brand new string is created. Note: you are doing many big string operations in a loop. First, you take value and do some replacing. Then you make a big concatenation (“like” section). Try to use String.Format or – much, much better – use StringBuilder class. String operations can extremely degrade performance.

smok1
Great suggestion. +1
Cj Anderson
I know how to do the replaces with the string builder but what is the best way to do al the concats on the filter?
Cj Anderson
Use Append method: http://msdn.microsoft.com/en-us/library/b4sc8ca8(VS.80).aspx
smok1
+1  A: 

I would suggest not creating a filter, but instead doing something like

foreach (DataRow row in ResidentTBL.Rows)
    if (IsMatch(row))
          newtable.ImportRow(row);

where the IsMatch method implements the logic in your filter. This should give you a little more fine tuned control. DataTable fitlers are designed to be generic, so they use a mashup of keys style algorithm. That is not always the most optimal way to find a record.

Note, my example is C#, you'll need to adjust for VB, but I am not as verse in.

Kleinux
Just to clarify, you can use IndexOf on string to replace the LIKE operator.
Kleinux
Could you post an entire code snippet including my filter? I'll benchmark it, If it yields better performance I'll mark as answer. It can be in C#.
Cj Anderson
lol... I don't think he should have to do your work for you. He's right though, Filter wasn't really intended for what you're trying to do and Kleinux's method would be a bit faster.
blesh
Fair enough. I'll have to try it out.
Cj Anderson
A: 

Have you considered converting your search algorithms to use LINQ to XML? This will save you the heaviness of the DataTable.

You could simply load your XML file into an XmlDocument and perform LINQ queries on the elements in the in-memory XML object. You can find more questions and answers in the Linq-To-Xml tag page. Perhaps start with LINQ To XML: How Does It Work?

There's an extension method that will really help you: Contains()

If you want to continue with the DataTable implementation, here's more info on using LINQ with DataTables.

p.campbell
I'd love to, but it's .net 2.0.
Cj Anderson