Edit: This application searches a SQL Server 2000 database, not a flat file. The flat file only contains the serial numbers that I should look up in my SQL Server table (i.e., the flat-file lines are used only as parameters for the WHERE clause). Hope it's clearer now!

I'm working on a .NET Windows application that performs a simple search. The application searches for cards by their serial numbers; those serial numbers are imported from a text file, and I simply open a StreamReader on the file and read it line by line, as each line contains exactly one serial. After retrieving the data, I display it all in a DataGridView.

The annoying thing is that the serials in the file are not in any particular order (i.e., I can't do Select * from table where serial between (min and max)); they're totally unrelated. So without further ado, here's what I've done:

DataTable table = new DataTable();

using (StreamReader stream = new StreamReader(fileName))
{
    while (!stream.EndOfStream)
    {
        string serial = stream.ReadLine();
        SqlDataReader reader = GetCardBySerial(serial);
        table.Load(reader);
        reader.Close();
    }
}

public SqlDataReader GetCardBySerial(string serialNo)
{
    SqlConnection cnn = new SqlConnection(connectionString);
    SqlCommand cmd = new SqlCommand("Cards_GetCardBySerial", cnn);
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@serialNo", SqlDbType.NVarChar).Value = serialNo;
    cnn.Open();
    return cmd.ExecuteReader(CommandBehavior.CloseConnection);
}

Though this works, it's very slow. Any ideas?

PS: My file can contain up to 20k serials.

+2  A: 

It looks like you're opening a new connection for each query.

You would be best off creating a single connection and reusing it for each query.

DataTable table = new DataTable();
SqlConnection cnn = new SqlConnection(connectionString);

StreamReader stream = new StreamReader(fileName);                

while (!stream.EndOfStream) {
    string serial = stream.ReadLine();
    SqlDataReader reader = GetCardBySerial(serial, cnn);
    table.Load(reader);
    reader.Close();
}
cnn.Close();

public SqlDataReader GetCardBySerial(string serialNo, SqlConnection cnn) {
    SqlCommand cmd = new SqlCommand("Cards_GetCardBySerial", cnn);
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@serialNo", SqlDbType.NVarChar).Value = serialNo;
    return cmd.ExecuteReader();
}
chills42
Connection pooling is on by default, so the cost of "opening" a connection repeatedly is insignificant, unless "Pooling=false" is in the connectionString. Measure first, measure again, only then optimize!
Pontus Gagge
A: 

You have a couple of options to speed things up. I know which I would prefer, but you have to go where you feel most comfortable.

As this is one serial per line, consider loading the serials into something other than a table. A SortedList can work, if you want an A to Z or 1 to 9 type of sort.

Another option is to use LINQ. There are ways to accomplish this against the flat file, but it can also be done against objects, if you wish to load your data that way.

With the sheer number of serials you have, I would probably use a different persistent store, as a flat file will always take some time to load up.

ADDED 9:36 AM CST 2008/03/11

Thank you for the clarification. I assumed sorting was the issue. To speed up the data access, you have to get more data at one time. Although using a single connection has been suggested, that will not give a noticeable perf increase, as instantiating the connection from the underlying pool is not your issue. It is the massive number of queries (20k+).

To speed things up, you have to ask for more in a single query. One option is to build a dynamic SQL statement that requests many serials at once. Another is to use XML and have SQL Server treat it as an in-memory "table".
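As an illustration of the first option, here is a sketch of batching serials into parameterized IN clauses, so 20k serials become ~40 queries instead of 20k. The table name `Cards` and column `SerialNo` are assumptions standing in for the asker's actual schema:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

// Read every serial up front, then query in batches.
DataTable table = new DataTable();
List<string> serials = File.ReadAllLines(fileName).ToList();

using (SqlConnection cnn = new SqlConnection(connectionString))
{
    cnn.Open();
    const int batchSize = 500;
    for (int i = 0; i < serials.Count; i += batchSize)
    {
        List<string> batch = serials.Skip(i).Take(batchSize).ToList();

        // Build "@p0, @p1, ..." so the values stay parameterized
        // rather than concatenated into the SQL text.
        string placeholders = string.Join(", ",
            batch.Select((s, n) => "@p" + n).ToArray());

        using (SqlCommand cmd = new SqlCommand(
            "SELECT * FROM Cards WHERE SerialNo IN (" + placeholders + ")", cnn))
        {
            for (int n = 0; n < batch.Count; n++)
                cmd.Parameters.Add("@p" + n, SqlDbType.NVarChar).Value = batch[n];

            using (SqlDataReader reader = cmd.ExecuteReader())
                table.Load(reader);
        }
    }
}
```

The batch size is a tuning knob; the main win is simply collapsing thousands of round trips into a handful.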

No matter what you do, you will trade a bit of latency time for SQL time. But in every case I can think of, the latency will kill you long before the SQL Server time does.

If you are aiming for loading, you can BCP or BULK INSERT directly from the file.

Gregory A Beamer
The text file is not the issue here. The text file is being handled fine, it's getting the resulting data from the database that is slow.
NerdFury
+6  A: 

I would think that the quickest way would be to bulk insert the serials into a table on SQL Server. Then do your search on ALL the serials with a single SQL statement by joining the search table into the Cards table on Serial number.
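A minimal sketch of this approach, assuming a staging table named `SerialSearch(SerialNo)` already exists alongside the `Cards` table (both names are illustrative, not from the original post):

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

// Stage all serials in a DataTable matching the search table's shape.
DataTable serials = new DataTable();
serials.Columns.Add("SerialNo", typeof(string));
foreach (string line in File.ReadAllLines(fileName))
    serials.Rows.Add(line);

using (SqlConnection cnn = new SqlConnection(connectionString))
{
    cnn.Open();

    // Push all 20k serials to the server in one bulk operation.
    using (SqlBulkCopy bulk = new SqlBulkCopy(cnn))
    {
        bulk.DestinationTableName = "SerialSearch";
        bulk.WriteToServer(serials);
    }

    // One set-based join replaces 20k individual lookups.
    SqlCommand cmd = new SqlCommand(
        "SELECT c.* FROM Cards c " +
        "JOIN SerialSearch s ON c.SerialNo = s.SerialNo", cnn);

    DataTable result = new DataTable();
    using (SqlDataReader reader = cmd.ExecuteReader())
        result.Load(reader);
}
```

After displaying the results you would truncate `SerialSearch` so it is empty for the next search.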

Jason Punyon
+1, I was about to click 'Post your answer' when I saw your answer appear ;)
ybo
Be aware of what could happen if two files are processed at the same time (if this is ever an issue), you might need to create a SearchSession( Id int, UniqueName varchar(32) ) Table, and a SearchData( SearchSessionId int, Serial varchar(64)) Table to ensure you only work with your data.
NerdFury
The serial field is not the PK, and it doesn't have a unique constraint on it (its values are guaranteed to be unique by the app logic). Given that, can I use it for the join?
Galilyou
@NerdFury Yeah, gotta watch out for that!
Galilyou
@7alwagy: As long as it's unique and you want to see one line, then yeah, you can use it to join on. You might have to make sure that the search list is unique in order to get a single line in the output.
Jason Punyon
Tried it, and it's really faster. Not sure how the performance will hold up as the temp table keeps getting bigger and bigger; actually, I'm scared of that, so to avoid it I decided to create a new temp (#) table each time a search is performed. @Jason good answer, here's the point.
Galilyou
You only need to keep them until the search is complete, then you can dump them so the temp table doesn't get too large. If they want to search for the same list again, they can just start all over and upload the same search list again.
Jason Punyon
A: 

No silver-bullet solution for you, but I'd suggest reading the entire serials file into memory, perhaps into an IEnumerable, so that you can use LINQ on it and get your sorted list that way.

I'd probably use a temp table in my SQL DB to store the serials, and then a stored procedure that joins the two tables to pull the matching records straight down through the reader. That way you offload most of the effort to SQL Server, which is best suited to this activity after all.

Lazarus
A: 

Instead of doing 20K lookups on the database, create a temp SQL table to hold all the desired serial numbers, and load it. Then, build an index on that table that will allow your DBMS to do a merge join between the recently imported serial numbers and the cards already in the database.

Do your query on the join of those two tables, and process the query result for display.

Then you can drop or truncate the temp table.

Building an index on a new table will take some time, but a lot less time than 20k queries.

Walter Mitty
A: 

Bottom line: use a temporary table for joins.
1) (one-time) Create a "temp" load table with a unique session id to differentiate the rows loaded from a given file.
2) (one-time) Create a stored procedure that joins the source table to the "temp" table on the card's serial, taking the session id as its parameter.
3) Before you run the query, do a single bulk load of the data from the text file into that "temp" table, assigning it a new session id (I would use a GUID for that).
4) Run the new SP once, passing that GUID as its single parameter.

You now run a single query against the server instead of thousands of little queries.

EDIT
I would not use a true temp table. Create a real table with two fields (Session (PK, uniqueidentifier), Card serial (your data type)) and then create an index on both fields. That way, you don't have to worry about creating/dropping the table each time. Instead, create another SP that you can call to clean up your temp rows.
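The steps above can be sketched as follows. The table `SerialLoad(SessionId, SerialNo)` and the procedures `Cards_GetBySession` / `Cards_ClearSession` are hypothetical names for the one-time objects described:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

// Each search gets its own session id, so concurrent file imports
// never see each other's rows.
Guid session = Guid.NewGuid();

DataTable load = new DataTable();
load.Columns.Add("SessionId", typeof(Guid));
load.Columns.Add("SerialNo", typeof(string));
foreach (string line in File.ReadAllLines(fileName))
    load.Rows.Add(session, line);

using (SqlConnection cnn = new SqlConnection(connectionString))
{
    cnn.Open();

    // Step 3: one bulk load of all serials, tagged with this session.
    using (SqlBulkCopy bulk = new SqlBulkCopy(cnn))
    {
        bulk.DestinationTableName = "SerialLoad";
        bulk.WriteToServer(load);
    }

    // Step 4: one SP call that joins Cards to SerialLoad on serial,
    // filtered to this session.
    SqlCommand cmd = new SqlCommand("Cards_GetBySession", cnn);
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@sessionId", SqlDbType.UniqueIdentifier).Value = session;

    DataTable result = new DataTable();
    using (SqlDataReader reader = cmd.ExecuteReader())
        result.Load(reader);

    // Clean up this session's rows once the search is done.
    SqlCommand cleanup = new SqlCommand("Cards_ClearSession", cnn);
    cleanup.CommandType = CommandType.StoredProcedure;
    cleanup.Parameters.Add("@sessionId", SqlDbType.UniqueIdentifier).Value = session;
    cleanup.ExecuteNonQuery();
}
```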

A: 

Two other options you could consider, depending on your version of SQL Server: you could construct XML from the imported serials and send that to the DB, parse it on the DB side, and return all matches in one trip. The other option is similar: construct a long comma-delimited string of all the imported serial numbers and send it to the sproc as a varchar(max) parameter, which can hold 2 GB of text. Then use an IN clause in your query, again eliminating the multiple trips to the DB.
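The XML variant might look like the sketch below. The stored procedure name `Cards_GetBySerialXml` is an assumption; on SQL Server 2000 it would parse the document server-side with `sp_xml_preparedocument`/`OPENXML` and join the result to the Cards table:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Security;
using System.Text;

// Build one XML document holding every serial from the file.
StringBuilder xml = new StringBuilder("<serials>");
foreach (string line in File.ReadAllLines(fileName))
    xml.AppendFormat("<s n=\"{0}\"/>", SecurityElement.Escape(line));
xml.Append("</serials>");

using (SqlConnection cnn = new SqlConnection(connectionString))
{
    cnn.Open();

    // One round trip: the whole search list travels as a single parameter.
    SqlCommand cmd = new SqlCommand("Cards_GetBySerialXml", cnn);
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@serialsXml", SqlDbType.NText).Value = xml.ToString();

    DataTable result = new DataTable();
    using (SqlDataReader reader = cmd.ExecuteReader())
        result.Load(reader);
}
```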

Aaron Palmer
A string with 10,000 lines over the network! Sounds like a bottleneck to me, but worth trying anyway.
Galilyou