I have almost 100,000 records in the database and I need to compare them to each other with the Longest Common Subsequence algorithm, and I need to do that with 1000 new records every day. My application is written in C# .NET, and the problem is that this comparison is slow at the application level: comparing 1000 records takes more than 10 hours. So does anyone know how much faster this would be if I wrote this algorithm as a stored procedure in SQL, or is there any other way?
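For reference, the comparison the question describes — the classic dynamic-programming LCS length — can be sketched in C# as follows (a minimal illustrative implementation, not the asker's actual code):

```csharp
using System;

public class LcsDemo
{
    // Classic O(m*n) dynamic-programming LCS length, using two rolling
    // rows so memory stays O(n) instead of O(m*n).
    public static int LcsLength(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                curr[j] = a[i - 1] == b[j - 1]
                    ? prev[j - 1] + 1
                    : Math.Max(prev[j], curr[j - 1]);
            }
            var tmp = prev; prev = curr; curr = tmp; // roll the rows
        }
        return prev[b.Length];
    }

    public static void Main()
    {
        Console.WriteLine(LcsLength("ABCBDAB", "BDCABA")); // prints 4
    }
}
```

The quadratic cost per pair is why comparing each new record against all 100,000 stored ones adds up so quickly.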

A: 

It's true that a stored procedure works faster than LINQ or a view. That is the way to collect your data quickly.

Serkan Hekimoglu
Only if you write your stored procedure in a clean and correct way. I am sure some developers are able to write stored procedures that will perform very badly. So saying an SP is always faster is not true; it CAN be faster (if done by a good developer).
Gertjan
For example: with LINQ you can use a where condition, but it will fetch all the data again and then filter it according to your where condition. You are right, you need to be careful while writing a stored procedure. An SP will be faster if you write the conditions correctly.
Serkan Hekimoglu
+3  A: 

If you have 'just' 100,000 records, just collect them all when your app starts. Run your algorithm in memory, and store any results/alterations to the DB when you finish.

It'll be much faster.

Toad
Even if the records are too large to fit into memory all at once, say, one megabyte each, loading a subset (say 500) and running the LCS algorithm on that batch, noting best answers, and then moving on to the next batch of 500 records would probably still be better than iterating over the entire 100,000 for each input string.
sarnold
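The batching idea above can be sketched like this. Both `loadBatch` (standing in for whatever data access the app uses) and `similarity` (e.g. an LCS length routine) are hypothetical delegates, not APIs from the question:

```csharp
using System;
using System.Collections.Generic;

public class BatchCompare
{
    // Finds the best similarity score for one new record against the
    // stored records, loading them in fixed-size batches so they never
    // all have to fit in memory at once.
    public static int BestScore(
        string newRecord,
        Func<int, int, IEnumerable<string>> loadBatch, // (offset, count) -> records
        Func<string, string, int> similarity,          // e.g. LCS length
        int totalRecords,
        int batchSize)
    {
        int best = -1;
        for (int offset = 0; offset < totalRecords; offset += batchSize)
        {
            foreach (var stored in loadBatch(offset, batchSize))
            {
                int score = similarity(newRecord, stored);
                if (score > best) best = score;
            }
        }
        return best;
    }
}
```

Each batch is fetched once and compared in memory, so the round-trip cost is amortized over `batchSize` comparisons instead of paid per record.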
+1  A: 

I'm not sure T-SQL will allow you the same flexibility that C# does, especially when you deal with complex algorithms like LCS. Store all the needed records in memory and work with them from there.

Now, most importantly, you could think outside the box for a minute and go for another approach: try to insert flags (a ranking) of some kind once a new item is inserted. No one can really advise you here, since you haven't provided us with even a little data about what you are doing and what you are comparing. You could probably ease the process with some ranking computed when a new item is inserted. I don't mean doing the full comparison once a new item is added, but triggering an event, say every hour or so, that updates the table without user input.

eugeneK
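One concrete ranking along these lines: the length of an LCS can never exceed the length of the shorter string, so storing each record's length as a cheap rank at insertion time lets a search skip candidates that cannot beat the best score found so far. A sketch (the `lcs` delegate stands in for whatever LCS routine is already in use):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class RankedSearch
{
    // Visits candidates longest-first and stops as soon as the remaining
    // lengths make beating the current best impossible, using the bound
    // LCS(a, b) <= Math.Min(a.Length, b.Length).
    public static int BestScore(
        string query,
        IEnumerable<string> records,
        Func<string, string, int> lcs)
    {
        int best = -1;
        foreach (var r in records.OrderByDescending(s => s.Length))
        {
            if (Math.Min(query.Length, r.Length) <= best)
                break; // the upper bound can no longer beat the best score
            int score = lcs(query, r);
            if (score > best) best = score;
        }
        return best;
    }
}
```

In practice the length would be a pre-computed, indexed column, so the `OrderByDescending` becomes an `ORDER BY` in the query rather than an in-memory sort.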
+5  A: 

You might want to try writing a stored proc in C# if you are using SQL Server 2005 or 2008. This might scale better in the long run as you get more and more records and can't keep them all in memory.

Check out the MSDN Introduction to SQL Server CLR Integration.

This will use more CPU on your DB server, but you don't have to transfer data back and forth.

Mikael Svenson
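A minimal sketch of such a CLR scalar function (assuming SQL Server 2005+ with CLR integration enabled; the function name and deployment details below are illustrative, not from the answer):

```csharp
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public class LcsClr
{
    // A scalar function SQL Server can call as dbo.LcsLength(@a, @b)
    // once the assembly is deployed with CREATE ASSEMBLY / CREATE FUNCTION.
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlInt32 LcsLength(SqlString a, SqlString b)
    {
        if (a.IsNull || b.IsNull) return SqlInt32.Null;
        string s = a.Value, t = b.Value;
        var prev = new int[t.Length + 1];
        var curr = new int[t.Length + 1];
        for (int i = 1; i <= s.Length; i++)
        {
            for (int j = 1; j <= t.Length; j++)
                curr[j] = s[i - 1] == t[j - 1]
                    ? prev[j - 1] + 1
                    : Math.Max(prev[j], curr[j - 1]);
            var tmp = prev; prev = curr; curr = tmp; // roll the rows
        }
        return new SqlInt32(prev[t.Length]);
    }
}
```

On the T-SQL side this would be wired up roughly with `CREATE ASSEMBLY` followed by `CREATE FUNCTION dbo.LcsLength(@a NVARCHAR(MAX), @b NVARCHAR(MAX)) RETURNS INT AS EXTERNAL NAME ...`, so the comparison runs next to the data instead of shipping 100,000 rows to the client.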
A: 

How do you determine that two of your records follow on from each other (i.e. that they're part of a subsequence)? Maybe you don't need to compare the whole 1 MB of each record and could speed things up by only analysing some portion of it?

It sounds to me like your algorithm is flawed, or that a DB might not be the best way of storing your data, if it's taking 2 seconds to compare each record.

Jon Cage
I compare the newly inserted strings with all the others in the database using the Longest Common Subsequence algorithm.
Pece