I need to manipulate 100,000 - 200,000 records.
I am thinking of using LINQ (to SQL) to do this.
I know from experience that filtering dataviews is very slow.
So how quick is LINQ?
Can you please tell me your experiences and if it is worth using, or would I be better off using SQL stored procedures (heavy going and less flexible)?
Within the thousands of records I need to find groups of data and then process them. Each group has about 50 records.
Normally, manipulation of that many records should happen as close to the database as possible. If it were my task, I would look to do it in stored procs, though that is a personal preference. LINQ is yet another layer of abstraction on top of data access, and while it works well for "normal" needs (for example, a few hundred entities sent to the UI), it should not be thought of as a replacement for data-warehouse-type operations.
LINQ to SQL translates your query expression into T-SQL, so your query performance should be exactly the same as if you sent that SQL query via ADO.NET. There is a little overhead I guess, to convert the expression tree for your query into the equivalent T-SQL, but my experience is that this is small compared with the actual query time.
You can of course find out exactly what T-SQL is generated, and therefore make sure you have good supporting indexes.
The primary difference from DataViews is that LINQ to SQL does not bring all the data into memory and filter it there. Rather it gets the database to do what it's good at and only brings the matching data into memory.
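To make the difference concrete, here is a minimal sketch. It is not LINQ to SQL itself (that would be C# against SQL Server). It uses Python's built-in sqlite3 module as a stand-in, and the `records` table, its columns, and both function names are made up for illustration. The first function mimics the DataView approach (load everything, filter in memory); the second lets the database apply the filter, which is what the SQL generated by LINQ to SQL does.

```python
import sqlite3

# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, group_id INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO records (group_id, value) VALUES (?, ?)",
    [(i // 50, float(i)) for i in range(10_000)],  # 200 groups of 50 rows
)
conn.execute("CREATE INDEX ix_records_group ON records (group_id)")


def filter_on_client(conn, group_id):
    # DataView-style: pull every row across, then filter in application memory.
    rows = conn.execute("SELECT id, group_id, value FROM records").fetchall()
    return [row for row in rows if row[1] == group_id]


def filter_in_database(conn, group_id):
    # Query-translation style: the WHERE clause runs in the database,
    # so only the matching rows are brought into memory.
    return conn.execute(
        "SELECT id, group_id, value FROM records WHERE group_id = ?",
        (group_id,),
    ).fetchall()


client_rows = filter_on_client(conn, 7)
db_rows = filter_in_database(conn, 7)
```

Both calls return the same rows, but the first one transfers all 10,000 rows to get them. With a supporting index on the filter column, the second can usually avoid scanning the whole table as well.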
It depends on what you're trying to do. LINQ has been very fast for me when pulling data from the database, since LINQ to SQL translates your request directly into SQL and runs it. However, I've found that stored procedures are better in some circumstances.
For instance, I have some data that I need to query which involves several tables and fairly intense keys. With LINQ, and its relative inflexibility when it comes to customizing queries, these queries would take several minutes. By hand-tweaking the SQL (namely, by placing WHERE-type conditions in the JOINs in order to reduce the amount of data each JOIN has to handle), I was able to drastically improve performance.
My advice: use LINQ wherever you can, but don't be afraid to go the stored procedure route if you determine that the SQL generated by LINQ is simply too slow and the SQL can be hand-tweaked easily to accomplish what you need.
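For anyone wondering what "placing WHERE-type conditions in the JOINs" looks like, here is a small sketch using Python's sqlite3 module. The `orders` and `customers` tables and their contents are invented for the example, and this is SQLite, not SQL Server. For an inner join the two forms return the same rows. Whether the second form is actually faster depends entirely on the database's optimizer and your data, so compare the query plans before assuming anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, purely for illustration.
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
    INSERT INTO customers (id, region) VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
    INSERT INTO orders (id, customer_id, status) VALUES
        (1, 1, 'open'), (2, 1, 'closed'), (3, 2, 'open'), (4, 3, 'open');
    """
)

# Filters applied in the WHERE clause, after the join condition.
filter_in_where = """
    SELECT o.id
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE c.region = 'EU' AND o.status = 'open'
    ORDER BY o.id
"""

# Same filters moved into the JOIN condition (the hand-tweaked form).
filter_in_join = """
    SELECT o.id
    FROM orders AS o
    JOIN customers AS c
        ON c.id = o.customer_id
       AND c.region = 'EU'
       AND o.status = 'open'
    ORDER BY o.id
"""

where_rows = conn.execute(filter_in_where).fetchall()
join_rows = conn.execute(filter_in_join).fetchall()
```

Note that this equivalence only holds for inner joins. With a LEFT JOIN, moving a condition from WHERE into ON changes which rows come back.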
You need to be more specific about what you mean by "manipulate the records". If the changes are not 100% individual for each record and can be made set-based, you are most likely better off doing the changes in T-SQL on the db side (stored procs). In other words, avoid pulling large amounts of data over network and/or process boundaries if possible.
How long is a piece of string? How fast is LINQ to SQL? It depends on how you use it.
"Filtering dataviews is very slow" because in that model you retrieve all the records and then filter on the client. But LINQ to SQL doesn't work like that unless you abuse it.
A LINQ query is only evaluated at the last possible moment, when its results are actually needed. So you can add "where" restrictions to a query before it is evaluated. The whole expression, including the filters, will execute on the database, as it should.
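Here is a rough sketch of that deferred behaviour. It is not how LINQ to SQL is implemented. `LazyQuery` is a hypothetical toy class written in Python on top of sqlite3, and the `records` table is made up. It only shows the idea: each `where` call adds a filter without touching the database, and a single SQL statement containing all the filters is sent when the results are finally requested.

```python
import sqlite3


class LazyQuery:
    """Hypothetical toy: collects filters and runs SQL only when fetch() is called."""

    def __init__(self, conn, table, filters=(), log=None):
        self.conn = conn
        self.table = table
        self.filters = tuple(filters)
        # Shared list recording every statement actually sent to the database.
        self.log = log if log is not None else []

    def where(self, condition, value):
        # Returns a new query object; nothing is sent to the database here.
        return LazyQuery(
            self.conn, self.table, self.filters + ((condition, value),), self.log
        )

    def to_sql(self):
        sql = f"SELECT id, group_id, value FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(cond for cond, _ in self.filters)
        return sql, [value for _, value in self.filters]

    def fetch(self):
        # The only place where the database is touched.
        sql, params = self.to_sql()
        self.log.append(sql)
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, group_id INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO records (group_id, value) VALUES (?, ?)",
    [(i % 4, float(i)) for i in range(20)],
)

query = LazyQuery(conn, "records")
query = query.where("group_id = ?", 2)   # no SQL executed yet
query = query.where("value >= ?", 10.0)  # still no SQL executed
statements_before_fetch = len(query.log)
rows = query.fetch()                     # one statement, both filters included
```

The "abuse" case is the opposite order: materialize the full result first (in C#, for example, by calling ToList() early) and then filter in memory.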
Stackoverflow uses Linq, and it's not a small database.
Some will advocate stored procs to access your database over SQL or ORMs. This has been debated in other questions, e.g. here and here.
My opinion is that for some things, you will want a professional DBA to craft an optimal stored proc. You can then access this from Linq if you want. But 80% or more of the database access methods won't be performance-critical, and stored procs can be time-consuming overkill for these.
For updates, a set-based server-side operation, either in a stored proc or as SQL of the form "update ... where ...", will be a lot faster than using multiple database round-trips to read a record, write a record, and repeat.
I find that LINQ-generated queries are good. They follow some best practices, such as prefixing table names with the owner, avoiding SELECT *, and so on. When queries are complex (more than a simple join), I have found that LINQ always produces a good solution, and my own version was never better (so my SQL Profiler says).
Then the question is: is a direct query better, or a query wrapped in a stored proc? In theory the stored proc should be better, because its execution plan is stored. But in fact, when you run a SELECT through the .NET SQL Server provider, you call a special stored procedure (sp_executesql) whose first parameter is your query text, so the execution plan is cached anyway.
If the work involves more than one SELECT, a stored proc should be better.
It's worth bearing in mind that LINQ to SQL works by retrieving the objects from the database first. You then apply property changes to the objects and call SubmitChanges to persist them, whereupon each changed row/object emits its own update statement.
For bulk updates this is nowhere near as efficient as just sending a single update statement that applies to an entire batch of rows at a time.
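A small sketch of the two update patterns, again using Python's sqlite3 module as a stand-in, not LINQ to SQL. The `records` table, the `processed` column, and the function names are assumptions for the example. The first function imitates the read-modify-write pattern (one UPDATE per row); the second sends a single set-based UPDATE.

```python
import sqlite3


def make_db():
    # Hypothetical table: 1,000 rows in groups of 50.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY, group_id INTEGER, processed INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO records (group_id) VALUES (?)",
        [(i // 50,) for i in range(1000)],
    )
    conn.commit()
    return conn


def update_row_by_row(conn, group_id):
    # Read the rows first, then issue one UPDATE per row.
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM records WHERE group_id = ?", (group_id,)
        )
    ]
    statements = 0
    for record_id in ids:
        conn.execute("UPDATE records SET processed = 1 WHERE id = ?", (record_id,))
        statements += 1
    conn.commit()
    return statements


def update_set_based(conn, group_id):
    # One statement updates the whole group on the database side.
    cursor = conn.execute(
        "UPDATE records SET processed = 1 WHERE group_id = ?", (group_id,)
    )
    conn.commit()
    return cursor.rowcount


conn_a = make_db()
conn_b = make_db()
row_statements = update_row_by_row(conn_a, 3)  # one UPDATE per row in the group
rows_touched = update_set_based(conn_b, 3)     # a single UPDATE for the group
```

Against an in-memory SQLite database the difference is tiny. Against a real server, every one of those per-row statements is a network round-trip, which is where the cost at 100,000+ records comes from.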