views:

800

answers:

3

Short question: does anyone have any information about what the memory performance of Linq is with large amounts of data? Specifically Linq to Datasets or Linq to Objects.

What is the largest number of records / objects it can handle? What is the memory overheads? Can anyone post some ballpark memory usage figures?

Background

My company interacts with an external database (which shall remain unnamed due to excessive crapness) with very poor performance. It has an ODBC interface that works well for basic SELECTS, but with a few JOINs performance falls off a cliff (and woe to anyone who tries to use OUTER JOINS!).

Previously we've solved this issue by making a copy of the data with SELECT * FROM Table statements into a SQL database and running our queries from SQL. However, we'd like to remove the SQL server from the equation (removing external dependencies).

The solution I have in mind is to do SELECT * FROM Table into some DataSets (or possibly custom objects) and use Linq to do all the queries in memory.

The data we interact with is limited to 2GB in size (and we don't need to load it all up at once, probably only a few hundred MB at maximum). However, I'm worried about seeing the dreaded OutOfMemoryException for larger amounts of data. Hence my question.

The ODBC driver is 32bit, so I can't use a 64bit process (not without piping data between two processes, and I'd prefer to keep the complexity to a minimum). The ODBC driver is read only.

Other comments or suggestions regarding this are welcome too (with the exception of using an embedded database like SQL Compact, that's our plan B if in memory queries aren't feasible).

PS: I am doing some proof of concept benchmarks over the next few days (and I am aware that there will be specifics for my case that will only be known with those benchmarks), but I'd like to see someone has already had experience with this.

Edit: This is going to be deployed using ClickOnce as a Windows Forms app.

A: 

This link might help.

xandy
It's an interesting link, but does it really talk about memory performance?
Robert Harvey
sorry, i misunderstand your question. For Linq2Sql, all I know it is actually using SqlReader as the underlying implementation. So, the limit would be more or less similar to sqlreader.
xandy
Nice link, but it didn't really deal with any memory issues.
A: 

Without knowing your application, can only give general comments.

  1. If this is a web application, unless you are going to create an in-memory db solution that can be shared across requests, etc. (which SQL Server and others are already doing this for you), I'd strongly advise against this since it won't be too scalable.

  2. I'd still advise against this since I don't think index mechanism is built into LINQ. Using LINQ to do this is basically similar to a TABLE SCAN in SQL talk, where you would walk through the entire table to look for the right data instead of using the more efficient indexing that SQL can provide for you.

  3. Yes, you will incure dependency and network transfer time when using SQL Server, but I'd imagine given the right query and returning the right resultset, SQL will still beat what you are trying to do.

So, recommendation... don't reinvent the wheel :) Someone else already create a more efficient mechanism for this.

If you still want to do this, I strongly suggest you do extensive test on your own real data. Since in my experience, testing w/ dummy data that is not representative of what the actual data would be (in size and in content), will never be sufficient if your goal is performance.

Jimmy Chandra
It's WinForms, not web. That's why we're trying to offload SQL Server, to minimise the install and maintenance footprint.
I think you already got your answer in the question :). In that case my choice will probably be something in the same line (Sql CE). You can still use LINQ to object for internal processing w/ that combo. I don't think forcing LINQ to replace a real DB is a good choice.
Jimmy Chandra
I'll take that on board. I'm only trying to wrap a query interface around a bad data source, and linear searches will be much better than what we'd have otherwise!
+1  A: 

See the following StackOverflow post; It contains some good tips on how to manage memory with Linq to SQL:

How do I avoid a memory leak with LINQ-To-SQL?
http://stackoverflow.com/questions/123057/how-do-i-avoid-a-memory-leak-with-linq-to-sql/123206

Robert Harvey
Didn't see that when writing my question. Helpful, but not quite what I was looking for.