This is sort of the next step of the LINQ to DB2 question I asked here.
Following zb_z's answer, I poked around in the DB_Linq code and managed to add working DB2 support. (It's still in its infancy, not ready to be contributed back to the project yet.) The proof of concept worked great; it was pretty exciting, actually. However, I've run into another hiccup along the way.
As it turns out, our DB2 database is big. 8,306 tables big. So the code that was generated turned out to be over 5.2 million lines of code. In one file. Needless to say, Visual Studio didn't much care for it :)
So I further modified the generator to spit out each table class into its own file. That left me with 8,307 files: one for the data context and one per table, each of which extends the data context with that table's property. Visual Studio still didn't like it, understandably, so I wrapped the code generation and compilation up in a script and just run that to output a DLL for my projects to use.
A 36 MB DLL.
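In case it's relevant, the compile half of that script amounts to something like the following (a minimal CodeDOM sketch; the output name and source directory are placeholders, and the real script does the code generation first):

using System;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

class GeneratedCodeCompiler
{
    static void Main()
    {
        var parameters = new CompilerParameters
        {
            GenerateExecutable = false,
            OutputAssembly = "BankTables.dll" // placeholder name
        };
        parameters.ReferencedAssemblies.Add("System.Data.Linq.dll");
        parameters.ReferencedAssemblies.Add("IBM.Data.DB2.iSeries.dll");

        using (var provider = new CSharpCodeProvider())
        {
            // Compile the data context plus all the generated table files in one pass.
            var sources = Directory.GetFiles(@"C:\generated", "*.cs"); // placeholder path
            var results = provider.CompileAssemblyFromFile(parameters, sources);
            foreach (CompilerError error in results.Errors)
                Console.WriteLine(error);
        }
    }
}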
Now, searching around a bit on performance led me to this SO question (which itself references this one), and I've followed the answers and the links and see what they're saying. So this leads me to wonder whether it's the existence of over 8,000 classes within the same namespace that's causing the noticeable performance issues.
My performance test was a little console app that initializes the data contexts, grabs the data with LINQ and prints a row count, then grabs the same data with classic ADO.NET and prints another row count, time-stamping each step. Adding more queries to the test always shows the same pattern: the LINQ code takes several seconds to run, while ADO fills the DataSet in the blink of an eye.
So I guess this ends up being a somewhat open-ended (and long-winded, sorry about that) question. Does anybody have any ideas on speeding up performance here? Anything simple to tweak, or design considerations I could apply?
EDIT
A few things to note:
- If I restrict the code generation to a subset of tables (say, 200) then it runs much faster.
- Stepping through in the debugger, the time is spent on the line
var foo = from t in bank1.TMX9800F where t.T9ADDEP > 0 select t.T9ADDEP
and when I expand the property in the debugger to enumerate the results (or let it run on to the next line, which calls .Count()), that part takes no time at all. (See the timing sketch just below.)
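To make that split explicit, here's a minimal sketch of timing the two phases separately with a Stopwatch (bank1 and TMX9800F are the same context and table as in the test app below). In my case it's the build phase, not the execution, that eats the seconds, which is backwards from what I'd expect with deferred execution:

var sw = System.Diagnostics.Stopwatch.StartNew();

// Phase 1: build the query expression. With deferred execution this
// should be nearly free, since no SQL runs yet.
var foo = from t in bank1.TMX9800F where t.T9ADDEP > 0 select t;
Console.WriteLine("{0}: build took {1} ms", DateTime.Now.ToLongTimeString(), sw.ElapsedMilliseconds);

sw.Reset();
sw.Start();

// Phase 2: execute against DB2 and pull the count.
var count = foo.Count();
Console.WriteLine("{0}: execute took {1} ms ({2} rows)", DateTime.Now.ToLongTimeString(), sw.ElapsedMilliseconds, count);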
EDIT
I can't post the generated code in its entirety, but here's the code for the test app:
static void Main(string[] args)
{
    Console.WriteLine(string.Format("{0}: Process Started", DateTime.Now.ToLongTimeString()));

    // Initialize your data contexts
    var bank1 = new BNKPRD01(new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString));
    var bank6 = new BNKPRD06(new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString));
    Console.WriteLine(string.Format("{0}: Data contexts initialized", DateTime.Now.ToLongTimeString()));

    var foo = from t in bank1.TMX9800F where t.T9ADDEP > 0 select t; // <- runs slow
    Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD01 test table", DateTime.Now.ToLongTimeString(), foo.Count().ToString()));

    var baz = from t in bank6.TMX9800F where t.T9ADDEP > 0 select t; // <- runs slow
    Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD06 test table", DateTime.Now.ToLongTimeString(), baz.Count().ToString()));

    var ds = new DataSet();
    using (var conn = new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString))
    {
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = "SELECT * FROM BNKPRD01.TMX9800F WHERE T9ADDEP > 0";
            new IBM.Data.DB2.iSeries.iDB2DataAdapter(cmd).Fill(ds);
        }
    }
    Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD01 test table", DateTime.Now.ToLongTimeString(), ds.Tables[0].Rows.Count.ToString()));

    ds = new DataSet();
    using (var conn = new iDB2Connection(ConfigurationManager.ConnectionStrings["DB2"].ConnectionString))
    {
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = "SELECT * FROM BNKPRD06.TMX9800F WHERE T9ADDEP > 0";
            new IBM.Data.DB2.iSeries.iDB2DataAdapter(cmd).Fill(ds);
        }
    }
    Console.WriteLine(string.Format("{0}: {1} records found in BNKPRD06 test table", DateTime.Now.ToLongTimeString(), ds.Tables[0].Rows.Count.ToString()));

    Console.WriteLine("Press return to exit.");
    Console.ReadLine();
}
Maybe I'm missing something obvious or there's something about LINQ I didn't grok?
EDIT
Following the discussion with Jon and Brian below, I stepped further into the DB_Linq code that runs when the LINQ query is created and found the slow step:
public override IEnumerable<MetaTable> GetTables()
{
    const BindingFlags scope = BindingFlags.GetField |
        BindingFlags.GetProperty | BindingFlags.Static |
        BindingFlags.Instance | BindingFlags.NonPublic |
        BindingFlags.Public;
    var seen = new HashSet<Type>();

    // Reflects over every field and property on the data context type.
    foreach (var info in _ContextType.GetMembers(scope))
    {
        // Only look for Fields & Properties.
        if (info.MemberType != MemberTypes.Field && info.MemberType != MemberTypes.Property)
            continue;

        // Skip anything that isn't a Table<T> member.
        Type memberType = info.GetMemberType();
        if (memberType == null || !memberType.IsGenericType ||
                memberType.GetGenericTypeDefinition() != typeof(Table<>))
            continue;

        var tableType = memberType.GetGenericArguments()[0];
        if (tableType.IsGenericParameter)
            continue;

        if (seen.Contains(tableType))
            continue;
        seen.Add(tableType);

        MetaTable metaTable;
        if (_Tables.TryGetValue(tableType, out metaTable))
            yield return metaTable;
        else
            yield return AddTableType(tableType);
    }
}
That loop iterates 16,718 times, which tracks with roughly two reflected members (a backing field plus a property) for each of the 8,306 tables.
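One thought (just a sketch against my reading of the code, not something I've tested): the scan depends only on the context type, so its result could be computed once and cached. Here GetTablesUncached is hypothetical shorthand for the original method body above, renamed, and I'm assuming DB_Linq doesn't rely on the scan re-running on every call:

// Cache the scan result per context type so the reflection walk over
// 16,000+ members happens once instead of on every query.
private static readonly Dictionary<Type, MetaTable[]> _tableScanCache =
    new Dictionary<Type, MetaTable[]>();

public override IEnumerable<MetaTable> GetTables()
{
    lock (_tableScanCache)
    {
        MetaTable[] cached;
        if (!_tableScanCache.TryGetValue(_ContextType, out cached))
        {
            // GetTablesUncached is hypothetical: the original method body above.
            cached = GetTablesUncached().ToArray();
            _tableScanCache[_ContextType] = cached;
        }
        return cached;
    }
}

Even then, that only pays the cost once per context type. Given my note above that a 200-table subset runs much faster, maybe the real answer is several smaller data contexts rather than one giant one?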