views:

259

answers:

4

I'm trying to make sense of a situation I have using entity framework on .net 3.5 sp1 + MySQL 6.1.2.0 as the provider. It involves the following code:

Response.Write("Products: " + plist.Count() + "<br />");
var total = 0;
foreach (var p in plist)
{
//... some actions
    total++;
//... other actions
}
Response.Write("Total Products Checked: " + total + "<br />");

Basically the total products is varying on each run, and it isn't matching the full total in plist. Its varies widely, from ~ 1/5th to half.

There isn't any control flow code inside the foreach i.e. no break, continue, try/catch, conditions around total++, anything that could affect the count. As confirmation, there are other totals captured inside the loop related to the actions, and those match the lower and higher total runs.

I don't find any reason to the above, other than something in entity framework or the mysql provider that causes it to end the foreach when retrieving an item.

The body of the foreach can have some good variation in time, as the actions involve file & network access, my best shot at the time is that when the .net code takes beyond certain threshold there is some type of timeout in the underlying framework/provider and instead of causing an exception it is silently reporting no more items for enumeration.

Can anyone give some light in the above scenario and/or confirm if the entity framework/mysql provider has the above behavior?


Update 1: I can't reproduce the behavior by using Thread.Sleep in a simple foreach in a test project, not sure where else to look for this weird behavior :(.


Update 2: in the example above the .Count() always returns the same + correct amount of items. Using ToList or ToArray as suggested gets around of the issue as expected (no flow control statements in the foreach body) and both counts match + don't vary on each run.

What I'm interested in is what causes this behavior in entity framework + mysql. Would really prefer not having to change the code in all the projects that use entity framework + mysql to do .ToArray before enumerating the results because I don't know when it'll swallow some results. Or if I do it, at least know what/why it happened.

+2  A: 

If the problem is related to the provider or whatever, then you can solve/identify that by realising the enumerable before you iterate over it:

var realisedList = plist.ToArray();
foreach(var p in realisedList)
{
  //as per your example
}

If, after doing this, the problem still persists then

a) One of the actions in the enumerator is causing an exception that is getting swallowed somewhere

b) The underlying data really is different every time.

UPDATE: (as per your comment)

[deleted - multiple enumerations stuff as per your comment]

At the end of the day - I'd be putting the ToArray() call in to have the problem fixed in this case (if the Count() method is required to get a total, then just change it to .Length on the array that's constructed).

Perhaps MySql is killing the connection while you're enumerating, and doesn't throw an error to EF when the next MoveNext() is called. EF then just dutifully responds by saying that the enumerable is simply finished. If so, until such a bug in the provider is fixed, the ToArray() is the way forward.

Andras Zoltan
I guess I wasn't clear on that in my description: the .Count() always returns the same number, so that rules out b) i.e. weird issue being both counts having different numbers and the one in the body of the foreach varying on each run.I had tried a), although with the similar ToList(), and as expected that gets around of the issue and both counts match + don't vary on each run. As I mentioned, I don't have any flow control statements in the foreach body that could cause the foreach to terminate too soon or to skip some of the total++.
eglasius
@Freddy - if the ToList()/ToArray() solves the problem, then my solution would be to use that. It's possible that you're getting issues here because you're iterating the enumerable twice (once with the 'Count()' and once with the foreach). Linq-To-Sql, for example, can get very upset when this is done. It is also entirely possible that such errors are due to the data provider - in which case I'd reflect the provider (good luck!) to see if there's a bug somewhere.
Andras Zoltan
@Andras - haven't run into that when using linq-to-sql, but as I avoid unnecessary round-trips its possible that I never had double enumerations before in other projects. Previous said, I'm sure it isn't the case here, as the .Count() was added to debug the issue as the totalin the foreach was varying unexpectedly on each run i.e. issue was occurring without the Count() and no previous enumeration was done. If no one else chimes in, I will definitely use reflector on the provider and find out what's going on / and how specific or not it is to the scenario.
eglasius
@Freddy - I apologise - the L2S multi-enumeration block only applies to stored procs. http://social.msdn.microsoft.com/Forums/en-US/linqprojectgeneral/thread/f1aa1537-8759-4e0c-a9d8-05b7ded7ddfb. There must be a bug in there which occasionally causes MySql provider to think that there are no more results when enumerating over a long-ish period of time; hence why the ToArray() works; perhaps MySql provider simply closes the connection automatically after a period of time, and doesn't subsequently throw an error when more results are requested.
Andras Zoltan
A: 

I believe it has to do with the way the EF handles lazy loading. You might have to use either Load() or Include() and also check using IsLoaded property within your processing loop. Check out these two links for more information:
http://www.singingeels.com/Articles/Entity_Framework_and_Lazy_Loading.aspx
http://blogs.msdn.com/jkowalski/archive/2008/05/12/transparent-lazy-loading-for-entity-framework-part-1.aspx

I apologize I don't know more about EF to be more specific. Hopefully the links will provide enough info to get you started and others can chime in with any questions you might have.

ktharsis
+1  A: 

I think actually that you hit on the answer in your question, but it may be the data that is causing the problem not the timeout. Here is the theory:

One (or several) row(s) in the result set has some data that causes an exception / problem, when it hits that row the system thinks that it has reached the last row.

To test this you could try:

  • Ordering the data and see if the number returned in the for each statement is the same each time.
  • Select only the id column and see if the problem goes away
  • Remove all rows from the table, add them back a few at a time to see if a specific row is causing the problem

If it is a timeout problem, have you tried changing the timeout in the connection string.

Shiraz Bhaiji
A: 

The issue, cause and workaround is described exactly in this mysql bug.

As suspected it Is a timeout related error in the provider, but its not the regular timeout i.e. net_write_timeout. That's why the simple reproduction in a test project didn't work, since the timeout relates to All the cycles of the foreach and not just a particularly long body between the read of 2 rows.

As of now, the issue is present in the latest version of the MySql provider and under normal conditions would only affect scenarios where rows are being read with a connection maintained for a long time (which might or not involve a slow query). This is great, because it doesn't affect all of the previous projects where I have used MySql / applying the workaround to the sources also means it doesn't fail silently.

Ps. couple of what seem to be related mysql bugs: 1, 2

eglasius