I have a Windows Workflow (WF) setup that uses SQL persistence and is hosted in the ASP.NET web runtime, since the workflows are started by ASP.NET form submissions. It runs great most of the time, but I've noticed cases where I have to kick things:

I notice that the nextTimer value has gone way overdue, sometimes by hours. Sometimes the ownerID and ownedUntil fields in the persistence database are null, sometimes not. The "unlocked" and "blocked" fields are always both "1".

...and then the workflow runtime doesn't pick the instance back up until I null out the owner fields (if they're populated) and recycle the application pool. After that, things mostly run fine again. There are no errors (I have try/catch blocks around everything and write anything caught to a trace file), so that's not it.

The delay activities that trigger persistence are all set to one minute, and the runtime's ownership duration is 60 seconds as well. The code it gets stuck on should always take less than a minute.

As I write this, I'm wondering whether app pool/AppDomain recycles are causing it: when the workflow tries to call some method in the runtime, the runtime is busy spinning up the AppDomain/pool and might overrun the 60-second ownership duration. Does that sound remotely plausible, and would that keep the instance from rehydrating properly?

Barring that sidetrack, what could cause this behavior I'm seeing? I don't want to babysit the runtime every day by unsticking stuck workflows.

+1  A: 

Have you checked the clock on your db and web servers (if they are not the same server)? I've had similar problems before with workflow and the root cause was that the db and web server clocks were not in sync.

Tundey
I hadn't thought of that, but I checked and they're in sync to the second. Good idea though, thanks!
Chris
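A quick way to check for clock skew without comparing two consoles by eye is to read SQL Server's GETUTCDATE() and compare it with DateTime.UtcNow on the web server. This is a sketch, not something from the thread: ClockSkewCheck and its methods are hypothetical names, and the measured value includes the query's round-trip time, so only differences well above that are meaningful.

```csharp
using System;
using System.Data.SqlClient;

// Hypothetical helper for spotting clock skew between the web server and SQL Server.
public static class ClockSkewCheck
{
    // Absolute difference between two UTC timestamps, regardless of which one is ahead.
    public static TimeSpan Skew(DateTime dbUtc, DateTime webUtc)
    {
        return (dbUtc - webUtc).Duration();
    }

    // Reads SQL Server's current UTC time and compares it with this machine's UTC clock.
    // The result also includes the query round-trip, so very small values are just noise.
    public static TimeSpan MeasureSkew(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT GETUTCDATE()", connection))
        {
            connection.Open();
            DateTime webUtc = DateTime.UtcNow;
            DateTime dbUtc = (DateTime)command.ExecuteScalar();
            return Skew(dbUtc, webUtc);
        }
    }
}
```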
+5  A: 

It's quite likely that AppDomain recycling is a large part of your problem. When IIS recycles an AppDomain, it only waits for the last in-flight request to finish; it does not see code running on another thread as part of that request. That is one of the main reasons for using the ManualWorkflowSchedulerService when hosting in IIS. But when you enable its active timers option, it still uses a background thread to execute workflow activities.

Also make sure you unload workflows as soon as they go idle. The easiest way of doing so is using the UnloadOnIdle setting on the SqlWorkflowPersistenceService.
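If the runtime is built in code, wiring up the two services mentioned above might look roughly like this. It is a sketch under assumptions: WorkflowHost and Create are hypothetical names, and the durations are placeholders rather than recommendations. Note that instanceOwnershipDuration is the setting the asker eventually raised to fix the problem, so it should comfortably exceed the longest time an instance stays loaded.

```csharp
using System;
using System.Workflow.Runtime;
using System.Workflow.Runtime.Hosting;

// Hypothetical factory for a runtime hosted inside ASP.NET.
public static class WorkflowHost
{
    public static WorkflowRuntime Create(string connectionString)
    {
        var runtime = new WorkflowRuntime();

        // Run workflows on the ASP.NET request thread. useActiveTimers = true makes the
        // scheduler use a background timer thread to resume instances whose Delay expired.
        runtime.AddService(new ManualWorkflowSchedulerService(true));

        // Persist and unload each instance as soon as it goes idle.
        runtime.AddService(new SqlWorkflowPersistenceService(
            connectionString,
            true,                         // unloadOnIdle
            TimeSpan.FromMinutes(5),      // instanceOwnershipDuration (placeholder value)
            TimeSpan.FromSeconds(30)));   // loadingInterval: how often to poll for expired timers

        runtime.StartRuntime();
        return runtime;
    }
}
```

With ManualWorkflowSchedulerService, an instance only runs when the host donates a thread: after starting an instance or raising an event, the page calls RunWorkflow(instanceId) on the scheduler, which can be obtained with runtime.GetService<ManualWorkflowSchedulerService>().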

The persistence service checks for workflows with expired ownership, but only at startup. So restarting the IIS worker process will most likely restart the old workflows without any extra work. But since restarts are themselves a cause of new problems, simply clearing out the old ownership should also do the trick; the persistence service should then reload the workflows the next time it checks. The only trick is knowing which runtime ID is old and which isn't (the property holding the value is not public).
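Clearing the ownership by hand, as the asker was doing, can also be scripted. The sketch below assumes the default WF 3.x SQL persistence schema (the InstanceState table, with the ownerID and ownedUntil columns mentioned in the question) and, importantly, that it runs before StartRuntime while no other runtime shares the database. Under that assumption every owner is stale, which sidesteps the problem of telling old runtime IDs from current ones. StaleOwnershipCleanup is a hypothetical name.

```csharp
using System.Data.SqlClient;

// Hypothetical start-up helper that releases every ownership lock in the persistence store.
public static class StaleOwnershipCleanup
{
    // ASSUMPTION: no workflow runtime is using this database when this runs
    // (e.g. it is called at application start-up, before StartRuntime),
    // so any ownerID present must belong to a runtime that no longer exists.
    public const string ReleaseAllLocksSql =
        "UPDATE InstanceState SET ownerID = NULL, ownedUntil = NULL WHERE ownerID IS NOT NULL";

    // Returns the number of instances whose ownership was cleared.
    public static int ReleaseAllLocks(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(ReleaseAllLocksSql, connection))
        {
            connection.Open();
            return command.ExecuteNonQuery();
        }
    }
}
```

If several runtimes share the database, this blanket update is unsafe; the WHERE clause would have to target only the owner IDs known to be dead.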

Also make sure the IIS worker process actually starts again after it shuts down. If it doesn't, there is no WF runtime running, so nothing can check for expired timers. By default IIS only starts a worker process when a request arrives. It sounds like you have this covered, but just in case.

Maurice
Excellent writeup, I'll try as much as I can tomorrow and report back.
Chris
Marked as accepted because it covered a lot of ground, thanks for the info! It turned out to be the ownership duration. I raised instanceOwnershipDuration to allow for the full execution of whatever each loaded activity was trying to do, and it hasn't glitched since.
Chris
+2  A: 

Workflow instances are locked to a runtime (so multiple workflow runtimes can share a database without an instance being handled by more than one of them). When the AppDomain recycles, the runtime should be stopped, which causes the instances to become unlocked.

This might be redundant (I didn't check), but it helped unlock the workflow instances:

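// Stop the runtime when the AppDomain or the process goes away, so that the
// persistence service unloads the instances and releases their ownership locks.
EventHandler stopRuntime = (sender, args) =>
{
    if (_runtime.IsStarted)
        _runtime.StopRuntime();
};

AppDomain.CurrentDomain.DomainUnload += stopRuntime; // e.g. an ASP.NET AppDomain recycle
AppDomain.CurrentDomain.ProcessExit += stopRuntime;  // the hosting process shutting down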
Sander Rijken
I did something similar in global.asax's "Application_End" function, stopping the runtime there. Are our approaches two ways of doing the same thing? Never used the above calls. Thanks for the input, +1!
Chris
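For comparison, a Global.asax version of the same idea could look like this. Both approaches stop the runtime when the application goes away: Application_End is raised by ASP.NET when the application shuts down, while DomainUnload and ProcessExit are CLR events that also fire outside ASP.NET. This is a sketch: it assumes the runtime is created in Application_Start from a WorkflowRuntime section in web.config and kept in application state under a hypothetical key.

```csharp
using System;
using System.Web;
using System.Workflow.Runtime;

public class Global : HttpApplication
{
    // Hypothetical application-state key for the shared runtime.
    private const string RuntimeKey = "WorkflowRuntime";

    protected void Application_Start(object sender, EventArgs e)
    {
        // Assumes web.config has a "WorkflowRuntime" config section that registers
        // the scheduler and persistence services.
        var runtime = new WorkflowRuntime("WorkflowRuntime");
        runtime.StartRuntime();
        Application[RuntimeKey] = runtime;
    }

    protected void Application_End(object sender, EventArgs e)
    {
        // Stopping the runtime unloads loaded instances, which should release
        // their ownership locks in the persistence store.
        var runtime = Application[RuntimeKey] as WorkflowRuntime;
        if (runtime != null)
        {
            if (runtime.IsStarted)
                runtime.StopRuntime();
            runtime.Dispose();
        }
    }
}
```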
A: 

That's not an issue. The SQL script you executed uses geometrical mean time (GMT); that's why the db and web server clocks were not in sync. You might be facing some other problem.

suryakiran
A: 

Geometrical mean time? I didn't realise there was geometry in "Greenwich" - what kind? Triangles or rhombuses? :)

Mike