I have a Windows Workflow application that uses SQL persistence and is hosted in the web runtime, since the workflows are started by ASP.NET form submissions. It runs great most of the time, but I've noticed cases where I have to kick things:
The nextTimer value has gone well overdue, sometimes by hours. Sometimes the ownerID and ownedUntil fields are null in the persistence database, sometimes not. The "unlocked" and "blocked" fields are always both "1".
The workflow runtime doesn't pick the instance back up until I null out the owner fields (if they're populated) and recycle the application pool. After that, things mostly run fine. There are no errors: I have try/catch blocks around everything and write anything caught to a trace file, and nothing shows up there.
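For reference, the manual unsticking amounts to roughly the following. This is a sketch, not the exact statements I run: it assumes the default persistence schema (a table named InstanceState keyed by uidInstanceID, with timestamps stored in UTC), and the five-minute threshold is an arbitrary cutoff chosen for illustration.

```sql
-- Sketch only. Assumptions: default WF persistence schema (InstanceState table,
-- uidInstanceID key) and UTC timestamps. The 5-minute cutoff is arbitrary.

-- 1. List instances whose timer is overdue, with the fields described above.
SELECT uidInstanceID, nextTimer, ownerID, ownedUntil, unlocked, blocked
FROM InstanceState
WHERE nextTimer < DATEADD(minute, -5, GETUTCDATE())
ORDER BY nextTimer;

-- 2. Clear the owner fields on overdue rows where they are still populated.
UPDATE InstanceState
SET ownerID = NULL,
    ownedUntil = NULL
WHERE nextTimer < DATEADD(minute, -5, GETUTCDATE())
  AND ownerID IS NOT NULL;
```

After the update, I recycle the application pool so the runtime reloads the instances.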
The delay activities causing the persistence are all set to one minute, and the ownership duration for the runtime is 60 seconds as well. The code that it gets stuck on should always take less than a minute.
As I write this, I'm wondering whether recycles of the app pool/app domain are causing it: when the workflow tries to call into the runtime, the runtime may be busy spinning up the app domain/pool and run past the 60-second ownership duration. Does that sound remotely plausible, and would it cause the instance to not rehydrate properly?
Barring that sidetrack, what else could cause the behavior I'm seeing? I don't want to babysit the runtime every day, unsticking stuck workflows.