Looks like I'm answering my own question here! I'm not accepting my answer until I've been able to both test it on real hardware and get it into the app store. That said, I'll keep my Most Up-To-Date Info here, including which options didn't work.
Idea #1: It turns out that each NSRunLoop is thread-specific. If I create a UIApplicationMain in a separate thread, it doesn't get any messages. As a side effect this makes it impossible to determine when it's finished initializing, so if there's anything non-threadsafe it does, it just won't work. I may be able to send it a message across threads to figure out when it's finished initializing, but for now I'm calling this a dead end.
Idea #2: UIApplicationMain does a lot of subtle stuff. I'm not sure what it's restricted to, but I was unable to make anything work without involving UIApplicationMain. Idea #2 is right out.
Idea #3: Receiving the OS signals is important - you need to know if there's a phone-call overlay, or whether you're about to exit. On top of that, some of the setup messages seem vital in order to start the app properly. I was unable to find any method to keep messages being sent without being inside UIApplicationMain. The only options I came up with were NSRunLoop and CFRunLoop. Neither one worked - the messages didn't come in like I wanted. I may not be using these right, but in any case, Idea #3 is out.
Brand-new crazy Idea #4: It's possible to use setjmp/longjmp to fake coroutines in C/C++. The trick is to first set the stack pointer to some value that won't clobber anything important, then start your second routine, then jump back and forth, pretending like you have two stacks. Things get a tiny bit messy if your "second coroutine" decides to return from its main function, but luckily, UIApplicationMain never returns, so this isn't a problem.
I don't know if there's a way to set the stack pointer explicitly on real hardware, say, to a chunk of data that I allocated on the fly. Luckily, it doesn't matter. The iPhone has a 1MB stack by default, which is easily enough to fit a few coroutines in.
What I'm currently doing is using alloca() to push the stack pointer ahead by 768 kilobytes, then spawning UIApplicationMain, then using setjmp/longjmp to bounce back and forth between my "UI routine" and my "main routine". So far, this is working.
Caveats:
It's impossible to know when the "UI routine" has no messages to handle, and when it has no messages to handle, it will just block indefinitely until that's no longer the case. I'm solving this by making a timer that triggers every 0.1 milliseconds. Every time the timer triggers, I drop out to my "main routine", do a single game loop, then head back into the "UI routine" for another timer tick. Reading the documentation indicates that it won't stack up "timer calls" indefinitely. I do seem to get the "terminate" message appropriately, though I haven't managed to test it thoroughly yet, and I haven't tested any other important messages. (Luckily, there's only four messages total, and one of them is setup-related.)
Most modern OSes won't allocate the entire stack at once. The iPhone is probably one of these. What I don't know is whether bumping the stack pointer 3/4 of a meg forward will allocate everything "behind it", so to speak. If so, I may be effectively wasting 3/4 of a meg of RAM, which, on the iPhone, is significant. This could be handled by bumping the pointer forward a smaller amount, but this is really courting stack size disaster - it effectively limits your stack to however far you bump the pointer ahead, and you'll have to figure this out in advance. Some sentinel data in the stack, coupled with good monitoring and a logging system for stack size issues, can probably solve this, but it's a nontrivial issue. (Alternatively, if I can figure out how to muck with the stack pointer directly on the native hardware, I can just malloc()/new[] a few kilobytes, point the stack pointer at it, and use it as my new stack. I'll have to figure out how much space it needs but I doubt it'll be much, considering that it's not doing very much.)
This is currently untested on the actual hardware (give it a week or two, I got another project to finish first.)
I have no idea whether Apple will figure out what I'm doing and slap a giant REJECTED sticker on it when I try submitting to the app store. This is, shall we say, slightly outside their intentions for the API. Fingers crossed.
I'll keep this post updated, and officially accept it once I've verified that it, you know, works.
Late update: I got distracted by a variety of other things. Since then, I've had a few changes that make me far less interested in Apple development. My current approach showed no sign of not working, but I don't really have the motivation to keep fleshing it out. Sorry! If I ever change my mind I'll update this further, but Outlook Not So Good.