Generally, voice application services like Twilio, Tropo, etc. work by asking the controlling application what to do whenever an event happens during the call. For example, when a new call arrives they send a request to the pre-designated controlling application asking what they should do; when the call is answered they ask again for further actions, and so on.
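To make that request/response model concrete, here is a minimal sketch of what the controlling application's side might look like. It assumes a Twilio-style XML reply using `<Response>` and `<Play>` elements; the event names, the function names and the example URL are made up for illustration, so check your provider's documentation for the real request parameters.

```python
import xml.etree.ElementTree as ET


def build_play_response(audio_url: str) -> str:
    """Build a TwiML-style instruction telling the platform to play a file."""
    response = ET.Element("Response")
    play = ET.SubElement(response, "Play")
    play.text = audio_url
    return ET.tostring(response, encoding="unicode")


def build_hangup_response() -> str:
    """Build a TwiML-style instruction telling the platform to end the call."""
    response = ET.Element("Response")
    ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


def handle_call_event(event: str) -> str:
    """Decide what to tell the platform for a given call event.

    The event names here ("incoming", "answered") are hypothetical labels,
    not values any particular provider is known to send.
    """
    if event in ("incoming", "answered"):
        return build_play_response("https://example.com/greeting.mp3")
    return build_hangup_response()


if __name__ == "__main__":
    print(handle_call_event("incoming"))
```

The point is that the platform drives the conversation: your code only ever answers the question "what next?", which is why reacting to an event on your own side needs a different mechanism.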
In your case you want it to work the other way around: you want to tell the server processing the call to do something in response to an event on your end. There may be an API call on Twilio, Tropo, etc. that initiates actions on a live call, but I can't recall it. More likely you will need something like the Asterisk AGI protocol (Asterisk Gateway Interface), which lets an external program send commands to the server while it is processing the call. Cloudvox is one provider I know of that offers a hosted service with an AGI interface, so they'd be worth a look; that is certainly a lot easier than building your own Asterisk server.
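As a rough idea of what talking AGI looks like: commands are plain text lines sent to Asterisk, and each one gets a status line back. The sketch below only formats a `STREAM FILE` command and parses a reply of the form `200 result=0`; the helper names and the sound file name `welcome` are placeholders of mine, and wiring this to stdin/stdout or to a FastAGI socket is left out.

```python
import re
from typing import Tuple


def stream_file_command(filename: str, escape_digits: str = "") -> str:
    """Format the AGI command that plays a sound file on the live call.

    The filename is given without extension; escape_digits lists the keys
    that may interrupt playback (empty means none).
    """
    return f'STREAM FILE {filename} "{escape_digits}"\n'


def parse_agi_result(line: str) -> Tuple[int, int]:
    """Extract (status code, result value) from an AGI response line."""
    match = re.match(r"(\d{3}) result=(-?\d+)", line.strip())
    if match is None:
        raise ValueError(f"unexpected AGI response: {line!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # In a real AGI script this line would be written to stdout and the
    # reply read from stdin; here we only show the text involved.
    print(stream_file_command("welcome"), end="")
    print(parse_agi_result("200 result=0 endpos=8000"))
```

Because your program decides when to send the command, it can react to something happening on your side (a button click, a database change) rather than only to call events.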
I'd also recommend checking out Anveo. Their offerings are generally a bit more sophisticated than the others, and they often implement new features very quickly, so you could always ask them for a web API that plays an mp3 during a live call.