I typically handle this by modifying the SQL Server Agent job(s) that start and run the replication agents (depending on your replication topology, there may be several of them, potentially on different servers). Simply add a job step to the appropriate agent job(s) (e.g. log reader agent, distribution agent, merge agent, queue reader agent) after the "run agent" step, and have it execute when that step completes or fails (which of the two depends on whether or not you are using a continuous schedule).
For example, if I have a transactional, uni-directional push publication set up, the distribution agent will be running at the distributor. If I connect to the distributor and find the SQL Server Agent job responsible for running the distribution agent for this publication, I can modify the job and add a step that sends an email to a particular group if the "run agent" step fails or completes. If I am using a continuous replication schedule, I will have the email step run whenever the "run agent" step finishes, successfully or not (as I want to be notified if the agent stops for any reason). If I am using a non-continuous schedule, I may instead have the email step run only on failure of the "run agent" step. You can even configure this "email" step to send an email, pause for a bit, then try restarting the agent automatically (by simply configuring the step to "go to step 1" on success).
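As a rough sketch of that setup in T-SQL (run at the distributor), something like the following could add the email step and reroute the "run agent" step to it. The job name, mail profile, recipient address, and the assumption that the existing step is named `Run agent.` are all placeholders for illustration; check the actual job in your environment before running anything like this.

```sql
USE msdb;
GO

-- Hypothetical job name; use the real distribution agent job for your publication.
DECLARE @job sysname = N'MyPublisher-MyDb-MyPublication-MySubscriber-1';
DECLARE @run_step int, @new_step int;

-- Locate the existing "run agent" step by name (assumed name; adjust if yours differs).
SELECT @run_step = s.step_id
FROM dbo.sysjobsteps AS s
JOIN dbo.sysjobs AS j ON j.job_id = s.job_id
WHERE j.name = @job AND s.step_name = N'Run agent.';

IF @run_step IS NULL
BEGIN
    RAISERROR(N'Could not find the run agent step for job %s.', 16, 1, @job);
    RETURN;
END;

-- The new step is appended after the current last step.
SELECT @new_step = MAX(s.step_id) + 1
FROM dbo.sysjobsteps AS s
JOIN dbo.sysjobs AS j ON j.job_id = s.job_id
WHERE j.name = @job;

-- Body of the new step: send an email, then pause for two minutes.
-- The mail profile and recipients are placeholders.
DECLARE @cmd nvarchar(max) = N'
EXEC msdb.dbo.sp_send_dbmail
     @profile_name = N''DBA Mail'',
     @recipients   = N''dba-team@example.com'',
     @subject      = N''Replication distribution agent stopped'',
     @body         = N''The run agent step finished; retrying in two minutes.'';
WAITFOR DELAY ''00:02:00'';';

EXEC dbo.sp_add_jobstep
     @job_name           = @job,
     @step_id            = @new_step,
     @step_name          = N'Notify, pause, retry',
     @subsystem          = N'TSQL',
     @database_name      = N'msdb',
     @command            = @cmd,
     @on_success_action  = 4,   -- go to the step named in @on_success_step_id
     @on_success_step_id = 1,   -- back to step 1, which restarts the agent
     @on_fail_action     = 2;   -- quit the job reporting failure

-- Continuous schedule: route both outcomes of the run agent step to the new step.
-- This replaces whatever flow that step had before, so review any steps that used to follow it.
EXEC dbo.sp_update_jobstep
     @job_name           = @job,
     @step_id            = @run_step,
     @on_success_action  = 4,
     @on_success_step_id = @new_step,
     @on_fail_action     = 4,
     @on_fail_step_id    = @new_step;
```

For a non-continuous schedule, the same sketch would apply with only `@on_fail_action` and `@on_fail_step_id` changed in the final call, leaving the on-success behaviour as it was.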
Here's a screen shot that depicts what the job steps look like for a distribution agent configured as I outline above:
You'll notice in the pic above that I've added a step called "Notify, pause, retry" which will be executed anytime the agent stops (success or failure - this is intentional as I am using a continuous replication schedule and simply want to know whenever the distribution agent isn't running for whatever reason). This step basically sends an email to a specific group, waits for a minute or two, then starts the agent up again. You can add code to do whatever you like, including logging, restarting only a certain number of times in a certain time slice, etc. It's easily scripted and repeatable for any number of agents, publications, etc. (I have scripts to ensure any new replication agent in any type of topology includes this type of configuration - then it's simply a matter of adding those scripts to a release tool or scheduling their execution, depending on how you deploy in your environment).
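To illustrate the "restart only a certain number of times in a certain time slice" idea, here is a minimal sketch of what the T-SQL body of such a step might look like. The log table `dbo.AgentRestartLog`, the job name, the limits, the mail profile, and the recipient address are all hypothetical; the step is assumed to run in a utility database where the table can live, with "go to step 1" on success and "quit reporting failure" on failure.

```sql
-- Hypothetical values; adjust to your environment.
DECLARE @job sysname = N'MyPublisher-MyDb-MyPublication-MySubscriber-1';
DECLARE @max_restarts int = 5;      -- allow at most 5 restarts...
DECLARE @window_minutes int = 60;   -- ...within any 60-minute window
DECLARE @recent int;

-- Hypothetical log table, created on first use.
IF OBJECT_ID(N'dbo.AgentRestartLog', N'U') IS NULL
    CREATE TABLE dbo.AgentRestartLog (
        job_name     sysname      NOT NULL,
        restarted_at datetime2(0) NOT NULL DEFAULT SYSDATETIME()
    );

-- Count restarts already logged for this job inside the window.
SELECT @recent = COUNT(*)
FROM dbo.AgentRestartLog
WHERE job_name = @job
  AND restarted_at >= DATEADD(MINUTE, -@window_minutes, SYSDATETIME());

IF @recent >= @max_restarts
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
         @profile_name = N'DBA Mail',
         @recipients   = N'dba-team@example.com',
         @subject      = N'Replication agent restart limit reached',
         @body         = N'The agent keeps stopping; automatic restarts are suspended.';

    -- Raising an error fails the step, so the job ends instead of looping back.
    RAISERROR(N'Restart limit reached for %s; not retrying.', 16, 1, @job);
    RETURN;
END;

-- Under the limit: log this restart, notify, pause, and let the step succeed.
INSERT INTO dbo.AgentRestartLog (job_name) VALUES (@job);

EXEC msdb.dbo.sp_send_dbmail
     @profile_name = N'DBA Mail',
     @recipients   = N'dba-team@example.com',
     @subject      = N'Replication agent stopped; restarting',
     @body         = N'The run agent step finished; the agent will be restarted in two minutes.';

WAITFOR DELAY '00:02:00';
```

The restart itself still comes from the step's "go to step 1" on-success action; the script only decides whether the step succeeds (retry) or fails (stop and leave the job for someone to look at).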