Make your threads as simple as possible.
Try not to use global variables. Global constants (actual constants that never change) is fine. When you do need to use global or shared variables you need to protect them with some type of mutex/lock (semaphore, monitor, ...).
Make sure that you actually understand what how your mutexes work. There are a few different implementations which can work differently.
Try to organize your code so that the critical sections (places where you hold some type of lock(s) ) are as quick as possible. Be aware that some functions may block (sleep or wait on something and keep the OS from allowing that thread to continue running for some time). Do not use these while holding any locks (unless absolutely necessary or during debugging as it can sometimes show other bugs).
Try to understand what more threads actually does for you. Blindly throwing more threads at a problem is very often going to make things worse. Different threads compete for the CPU and for locks.
Deadlock avoidance requires planning. Try to avoid having to acquire more than one lock at a time. If this is unavoidable decide on an ordering you will use to acquire and release the locks for all threads. Make sure you know what deadlock really means.
Debugging multi-threaded or distributed applications is difficult. If you can do most of the debugging in a single threaded environment (maybe even just forcing other threads to sleep) then you can try to eliminate non-threading centric bugs before jumping into multi-threaded debugging.
Always think about what the other threads might be up to. Comment this in your code. If you are doing something a certain way because you know that at that time no other thread should be accessing a certain resource write a big comment saying so.
You may want to wrap calls to mutex locks/unlocks in other functions like:
int my_lock_get(lock_type lock, const char * file, unsigned line, const char * msg) {
thread_id_type me = this_thread();
logf("%u\t%s (%u)\t%s:%u\t%s\t%s\n", time_now(), thread_name(me), me, "get", msg);
lock_get(lock);
logf("%u\t%s (%u)\t%s:%u\t%s\t%s\n", time_now(), thread_name(me), me, "in", msg);
}
And a similar version for unlock. Note, the functions and types used in this are all made up and not overly based on any one API.
Using something like this you can come back if there is an error and use a perl script or something like it to run queries on your logs to examine where things went wrong (matching up locks and unlocks, for instance).
Note that your print or logging functionality may need to have locks around it as well. Many libraries already have this built in, but not all do. These locks need to not use the printing version of the lock_[get|release] functions or you'll have infinite recursion.