It's important to understand that there are two aspects to thread safety: (1) execution order (happens-before relationships), and (2) memory visibility. The first concerns what happens, and when; the second concerns when the effects of what has been done become visible to other threads. Because each CPU has several levels of cache between it and main memory, threads running on different CPUs or cores can see "memory" differently at any given moment: threads are permitted to obtain and work on local cached copies of main memory.
Using synchronized prevents any other thread from obtaining the monitor (or lock) for the same object while it is held, thereby preventing all code blocks protected by synchronization on the same object from executing concurrently. Just as importantly, synchronization creates a memory barrier with a visibility guarantee: everything a thread did before releasing a lock is visible to any thread that subsequently acquires the same lock, as though it had happened before the acquisition. In practice this causes the CPU caches to be flushed when a monitor is acquired and when it is released, which is (relatively speaking) expensive.
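As a minimal sketch of both effects (the class and field names here are mine, just for illustration):

    class SharedValue {
        private final Object lock = new Object();
        private int value;

        void set(int newValue) {
            synchronized (lock) {   // no other thread can hold lock's monitor concurrently
                value = newValue;   // this write happens-before any later acquire of lock
            }
        }

        int get() {
            synchronized (lock) {   // acquiring the same monitor guarantees we see prior writes
                return value;
            }
        }
    }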
Volatile, on the other hand, simply forces all accesses (reads and writes) of the volatile variable to go directly to main memory, effectively keeping it out of the CPU caches. This is useful when all that is required is correct visibility of the variable and the order of accesses is not important.
Just yesterday I had some code where a shared but immutable object is recreated on the fly, and I needed to update several references to the shared object - volatile is perfect for that situation. I needed the other threads to see the recreated object as soon as it was published, but did not need the additional overhead of full synchronization and its attendant contention and cache flushing.
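A stripped-down sketch of that pattern (the Config type and the names here are hypothetical stand-ins, not my actual code):

    final class Config {                     // immutable snapshot of the shared state
        final int maxConnections;
        Config(int maxConnections) { this.maxConnections = maxConnections; }
    }

    class ConfigHolder {
        private volatile Config current = new Config(10);

        Config get() {
            return current;                  // readers always see the most recently published reference
        }

        void reload(int newMax) {
            // Build the replacement object completely, then publish it with a
            // single volatile write; readers see either the old object or the
            // new one, never a partially initialized one.
            current = new Config(newMax);
        }
    }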
Speaking to your read-update-write question specifically, consider the following unsafe code:
    private int counter;

    public void updateCounter() {
        if (counter == 1000) { counter = 0; }
        else { counter++; }
    }
Now, with the updateCounter() method unsynchronized, two threads may enter it at the same time. Among the many possible interleavings, consider this one: thread1 tests counter == 1000, finds it true, and is suspended. Then thread2 performs the same test, also sees it as true, and is suspended. Thread1 resumes and sets counter to 0. Thread2 resumes and sets counter to 0 again, because it missed thread1's update.

The same lost update can occur even without a thread switch, simply because two different cached copies of counter were present in two different CPU cores and each thread ran on a separate core. For that matter, one thread could see counter at one value and the other could see it at some entirely different value, purely because of caching.
What's important in this example is that the variable counter was read from main memory into cache, updated in cache, and only written back to main memory at some indeterminate later point when a memory barrier occurred.
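One straightforward fix, to sketch it, is to make the whole read-test-write atomic with synchronized; note that merely declaring counter volatile would not help here, because the three steps would still not be atomic:

    private int counter;

    public synchronized void updateCounter() {
        // Holding the object's monitor makes the read-test-write atomic, and
        // releasing it publishes the result to the next thread that acquires it.
        if (counter == 1000) { counter = 0; }
        else { counter++; }
    }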