Within the java memory model? No, you are not ok.
I've seen a number of attempts to head towards a very 'soft flush' approach like this, but without an explicit fence, you're definitely playing with fire.
The 'happens before' semantics in
http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.7
start to referring to purely inter-thread actions as 'actions' at the end of 17.4.2. This drives a lot of confusion since prior to that point they distinguish between inter- and intra- thread actions. Consequently, the intra-thread action of manipulating counter isn't explicitly synchronized across the volatile action by the happens-before relationship. You have two threads of reasoning to follow about synchronization, one governs local consistency and is subject to all the nice tricks of alias analysis, etc to shuffle operations The other is about global consistency and is only defined for inter-thread operations.
One for the intra-thread logic that says within the thread the reads and writes are consistently reordered and one for the inter-thread logic that says things like volatile reads/writes and that synchronization starts/ends are appropriately fenced.
The problem is the visibility of the non-volatile write is undefined as it is an intra-thread operation and therefore not covered by the specification. The processor its running on should be able to see it as it you executed those statements serially, but its sequentialization for inter-thread purposes is potentially undefined.
Now, the reality of whether or not this can affect you is another matter entirely.
While running java on x86 and x86-64 platforms? Technically you're in murky territory, but practically the very strong guarantees x86 places on reads and writes including the total order on the read/write across the access to cacheflush and the local ordering on the two writes and the two reads should enable this code to execute correctly provided it makes it through the compiler unmolested. That assumes the compiler doesn't step in and try to use the freedom it is permitted under the standard to reorder operations on you due to the provable lack of aliasing between the two intra-thread operations.
If you move to a memory with weaker release semantics like an ia64? Then you're back on your own.
A compiler could in perfectly good faith break this program in java on any platform, however. That it functions right now is an artifact of current implementations of the standard, not of the standard.
As an aside, in the CLR, the runtime model is stronger, and this sort of trick is legal because the individual writes from each thread have ordered visibility, so be careful trying to translate any examples from there.