It's known as "superlinear speedup", and it can occur for a variety of reasons, though the most common root cause is probably cache behaviour. When superlinear speedup occurs, it's usually a clue that you could make the sequential version more efficient.
For example, suppose you have a processor where some of the cores share an L2 cache (a common architecture these days), and suppose your algorithm makes multiple traversals of a large data structure that doesn't fit in that cache. If you perform the traversals in sequence, each traversal has to pull the data into the L2 cache afresh. If you perform them in parallel on cores that share the cache, one traversal's misses can bring in data that the others then hit, so you may well avoid a large number of those misses, as long as the traversals run in step (getting out of step is a good source of unpredictable performance here). To make the sequential version more efficient you could interleave the traversals, thereby improving locality.
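As a rough sketch of what interleaving the traversals means, suppose the algorithm needs three independent results from the same data. The function names and the three computations (sum, minimum, sum of squares) are made up for illustration and are not from any particular codebase. In Python the interpreter overhead will likely hide most of the cache effect, so treat this as showing the shape of the transformation rather than as a benchmark; the same restructuring in C or C++ over an array larger than the cache is where the difference would be expected to show up.

```python
def separate_traversals(data):
    """Three full passes: each pass walks the whole structure again,
    so for data larger than the cache each pass reloads it from memory."""
    total = 0.0
    for x in data:
        total += x

    smallest = float("inf")
    for x in data:
        if x < smallest:
            smallest = x

    sum_sq = 0.0
    for x in data:
        sum_sq += x * x

    return total, smallest, sum_sq


def interleaved_traversal(data):
    """One pass: all three computations use each element while it is
    still hot, which is what in-step parallel traversals get for free."""
    total = 0.0
    smallest = float("inf")
    sum_sq = 0.0
    for x in data:
        total += x
        if x < smallest:
            smallest = x
        sum_sq += x * x
    return total, smallest, sum_sq


if __name__ == "__main__":
    sample = [3.0, 1.0, 2.0]
    print(separate_traversals(sample))
    print(interleaved_traversal(sample))
```

Both functions perform the same operations on each element in the same order, so they return identical results; only the memory access pattern differs.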