The usual performance argument runs thus:
Generational GCs are fast because they rely on the heuristic that many allocated objects are short-lived (an object is "live" as long as it is reachable; the point of the GC is to detect "dead" objects and reclaim their memory). This means that objects can be accumulated in a special area (the "young generation"); the GC runs when that area is full, scavenges the live objects, and moves them ("physically") into the old generation. In most generational GCs, this operation implies a pause ("stop-the-world") which is tolerable because it is short (the young generation has a limited size). Pausing the world during a collection of the young generation allows young objects to be handled efficiently: reading or writing a reference in a young object's fields is a mere memory access, with no need to account for concurrent access from a GC thread or an incremental mark-and-sweep.
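The scavenging step can be sketched as a toy copying collector. This is a hypothetical model, not any real GC's implementation: reachability is approximated by an explicit root set of names, and the `Heap`, `allocate` and `minor_collect` names are invented for illustration.

```python
class Heap:
    """Toy model of a heap with a copying young generation."""

    def __init__(self, nursery_capacity, roots):
        self.nursery_capacity = nursery_capacity
        self.roots = roots        # names of objects reachable from the program
        self.nursery = []         # young generation: fills up, then is scavenged
        self.old_gen = []         # old generation: receives the survivors

    def allocate(self, name):
        # A full nursery triggers a stop-the-world minor collection.
        if len(self.nursery) == self.nursery_capacity:
            self.minor_collect()
        self.nursery.append(name)

    def minor_collect(self):
        # Copy the live objects into the old generation; everything else
        # in the nursery is dead and is reclaimed wholesale by clearing it.
        survivors = [o for o in self.nursery if o in self.roots]
        self.old_gen.extend(survivors)
        self.nursery.clear()

heap = Heap(nursery_capacity=3, roots={"a", "c"})
for name in ("a", "b", "c", "d"):
    heap.allocate(name)
print(heap.old_gen, heap.nursery)   # ['a', 'c'] ['d']
```

Note that the dead objects ("b" here) cost nothing individually: the collector only touches the survivors, and the rest vanish when the nursery is reset.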
A young generation, collected as described above, is efficient because by the time it is collected, most of the objects in it are already dead, so they incur no extra cost. The optimal size of the young generation is a trade-off between the worst case (all young objects are live, which implies the maximum pause time) and the average efficiency (with a larger young generation, more objects have time to die before the collection, which lowers the average cost of GC).
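This trade-off can be made concrete with a small simulation, assuming exponentially distributed object lifetimes (a modelling assumption of this sketch, not a property of any particular collector; the `scavenge_cost` name is invented):

```python
import random

def scavenge_cost(nursery_size, n_allocs=100_000, mean_lifetime=50.0, seed=0):
    """Fraction of allocated objects that are still live when the nursery
    fills and must therefore be copied by the minor collection."""
    rng = random.Random(seed)
    copied = 0
    nursery = []   # each entry is the simulated time at which the object dies
    for t in range(n_allocs):
        nursery.append(t + rng.expovariate(1.0 / mean_lifetime))
        if len(nursery) == nursery_size:
            # Objects whose death time is still in the future are live
            # and must be scavenged; the rest are reclaimed for free.
            copied += sum(1 for death in nursery if death > t)
            nursery.clear()
    return copied / n_allocs

for size in (10, 100, 1000):
    print(size, round(scavenge_cost(size), 3))
```

The copied fraction drops sharply as the nursery grows, while the worst-case pause (copy everything in the nursery) grows linearly with its size: that is the trade-off.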
Running the GC manually is akin to making the young generation smaller. It means that more young objects will be promoted to the old generation while still live, increasing both the cost of collecting the young generation (more objects must be scavenged) and the cost of collecting the old generation (more old objects to handle).
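The effect of forcing collections can be illustrated with the same kind of toy model (exponential lifetimes are again a modelling assumption, and `promoted_fraction` and `force_every` are invented names): a manual collection every few allocations behaves exactly like running with a much smaller nursery.

```python
import random

def promoted_fraction(force_every, nursery_capacity=1000,
                      n_allocs=100_000, mean_lifetime=50.0, seed=0):
    """Fraction of objects promoted to the old generation when, on top of
    the normal collection at `nursery_capacity`, a collection is forced
    every `force_every` allocations (simulating manual GC calls)."""
    rng = random.Random(seed)
    promoted = 0
    nursery = []   # each entry is the simulated time at which the object dies
    for t in range(n_allocs):
        nursery.append(t + rng.expovariate(1.0 / mean_lifetime))
        if len(nursery) == nursery_capacity or (t + 1) % force_every == 0:
            # Still-live objects are promoted; they would have died in the
            # nursery had the collection run later.
            promoted += sum(1 for death in nursery if death > t)
            nursery.clear()
    return promoted / n_allocs

print(round(promoted_fraction(force_every=20), 3))      # frequent manual GC
print(round(promoted_fraction(force_every=10**9), 3))   # never forced
```

In this model the frequent-collection run promotes the large majority of objects, while the undisturbed run promotes only a few percent: the prematurely promoted objects are exactly the extra load the old-generation collector later has to carry.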