The core X protocol's text-rendering facilities (server-side fonts) do not support anti-aliasing and aren't used much these days. (I think the reason is that the X font protocol has no place for an alpha channel.)
GTK and Qt instead render text in the client using the FreeType library, which gives them an image of the text with an alpha channel. If the X server supports the RENDER extension, the client can send that image to the server and have it blended onto the display using its alpha channel. If the X server doesn't support RENDER, the client has to retrieve the region of the screen where the text is to be displayed (basically taking a small screenshot), do the alpha blending client-side, and send the resulting opaque pixmap back to the X server to be displayed.
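To make the client-side fallback concrete, here is a rough sketch of the blending step in Python. It is only an illustration, not what GTK or Qt actually run: the function name `blend_glyph` and the pixel layout (rows of RGB tuples, plus a matching grid of 0..255 coverage values standing in for the alpha channel) are assumptions of mine, and the code doesn't talk to an X server at all. It just shows the arithmetic applied to the fetched screen region before the opaque result is sent back.

```python
def blend_glyph(background, coverage, color):
    """Blend a solid text color over background pixels.

    background: rows of (r, g, b) tuples, i.e. the region read back from the screen
    coverage:   rows of ints in 0..255, i.e. the rendered text's alpha channel
    color:      (r, g, b) text color
    Returns new rows of opaque (r, g, b) tuples.
    """
    result = []
    for bg_row, cov_row in zip(background, coverage):
        out_row = []
        for bg_pixel, alpha in zip(bg_row, cov_row):
            # Per channel: out = color * alpha + background * (1 - alpha),
            # done in integers with rounding (alpha scaled to 0..255).
            out_row.append(tuple(
                (c * alpha + b * (255 - alpha) + 127) // 255
                for c, b in zip(color, bg_pixel)
            ))
        result.append(out_row)
    return result


# Black text over a white background, with three different coverage values.
white = (255, 255, 255)
result = blend_glyph([[white, white, white]], [[0, 128, 255]], (0, 0, 0))
print(result)
```

A coverage of 0 leaves the background pixel untouched, 255 replaces it with the text color, and anything in between gives the intermediate shades along the glyph edges that make the text look anti-aliased. With RENDER, the same computation happens in the server instead, so the client never has to read the screen back.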