It's all very modular and flexible; however this leads to complexity.
The "X Server" drives the display device. It provides graphics services to clients, and those services are pretty simple - such as:
"Give me a window frame to draw in"
"Put this bitmap here"
"Draw a horizontal black line 100px wide"
"Render the text 'hello' at (100,100)"
"Tell me if any mouse clicks or key presses have been aimed at my window frame"
There is a library called Xlib, provided by X, that has a standard interface for all these simple services. Any program that wants to use the X server's display eventually uses this client library and is called an X Client. Xlib knows how to connect to an arbitrary X server - on the local machine, or via TCP/IP across the LAN, or across the world - to call these services.
The Window Manager, which is just another X client program, is in charge of the "look and feel" of the desktop - how you move and arrange windows, etc. Because the window manager draws all the window decorations, it can make the desktop look like WindowsXP, or a Mac, or NeXTSTEP.
Part of the philosophy of X was to define "mechanism and not policy" - meaning, they give you tools to do it, but don't tell you how to use those tools. One such tool is the window manager, which can be replaced at will.
Many modern X applications are written to use a desktop enviroment such as Gnome or KDE. This offers these programs a consistent set of buttons and controls to draw, and a consistent interface for some things not traditionally included in X, but often considered part of a desktop - such as how to respond to drag-and-drop or how to present a standard file chooser dialog box.
The desktop environment usually provides an object model or programmatic interface that takes care of making all the simple X client requests and lets the program handle more important things. Removing these low-level calls yields another important benefit - platform independence.
Many desktop environments include a window manager, so that the look and feel of window controls and buttons is consistent and works with the desktop metaphor provided by the environment. However, it can usually still be switched out.
The separation of the X Server (running the display) and the X Client (wanting to use the display) has a few implications:
The graphics system is separate from the GUI programs, and they are separated about as completely as a web browser and web server are.
So the GUI program might not be displaying on the local machine - just like a web browser doesn't have to point at a web server on the local machine.
A machine can run JUST the client, with the X server elsewhere.
The machine with the display doesn't have to run the client - it can run JUST the X server, and all the clients can run on a dedicated machine. This is the original thin client: big programs running on big central server - with graphical user interaction handled by dedicated hardware on the desk in front of the user.
You need to know what your X server's network address is so you can tell GUI programs where to display their GUI. (this is usually done by setting the DISPLAY environment variable)
You can display many programs, from many different machines, all on the same desktop at the same time. It is all handled seamlessly, including cut-and-paste.