Here's a top-down answer (most of the others are bottom-up):
Firefox is a XUL application (see also: XUL); XUL is a variant of XML used to describe a GUI that is interpreted by a renderer, much the same way that HTML is rendered within the browser, but XUL includes the browser's menus, buttons, status bar, keyboard shortcuts, etc. It's pretty neat; I've been able to put together some simple GUI apps much faster in XUL than in other frameworks (and it's platform-independent!).
If you look in the Firefox application directory (wherever you installed it on your system), you'll see a "chrome" directory with a bunch of .jar files. These are just .zip files with a particular structure (including a manifest) and you can look through them yourself.
Much of the Firefox browser is actually XUL + Javascript. It does utilize a lot of lower-level libraries written in C++ and accessible to Javascript via XPCOM, but if you want to understand (& modify) the higher-level behavior, the XUL+Javascript parts are probably the place to start.
edit: p.s. here are some tutorials/references for XUL: 1 2 3 and also the O'Reilly book
edit: XUL documents are very similar to HTML documents (only more so!) in the way they interact with the user + events. There's a document model for dynamically modifying the XUL, and there are event models that have event listeners. Both act just like HTML + DOM + its event model, but with a richer set of builtin objects/interfaces/events/etc. The event handlers are interfaces of a particular sort, and can be implemented by Javascript objects (declared in the XUL with onclick="blah()"
, or added dynamically through Javascript calls to addEventListener()
-- both are exactly the same syntax as HTML events in Firefox) or by C++ or other languages that can implement XPCOM objects with the proper interfaces.