Building a gcc cross compiler yourself is pretty easy. Building libgcc, the C library and the other pieces is not so easy, and an embedded C library is a little harder still. It depends on how embedded you want to get. I have little use for libgcc or a C library, so rolling my own works great for me.
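If you skip libgcc, you have to supply yourself whatever helpers the compiler emits calls to. On ARM cores without a hardware divide instruction, dividing by a non-constant typically becomes a call to a runtime helper (in the ARM EABI the unsigned one is `__aeabi_uidiv`). A minimal roll-your-own sketch follows; `my_udiv` is a made-up name, and to actually satisfy the linker you would have to match the EABI name and calling convention your toolchain expects. What to do on divide by zero is your call; this sketch just returns 0.

```c
#include <stdint.h>

/* Hypothetical roll-your-own replacement for the unsigned divide helper
   that libgcc provides on ARM cores without a divide instruction.
   Restoring shift-and-subtract division: one quotient bit per iteration.
   The partial remainder is kept in 64 bits so (r << 1) cannot overflow
   when the divisor is larger than 0x80000000. */
uint32_t my_udiv(uint32_t num, uint32_t den, uint32_t *rem)
{
    uint64_t r = 0;
    uint32_t q = 0;

    if (den == 0) {                 /* policy choice for this sketch */
        if (rem) *rem = num;
        return 0;
    }
    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((num >> bit) & 1u);   /* bring down the next bit */
        if (r >= den) {
            r -= den;
            q |= 1u << bit;
        }
    }
    if (rem) *rem = (uint32_t)r;
    return q;
}
```

For example, `my_udiv(100, 7, &r)` returns 14 and leaves 2 in `r`.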
After many years of doing this (perhaps it is an age thing), I now just go get the CodeSourcery tools; the Lite version works great. YAGARTO, devkitARM, WinARM or something like that (the site with a zillion examples) all work fine too. Emdebian also has a good pre-built toolchain. A number of these places, if not all, have info on how they built their toolchains from the GNU sources.
You asked about gcc, but bear in mind that LLVM is a strong competitor. As far as cross compiling goes, LLVM is always a cross compiler, so it is far easier than gcc to download, build and get working as one. The recent version is now producing code (for ARM) that competes with gcc on performance. gcc is in no way a leader in performance; other compilers I have used run circles around it. It has been improving with each release, although the 3.x versions sometimes produce better code than the 4.x versions, and you need 4.x for the newer cores and Thumb-2. Even if you go with gcc, try the stable release of LLVM from time to time.
qemu is a good emulator. Depending on what you are doing, the Game Boy Advance emulator VisualBoyAdvance is good, and there are a couple of NDS emulators too. GDB and other places have what appears to be ARM's own ARMulator. I found it hard to extract and use, so I wrote my own, but being lazy I only implemented the Thumb instruction set; I called mine the thumbulator. It is easy to use, and far easier than qemu or the ARMulator to add peripherals to and to watch and debug your code in. YMMV.
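The heart of an instruction set emulator like this is a fetch/decode/execute loop, and adding a peripheral is just a matter of intercepting loads and stores to certain addresses. The following is a minimal sketch of that idea, not the thumbulator's actual code. It decodes only a handful of 16-bit Thumb instructions (LSLS by immediate, MOVS/ADDS/SUBS with an 8-bit immediate, STR with an immediate offset, BEQ/BNE, BKPT), models only the N and Z flags, and treats a store to the made-up address `UART_TX` as a byte written to a console buffer. All names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical address of a memory-mapped UART transmit register. */
#define UART_TX 0xE0000000u

typedef struct {
    uint32_t r[16];            /* r0-r15; r15 is the pc */
    int n, z;                  /* only the N and Z flags are modelled */
    const uint16_t *rom;       /* program image, one entry per halfword */
    uint32_t rom_halfwords;
    char out[64];              /* bytes "transmitted" through UART_TX */
    int out_len;
} cpu_t;

static void set_nz(cpu_t *c, uint32_t v)
{
    c->n = (int)(v >> 31);
    c->z = (v == 0);
}

/* Peripheral hook: every store goes through here, so watching or faking a
   device is one if-statement. Stores elsewhere are dropped in this sketch. */
static void write32(cpu_t *c, uint32_t addr, uint32_t val)
{
    if (addr == UART_TX && c->out_len < (int)sizeof c->out - 1)
        c->out[c->out_len++] = (char)(val & 0xFFu);
}

/* Execute until BKPT or max_steps. Returns the number of instructions
   executed, or -1 on an unimplemented opcode or a pc outside the image. */
int run(cpu_t *c, int max_steps)
{
    for (int steps = 0; steps < max_steps; steps++) {
        uint32_t pc = c->r[15];
        if (pc / 2 >= c->rom_halfwords)
            return -1;
        uint16_t op = c->rom[pc / 2];
        uint32_t rd = (op >> 8) & 7u, imm8 = op & 0xFFu;
        c->r[15] = pc + 2;

        if ((op & 0xF800) == 0x0000) {            /* LSLS Rd,Rm,#imm5 */
            uint32_t d = op & 7u, m = (op >> 3) & 7u, sh = (op >> 6) & 0x1Fu;
            c->r[d] = c->r[m] << sh;
            set_nz(c, c->r[d]);
        } else if ((op & 0xF800) == 0x2000) {     /* MOVS Rd,#imm8 */
            c->r[rd] = imm8;
            set_nz(c, c->r[rd]);
        } else if ((op & 0xF800) == 0x3000) {     /* ADDS Rd,#imm8 */
            c->r[rd] += imm8;
            set_nz(c, c->r[rd]);
        } else if ((op & 0xF800) == 0x3800) {     /* SUBS Rd,#imm8 */
            c->r[rd] -= imm8;
            set_nz(c, c->r[rd]);
        } else if ((op & 0xF800) == 0x6000) {     /* STR Rt,[Rn,#imm5*4] */
            uint32_t t = op & 7u, n = (op >> 3) & 7u, imm5 = (op >> 6) & 0x1Fu;
            write32(c, c->r[n] + imm5 * 4u, c->r[t]);
        } else if ((op & 0xFE00) == 0xD000) {     /* BEQ (cond 0000) / BNE (0001) */
            int take = (op & 0x0100) ? !c->z : c->z;
            int32_t off = (imm8 & 0x80u) ? (int32_t)imm8 - 256 : (int32_t)imm8;
            if (take)                              /* pc reads as insn + 4 */
                c->r[15] = pc + 4u + (uint32_t)(off * 2);
        } else if ((op & 0xFF00) == 0xBE00) {     /* BKPT: stop here */
            c->r[15] = pc;
            return steps + 1;
        } else {
            return -1;                             /* not implemented */
        }
    }
    return max_steps;
}
```

As a usage example, the halfwords `0x21E0 0x0609 0x2043 0x2203 0x6008 0x3801 0x3A01 0xD1FB 0xBE00` put `UART_TX` in r1, then store 'C', 'B' and 'A' to it in a loop before hitting BKPT, so `out` ends up holding "CBA". Because every store funnels through `write32`, that is also the natural place to log or break on accesses while debugging.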
Hmmm, I posted a similar answer for someone recently. Google "arm verilog" and at umich you will find a file isc.tgz containing a behavioural model of an ARM10 (behavioural as in you cannot make a chip from it, which is why you can find the Verilog on the net). For someone wanting to learn an instruction set, watching your code execute signal by signal in simulation is about as good as it gets. Be careful: like a drug, it can be addictive, and then you have a hard time when you go back to silicon, where you have relatively zero visibility into your code while it is executing. Somewhere on Stack Overflow I posted the steps involved in getting that ARM10 model, plus another file or two, and turning it into an ARM emulator using Icarus Verilog. GTKWave is a good, free tool for examining the waveform (VCD) files.
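VCD (Value Change Dump) is a plain-text format, and Icarus Verilog writes it when the testbench calls `$dumpfile`/`$dumpvars`. Nothing stops a software emulator from writing the same format, which gets you a GTKWave view of your own simulator's state. A minimal sketch, assuming one 1-bit clock and one 32-bit bus such as the program counter; the function names, the identifier characters `!` and `"`, the module name and the 1 ns timescale are arbitrary choices for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a VCD header declaring a 1-bit clock (id '!') and a 32-bit bus
   (id '"') inside a scope named "cpu". */
void vcd_header(FILE *f, const char *bus_name)
{
    fprintf(f, "$timescale 1ns $end\n");
    fprintf(f, "$scope module cpu $end\n");
    fprintf(f, "$var wire 1 ! clk $end\n");
    fprintf(f, "$var wire 32 \" %s $end\n", bus_name);
    fprintf(f, "$upscope $end\n");
    fprintf(f, "$enddefinitions $end\n");
}

/* One clock period: rising edge and the new bus value at time t, falling
   edge at t + 5. Scalars are written "<value><id>", vectors "b<bits> <id>". */
void vcd_cycle(FILE *f, unsigned long t, uint32_t value)
{
    fprintf(f, "#%lu\n1!\nb", t);
    for (int bit = 31; bit >= 0; bit--)
        fputc(((value >> bit) & 1u) ? '1' : '0', f);
    fprintf(f, " \"\n#%lu\n0!\n", t + 5);
}
```

Calling `vcd_header` once and then `vcd_cycle` with the pc after each emulated instruction (t = 0, 10, 20, ...) produces a file GTKWave can open directly.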
Above all else you will need the ARM ARM (the ARM Architecture Reference Manual). Just google it and find it on ARM's web site. It has pseudocode for each instruction describing exactly what it does. Use the thumbulator, the ARMulator or others if you need to understand more (MAME has an ARM core in it too). I make no guarantees that the thumbulator is 100% debugged or accurate; to debug the core, I took some common programs and compared their output against silicon, both ARM and non-ARM.
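As an example of how directly that pseudocode maps onto an emulator: in the ARMv7 edition of the ARM ARM, for instance, the flag-setting adds and subtracts are defined in terms of a helper, AddWithCarry(), which returns the result plus the carry and overflow bits, and a subtract x - y is AddWithCarry(x, NOT(y), '1'). That is why ARM's carry flag after a subtract means "no borrow". A 32-bit C translation might look like this; the C names are mine, not ARM's.

```c
#include <stdint.h>

/* Result of AddWithCarry(): the 32-bit sum plus the C and V flags. */
typedef struct {
    uint32_t result;
    int carry;      /* C flag */
    int overflow;   /* V flag */
} awc_t;

/* SInt(): a 32-bit pattern read as two's complement, written without
   relying on implementation-defined signed casts. */
static int64_t sint32(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

/* AddWithCarry() for N = 32; the 64-bit sums stand in for the
   pseudocode's unbounded UInt()/SInt() integers. */
awc_t add_with_carry(uint32_t x, uint32_t y, int carry_in)
{
    uint64_t unsigned_sum = (uint64_t)x + (uint64_t)y + (uint64_t)(carry_in & 1);
    int64_t signed_sum = sint32(x) + sint32(y) + (carry_in & 1);
    awc_t r;

    r.result = (uint32_t)unsigned_sum;                /* unsigned_sum<31:0> */
    r.carry = ((uint64_t)r.result != unsigned_sum);   /* didn't fit unsigned */
    r.overflow = (sint32(r.result) != signed_sum);    /* didn't fit signed */
    return r;
}

/* Subtract as the pseudocode does: x + NOT(y) + 1. */
awc_t sub_with_flags(uint32_t x, uint32_t y)
{
    return add_with_carry(x, ~y, 1);
}
```

For example, `sub_with_flags(3, 5)` gives 0xFFFFFFFE with carry 0 (a borrow occurred), while `sub_with_flags(0x80000000, 1)` gives 0x7FFFFFFF with both carry and overflow set.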