I have done this many times and continue to do it. Since your primary goal is reading assembler rather than writing it, I feel it applies here.
Write your own disassembler. Not for the purpose of making the next greatest disassembler; this one is strictly for you. The goal is to learn the instruction set. This is what I do whether I am learning assembler on a new platform or remembering assembler for a platform I once knew.
Start with only a few lines of code, adding registers for example. By ping-ponging between disassembling the binary output and adding more and more complicated instructions on the input side, you:
1) learn the instruction set for the specific processor
2) learn the nuances of how to write code in assembler for said processor, such that you can wiggle every opcode bit in every instruction
3) learn the instruction set better than most engineers who use that instruction set to make their living
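To give a feel for how small the first round can be, here is a minimal sketch in Python (an arbitrary choice for the example; any language works). It assumes the classic 32-bit ARM (A32) data-processing encoding and handles only the plain register-to-register form of a handful of opcodes. The names `disassemble_word`, `OPCODES` and `CONDITIONS` are made up for this illustration. Anything it does not recognize comes out as a raw `.word`, which is the gap you fill in on the next round.

```python
# Sketch only: decodes the unshifted register form of a few ARM (A32)
# data-processing instructions. All names here are illustrative.

# Condition field (bits 31:28); 0xE is "always" and prints as nothing.
CONDITIONS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", ""]

# Opcode field (bits 24:21) -> mnemonic, only the ones handled so far.
OPCODES = {0x0: "and", 0x1: "eor", 0x2: "sub", 0x4: "add",
           0xC: "orr", 0xD: "mov"}


def disassemble_word(word):
    """Return assembler text for one 32-bit instruction word."""
    cond = (word >> 28) & 0xF
    group = (word >> 26) & 0x3      # 0b00 selects data processing
    immediate = (word >> 25) & 0x1  # 1 = immediate operand (not handled)
    opcode = (word >> 21) & 0xF
    set_flags = (word >> 20) & 0x1
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    shift = (word >> 4) & 0xFF      # bits 11:4, zero means "no shift"
    rm = word & 0xF

    handled = (cond != 0xF and group == 0 and immediate == 0
               and shift == 0 and opcode in OPCODES)
    if not handled:
        return ".word 0x%08x" % word

    # Mnemonic, then optional "s", then condition (unified syntax order).
    name = OPCODES[opcode] + ("s" if set_flags else "") + CONDITIONS[cond]
    if name.startswith("mov"):
        return "%s r%d, r%d" % (name, rd, rm)   # mov ignores Rn
    return "%s r%d, r%d, r%d" % (name, rd, rn, rm)
```

For example, `disassemble_word(0xE0810002)` returns `add r0, r1, r2`. Feed it the words your assembler produces, see what falls through to `.word`, and extend the tables from the reference manual.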
In your case there are a couple of problems. I normally recommend the ARM instruction set to start with. First, there are more ARM based products shipped today than any other (x86 computers included). But you are probably not using ARM right now, and if you don't yet know enough assembler to write startup code or other routines, knowing ARM may or may not help with what you are trying to do. The second and more important reason for ARM first is that the instructions are fixed size and aligned. Disassembling variable length instructions like x86 can be a nightmare as your first project, and the goal here is to learn the instruction set, not to create a research project. Third, ARM is a well done instruction set: registers are created equal and don't have individual special nuances.
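Fixed size and aligned means the instruction boundaries are known before you decode anything. A minimal sketch of that walk in Python, assuming 4-byte little-endian words (the function name `linear_sweep` and the `base` load address are hypothetical, chosen for this example):

```python
import struct


def linear_sweep(blob, base=0):
    """Yield (address, word) for each aligned 4-byte little-endian word."""
    usable = len(blob) - (len(blob) % 4)    # ignore a trailing partial word
    words = struct.iter_unpack("<I", blob[:usable])
    for index, (word,) in enumerate(words):
        yield base + 4 * index, word
```

Every word it yields is a candidate instruction. With a variable length instruction set there is no such shortcut, because you cannot know where the next instruction starts until the current one is fully decoded.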
So you will have to figure out what processor you want to start with. I suggest the msp430 or ARM first, with ARM no later than second, and only then the chaos of x86. No matter what platform, any platform worth using has data sheets or programmers reference manuals free from the vendor that include the instruction set as well as the encoding of the opcodes (the bits and bytes of the machine language). For the purpose of learning what the compiler does, and how to write code that the compiler doesn't have to struggle with, it is good to know a few instruction sets and see how the same high level code is implemented on each instruction set, with each compiler, at each optimization setting. You don't want to get into optimizing your code only to find that you have made it better for one compiler/platform but much worse for every other.
Oh, and for disassembling variable length instruction sets you cannot simply start at the beginning and disassemble linearly through memory, every four byte word as you would with the ARM or every two bytes like the msp430. (The msp430 has variable length instructions, but you can still get by going linearly through memory if you start at the entry points from the interrupt vector table.) For variable length you want to find an entry point based on a vector table or knowledge about how the processor boots, and follow the code in execution order. You have to decode each instruction completely to know how many bytes it uses; then, if the instruction is not an unconditional branch, assume the next byte after that instruction is the start of another instruction. You have to store all possible branch addresses as well and assume those are the starting byte addresses of more instructions.
The one time I was successful I made several passes through the binary. Starting at the entry point, I marked that byte as the start of an instruction, then decoded linearly through memory until hitting an unconditional branch. All branch targets were tagged as starting addresses of an instruction. I made multiple passes through the binary until I found no new branch targets. If at any time you find, say, a 3 byte instruction whose second byte you have already tagged as the beginning of an instruction, you have a problem. If the code was generated by a high level compiler this shouldn't happen unless the compiler is doing something evil. If the code has hand written assembler (like, say, an old arcade game) it is quite possible that there will be conditional branches that can never be taken, like r0=0 followed by a jump if not zero. You may have to hand edit those out of the binary to continue. For your immediate goals, which I assume will be on x86, I don't think you will have a problem.
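The tagging approach described above can be sketched in Python roughly as follows. This is not the code I used. The instruction set is a toy one invented for the example (`TOY_ISA`, with one-byte absolute branch targets), and it uses a work list of pending addresses in place of repeated passes over the binary, which ends up tagging the same set of instruction starts.

```python
# Toy variable length instruction set, invented purely for this sketch:
#   opcode -> (mnemonic, length in bytes, kind)
# Branch instructions carry a one-byte absolute target address.
TOY_ISA = {
    0x00: ("nop", 1, "plain"),
    0x01: ("jmp", 2, "jump"),      # unconditional branch
    0x02: ("jnz", 2, "branch"),    # conditional branch
    0x03: ("ldi", 3, "plain"),     # load a 16-bit immediate
    0xFF: ("halt", 1, "stop"),
}


def trace(code, entry_points):
    """Follow execution order; return ({start: length}, [problems])."""
    starts = {}       # instruction start address -> length
    owner = {}        # every decoded byte -> start of its instruction
    problems = []     # (address, description) pairs needing a human
    pending = list(entry_points)

    while pending:
        pc = pending.pop()
        while pc not in starts:             # stop at already decoded code
            if pc >= len(code) or code[pc] not in TOY_ISA:
                problems.append((pc, "cannot decode"))
                break
            mnemonic, length, kind = TOY_ISA[code[pc]]
            span = range(pc, pc + length)
            if span.stop > len(code):
                problems.append((pc, "truncated"))
                break
            clash = [a for a in span if a in owner]
            if clash:                       # two decodings disagree
                problems.append(
                    (pc, "overlaps instruction at 0x%02x" % owner[clash[0]]))
                break
            starts[pc] = length
            for a in span:
                owner[a] = pc
            if kind in ("jump", "branch"):
                pending.append(code[pc + 1])    # tag the branch target
            if kind in ("jump", "stop"):
                break                       # nothing falls through
            pc += length

    return starts, problems
```

Bytes that never show up in `starts` were not reached from any entry point and are probably data. An entry in `problems` is the "you have a problem" case: a branch lands in the middle of an instruction that was already decoded, or the bytes do not decode at all.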
I recommend the gcc tools. mingw32 is an easy way to use gcc tools on Windows if x86 is your target. If not, mingw32 plus msys is an excellent platform for generating a cross compiler from binutils and gcc sources (generally pretty easy). mingw32 has some advantages over cygwin, like significantly faster programs, and you avoid the cygwin dll hell. gcc and binutils will allow you to write in C or assembler and disassemble your code, and there are more web pages than you can read showing you how to do any one or all of the three. If you are going to be doing this with a variable length instruction set, I highly recommend you use a tool set that includes a disassembler. A third party disassembler for x86, for example, is going to be a challenge to use, as you never really know if it has disassembled correctly. Some of this is operating system dependent too. The goal is to compile the modules to a binary format that contains information distinguishing instructions from data, so the disassembler can do a more accurate job. Your other choice for this primary goal is to have a tool that can compile directly to assembler for your inspection, then hope that when it compiles to a binary format it creates the same instructions.
The short (okay, slightly shortER) answer to your question: write a disassembler to learn an instruction set. I would start with something RISCy and easy to learn like ARM. Once you know one instruction set, others become much easier to pick up, often in a few hours; by the third instruction set you can start writing code almost immediately, using the datasheet/reference manual for the syntax. All processors worth using have a datasheet or reference manual that describes the instructions down to the bits and bytes of the opcodes. Learn a RISC processor like ARM and a CISC like x86 well enough to get a feel for the differences: things like having to go through registers for everything versus being able to perform operations directly on memory with fewer or no registers, three operand instructions versus two, etc. As you tune your high level code, compile for more than one processor and compare the output. The most important thing you will learn is that no matter how well the high level code is written, the quality of the compiler and the optimization choices made make a huge difference in the actual instructions. I recommend llvm and gcc (with binutils). Neither produces great code, but they are multi platform and multi target, and both have optimizers. Both are free, and you can easily build cross compilers from sources for various target processors.