If you are just interested in getting to grips with the basics of Hadoop, i.e. how to access HDFS, run basic MapReduce jobs and so on, then you can do without a cluster, or even multiple VMs.
Hadoop is able to run in three modes:
- Fully-distributed
- Pseudo-distributed
- Non-distributed (Local)
For the purposes of learning you can start with non-distributed (local) mode, which runs on a single machine. Everything runs inside a single JVM and none of the Hadoop daemons run; jobs read and write the local filesystem rather than HDFS. This is the simplest mode to get running, but it still allows you to run MapReduce jobs. You can get this up and running in a few minutes once you have the latest package downloaded.
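
To make the "everything in one process" idea a bit more concrete, here is a rough sketch in plain Python (not Hadoop code) of the map, sort and reduce steps of a word count. The function names are made up for illustration; a real local-mode job would go through Hadoop's own MapReduce API or Hadoop Streaming, and the `sorted` call only stands in for the shuffle that Hadoop does between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    """Map phase: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_counts(pairs):
    """Reduce phase: sum the counts for each word.

    Expects the pairs to arrive sorted by word, which is what the
    sort/shuffle step between map and reduce provides.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, sort (a stand-in for the shuffle) and reduce in one process."""
    shuffled = sorted(map_words(lines))
    return dict(reduce_counts(shuffled))


if __name__ == "__main__":
    sample = ["Hadoop runs MapReduce", "MapReduce runs in one JVM"]
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}\t{count}")
```

Local mode is essentially this picture with Hadoop's real machinery: the same job code you would submit to a cluster, just executed in a single JVM.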
Pseudo-distributed mode is the next level up from non-distributed. It still runs on a single machine, but simulates the operation of a cluster more accurately. The Hadoop daemons run in this mode, each in its own JVM, simulating the nodes of a cluster.
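
As a rough idea of what switching to pseudo-distributed mode involves, the single-node setup in the Hadoop docs boils down to pointing the default filesystem at an HDFS instance on localhost and dropping the replication factor to 1. The snippets below assume Hadoop 2.x or later, where the files live under `etc/hadoop/`; older 1.x releases keep them under `conf/` and use `fs.default.name` instead of `fs.defaultFS`. The port 9000 is just the value commonly used in examples, so check the setup guide for your version.

```xml
<!-- etc/hadoop/core-site.xml -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```

```xml
<!-- etc/hadoop/hdfs-site.xml -->
<configuration>
    <property>
        <!-- Only one DataNode on a single machine, so keep one copy of each block -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```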
Fully-distributed is the mode a full-blown cluster uses.