Well, you first of all have to understand what you're up against.
Word-for-word plagiarism should be ridiculously easy to spot. The most naive approach is to take word tuples (n-grams) of sufficient length and compare them against your corpus. That sufficient length can be surprisingly low. Compare these Google result counts:
"I think" => 454,000,000
"I think this" => 329,000,000
"I think this is" => 227,000,000
"I think this is plagiarism" => 5
So even with that approach you have a very high chance of finding a good match or two (fun fact: most criminals are really dumb).
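A minimal sketch of that tuple approach in Python; the corpus dictionary, the tuple length of four words and the crude normalisation are illustrative assumptions, not a prescription:

```python
from collections import defaultdict

def shingles(text, n=4):
    """Yield every n-word tuple (shingle) from the text, lightly normalised."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n])

def build_index(corpus):
    """Map each shingle to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for sh in shingles(text):
            index[sh].add(doc_id)
    return index

def suspects(submission, index):
    """Count how many shingles of the submission hit each corpus document."""
    hits = defaultdict(int)
    for sh in shingles(submission):
        for doc_id in index[sh]:
            hits[doc_id] += 1
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical corpus and submission purely for illustration.
corpus = {
    "essay_1": "I think this is plagiarism and it should be reported.",
    "essay_2": "Completely unrelated text about gardening in spring.",
}
print(suspects("Honestly, I think this is plagiarism too.", build_index(corpus)))
```

You index the corpus once, score each incoming submission against it, and hand the documents with the most shared tuples to a human for review.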
If the plagiarist uses synonyms, changes word order and so on, it obviously gets a bit more difficult. You would have to store synonyms as well and normalise the grammatical structure a bit to keep the same approach working. The same goes for spelling (i.e. either match on normalised text or account for the deviations in your matching, as in the NCD approaches posted in the other answers).
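For the compression-based route, normalised compression distance is easy to sketch with zlib; the toy strings below are just an illustration, and the scores are not calibrated thresholds:

```python
import zlib

def ncd(a: str, b: str) -> float:
    """Normalised compression distance: lower means more similar."""
    ca = len(zlib.compress(a.encode("utf-8")))
    cb = len(zlib.compress(b.encode("utf-8")))
    cab = len(zlib.compress((a + b).encode("utf-8")))
    return (cab - min(ca, cb)) / max(ca, cb)

# Toy strings purely for illustration; in practice compare paragraph-sized
# chunks, since zlib's fixed overhead swamps very short inputs.
original  = "I think this is plagiarism, plain and simple, and it should be reported."
reworded  = "I believe this is plagiarism, pure and simple, and it ought to be reported."
unrelated = "The weather in April was unusually warm, and the garden bloomed early."

print(ncd(original, reworded))   # should come out lower ...
print(ncd(original, unrelated))  # ... than this
```

The appeal of NCD is that it sidesteps exact tuple matching: if the suspect text compresses well together with a corpus text, they share structure, regardless of minor rewording or spelling deviations.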
However, the biggest problem is conceptual plagiarism. That is really hard, and there are no obvious solutions short of parsing the semantics of every sentence (i.e. sufficiently complex AI).
The truth is, though, that you only need to find SOME kind of match. You don't need an exact match to surface a relevant text in your corpus; the final assessment should always be made by a human anyway, so an inexact match is perfectly fine.
Plagiarists are mostly stupid and lazy, so their copies will be stupid and lazy, too. Some put an incredible amount of effort into their work, but such work is barely recognisable as plagiarism in the first place, so it's hard to track down programmatically (i.e. if a human has trouble recognising plagiarism with both texts presented side by side, a computer most likely will, too). For the other 80% or so, the dumb approach is good enough.