You could use a "brute force" approach, whereby you compare the n-gram frequencies of the generated language against n-gram data of a higher order than the Markov model that generated it.
For example, if the text was generated with a 2nd-order Markov model, n-grams up to 3-grams will have the correct frequencies, but 4-grams probably won't.
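To make the idea concrete, here's a minimal Python sketch of that comparison. The file names are placeholders, tokenisation is a naive whitespace split, and total variation distance is just one of several reasonable ways to compare the two distributions:

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple of words) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def total_variation(counts_a, counts_b):
    """Total variation distance between two n-gram frequency distributions."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    keys = set(counts_a) | set(counts_b)
    return 0.5 * sum(abs(counts_a[k] / total_a - counts_b[k] / total_b) for k in keys)

# Text from a 2nd-order Markov model should look normal at 3-grams
# but drift away from genuine text at 4-grams.
suspect = open("suspect.txt").read().split()      # placeholder file names
reference = open("reference.txt").read().split()
for n in (3, 4):
    d = total_variation(ngram_counts(suspect, n), ngram_counts(reference, n))
    print(f"{n}-gram distance: {d:.4f}")
```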
You can get frequencies up to 5-grams from Google's public n-gram dataset. It's huge though (24 GB compressed), and you have to get it by post on DVDs from the LDC.
EDIT: Added some implementation details
The n-grams have already been counted, so you just need to store the counts (or frequencies) in a way that's quick to search. A properly indexed database, or perhaps a Lucene index, should work.
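As an illustration, here's a rough sketch of loading the counts into SQLite with an index on the 4-word prefix. The table name and schema are my own invention, and the loader assumes tab-separated `w1 w2 w3 w4 w5<TAB>count` lines:

```python
import sqlite3

conn = sqlite3.connect("ngrams.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS fivegrams (
        w1 TEXT, w2 TEXT, w3 TEXT, w4 TEXT, w5 TEXT,
        count INTEGER
    )
""")
# An index on the first four words makes "same 4-word prefix" lookups fast.
conn.execute("CREATE INDEX IF NOT EXISTS idx_prefix ON fivegrams (w1, w2, w3, w4)")

def load_file(path):
    """Load one corpus file, assuming 'w1 w2 w3 w4 w5<TAB>count' per line."""
    with open(path, encoding="utf-8") as f:
        rows = (line.rstrip("\n").split("\t") for line in f)
        conn.executemany(
            "INSERT INTO fivegrams VALUES (?, ?, ?, ?, ?, ?)",
            ((*words.split(" "), int(count)) for words, count in rows),
        )
    conn.commit()
```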
Given a piece of text, scan across it, look up the frequency of each 5-gram in your database, and see how it ranks against the other 5-grams that start with the same 4 words.
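Continuing with the hypothetical table above, the scan-and-rank step might look like this:

```python
def prefix_rank(conn, fivegram):
    """Rank of a 5-gram's count among all 5-grams sharing its first 4 words.
    Rank 1 = most frequent continuation; None if the 5-gram is unattested."""
    w1, w2, w3, w4, w5 = fivegram
    rows = conn.execute(
        "SELECT w5, count FROM fivegrams "
        "WHERE w1=? AND w2=? AND w3=? AND w4=? ORDER BY count DESC",
        (w1, w2, w3, w4),
    ).fetchall()
    for rank, (word, _count) in enumerate(rows, start=1):
        if word == w5:
            return rank
    return None

def scan(conn, tokens):
    """Slide a 5-word window across the text and report each window's rank."""
    for i in range(len(tokens) - 4):
        window = tuple(tokens[i:i + 5])
        print(window, "->", prefix_rank(conn, window))
```

A long run of 5-grams that are missing or ranked poorly, even though their 4-word prefixes are well attested, would be the tell-tale signature of a lower-order generator.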
In practice, a bigger obstacle might be the dataset's licensing terms: using it in a commercial app might be prohibited.