I have some friends making a text-based game in Java (what the hell?), and they're looking for the best way to parse strings for commands. They've come across many methods and are wondering what would be the best way to go about things.

+8  A: 

I really like regular expressions. As long as the command strings are fairly simple, a few regexes can replace what could otherwise take a few pages of manual parsing code.

I would suggest you check out for a good intro to regexes, as well as specific examples for Java.
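For illustration, here is a minimal sketch of that approach using java.util.regex. The command shape (a verb from a fixed list, an optional "the", then a one-word object) and the class and method names are assumptions made up for the example, not taken from any particular game.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCommandDemo {
    // Hypothetical command shape: a verb, an optional "the", then a one-word object.
    // Matching is case-insensitive; surrounding whitespace is tolerated.
    private static final Pattern VERB_OBJECT = Pattern.compile(
            "^\\s*(take|drop|kick)\\s+(?:the\\s+)?(\\w+)\\s*$",
            Pattern.CASE_INSENSITIVE);

    /** Returns {verb, object} in lower case, or null if the input does not match. */
    static String[] parse(String input) {
        Matcher m = VERB_OBJECT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1).toLowerCase(), m.group(2).toLowerCase() };
    }

    public static void main(String[] args) {
        String[] cmd = parse("Take the LAMP");
        System.out.println(cmd[0] + " -> " + cmd[1]);      // prints "take -> lamp"
        System.out.println(parse("dance wildly") == null); // prints "true"
    }
}
```

Each new command shape is one more Pattern, which is where the savings over hand-written parsing come from.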

Daniel Broekman

A simple string tokenizer on spaces should work, but there are really many ways you could do this.

Here is an example using a tokenizer:

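    import java.util.StringTokenizer;

    String command = "kick person";
    StringTokenizer tokens = new StringTokenizer(command); // splits on whitespace by default
    String action = null;

    if (tokens.hasMoreTokens()) {
        action = tokens.nextToken(); // the first word is the action
    }

    if (action != null) {
        doCommand(action, tokens); // the remaining tokens are the arguments
    }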

Then tokens can be used further for the arguments. This all assumes that no spaces are used in the arguments, so you might want to roll your own simple parsing mechanism (such as finding the first whitespace and using the text before it as the action, or using a regular expression if you don't mind the speed hit). Just abstract it out so it can be used anywhere.
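A minimal sketch of the "split at the first whitespace" idea, assuming the action is always the first word and everything after it is a single argument string. The class and method names are hypothetical. String.split with a limit of 2 stops after the first separator, so multi-word arguments stay intact.

```java
public class FirstWordSplit {
    /**
     * Splits a command into {action, rest}. "rest" keeps its internal spaces,
     * so multi-word arguments survive; it is "" when there is no argument.
     */
    static String[] splitAction(String command) {
        // limit = 2: split only at the first run of whitespace
        String[] parts = command.trim().split("\\s+", 2);
        String action = parts[0];
        String rest = parts.length > 1 ? parts[1] : "";
        return new String[] { action, rest };
    }

    public static void main(String[] args) {
        String[] p = splitAction("  give golden key to old man ");
        System.out.println(p[0]); // prints "give"
        System.out.println(p[1]); // prints "golden key to old man"
    }
}
```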

Mike Stone
+1  A: 

I would look at Java ports of Zork, and lean towards a simple Natural Language Processor (driven either by tokenizing or regex) such as the following (from this link):

    import java.util.ArrayList;
    import java.util.List;

    public static boolean simpleNLP(String inputline, String[] keywords) {
        if (inputline.length() < 1 || keywords.length == 0) return false;

        List<String> lexed = new ArrayList<String>();  // stores the words
        // first extract every substring in inputline that has a blank on either side.
        int from = 0;
        while (from < inputline.length() && inputline.charAt(from) == ' ') from++;  // skip leading ' '
        if (from >= inputline.length()) return false; // check for blank and empty lines
        while (from < inputline.length()) {
            int to = inputline.indexOf(' ', from);
            if (to < 0) {                        // no more blanks: the rest is the last word
                lexed.add(inputline.substring(from));
                break;
            }
            lexed.add(inputline.substring(from, to));
            from = to;
            while (from < inputline.length() && inputline.charAt(from) == ' ') from++;  // skip runs of ' '
        }
        // if we get here we have a list of strings that correspond to the words in the input,
        // so now we look for the keywords in order (other words in between are allowed)
        int matched = 0;
        for (String s : lexed) {
            if (s.equalsIgnoreCase(keywords[matched])) {
                matched++;
                if (matched >= keywords.length) return true;
            }
        }
        return false;
    }


Anything which gives a programmer a reason to look at Zork again is good in my book, just watch out for Grues.


James D

@CodingTheWheel: stick four spaces at the start of every line of code. Then the individual indentation spaces for lines that are indented. It'll even syntax highlight! (Click on markdown editing help for more info).

Matthew Schinckel
+1  A: 

I assume you're trying to make the command interface as forgiving as possible. If this is the case, I suggest you use an algorithm similar to this:

  1. Read in the string
  2. Split the string into tokens
  3. Use a dictionary to convert synonyms to a common form
    • For example, convert "hit", "punch", "strike", and "kick" all to "hit"
  4. Perform actions on an unordered, inclusive basis
    • Unordered - "punch the monkey in the face" is the same thing as "the face in the monkey punch"
    • Inclusive - If the command is supposed to be "punch the monkey in the face" and they supply "punch monkey", you should check how many commands this matches. If only one command matches, perform that action. It might even be a good idea to give commands priorities, so that even if several commands matched equally well, the top-priority action would be performed.
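Here is a rough sketch of steps 2 to 4, assuming a hard-coded synonym table, a small set of ignored filler words, and a numeric priority per command; all of these names and values are hypothetical. Each phrase is reduced to a set of canonical words, which makes matching order-independent, and a command matches inclusively when it contains every word the player typed.

```java
import java.util.*;

public class ForgivingParser {
    // Hypothetical synonym table: every variant maps to one canonical verb.
    static final Map<String, String> SYNONYMS = new HashMap<>();
    static {
        for (String w : new String[] { "hit", "punch", "strike", "kick" }) {
            SYNONYMS.put(w, "hit");
        }
    }
    // Filler words ignored when matching (an assumption; tune per game).
    static final Set<String> FILLER = new HashSet<>(Arrays.asList("the", "a", "an"));

    static final class Command {
        final String name;
        final Set<String> words; // canonical words, order irrelevant
        final int priority;      // higher wins when several commands match

        Command(String name, int priority, String phrase) {
            this.name = name;
            this.priority = priority;
            this.words = normalize(phrase);
        }
    }

    /** Lower-cases, splits on whitespace, drops filler, maps synonyms; order is discarded. */
    static Set<String> normalize(String text) {
        Set<String> out = new HashSet<>();
        for (String tok : text.trim().toLowerCase().split("\\s+")) {
            if (tok.isEmpty() || FILLER.contains(tok)) continue;
            out.add(SYNONYMS.getOrDefault(tok, tok));
        }
        return out;
    }

    /**
     * Inclusive match: a command matches if every input word appears in it.
     * Returns the highest-priority match, or null if nothing matches.
     */
    static Command resolve(String input, List<Command> commands) {
        Set<String> in = normalize(input);
        if (in.isEmpty()) return null;
        Command best = null;
        for (Command c : commands) {
            if (c.words.containsAll(in) && (best == null || c.priority > best.priority)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Command> commands = Arrays.asList(
                new Command("hit-monkey-face", 2, "punch the monkey in the face"),
                new Command("hit-guard", 1, "hit the guard"));
        System.out.println(resolve("kick monkey", commands).name);                  // prints "hit-monkey-face"
        System.out.println(resolve("the face in the monkey punch", commands).name); // prints "hit-monkey-face"
        System.out.println(resolve("strike", commands).name);                       // both match; prints "hit-monkey-face"
    }
}
```

Ties between commands of equal priority go to whichever appears first in the list; a real game might instead ask the player which one they meant.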

@CodingTheWheel Here's your code, cleaned up a bit, run through Eclipse's formatter (Ctrl+Shift+F), and then inserted back here :)

Including the four spaces in front of each line.

    public static boolean simpleNLP(String inputline, String[] keywords) {
        if (inputline.length() < 1 || keywords.length == 0)
            return false;

        List<String> lexed = new ArrayList<String>();
        for (String ele : inputline.split(" ")) {
            if (!ele.isEmpty()) { // split(" ") yields empty strings for repeated spaces
                lexed.add(ele);
            }
        }

        boolean status = false;
        int to = 0;
        for (int i = 0; i < lexed.size(); i++) {
            String s = lexed.get(i);
            if (s.equalsIgnoreCase(keywords[to])) {
                to++;
                if (to >= keywords.length) {
                    status = true;
                    break;
                }
            }
        }
        return status;
    }

When the separator String for the command is always the same String or char (like ";"), I recommend using the StringTokenizer class.
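A minimal sketch of that case, assuming commands on one line are separated by ";" (the class and method names are hypothetical). Note that StringTokenizer treats consecutive delimiters as one, so an empty command between ";;" simply disappears.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class SemicolonCommands {
    /** Splits a line such as "open door; go north" into trimmed commands. */
    static List<String> commands(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ";"); // ";" is the only delimiter
        while (st.hasMoreTokens()) {
            String cmd = st.nextToken().trim();
            if (!cmd.isEmpty()) {
                out.add(cmd);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(commands("open door; go north;;look")); // prints "[open door, go north, look]"
    }
}
```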


But when the separator varies or is complex, I recommend using regular expressions, which the String class itself supports through its split method (since Java 1.4). It uses the Pattern class from the java.util.regex package.
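For example, a sketch that accepts either ";" or "," as the separator, with any amount of surrounding whitespace (the separator set and the class name are assumptions for the example):

```java
import java.util.Arrays;

public class RegexSplitDemo {
    /** Separator is ";" or "," with optional whitespace on either side. */
    static String[] splitCommands(String line) {
        return line.trim().split("\\s*[;,]\\s*");
    }

    public static void main(String[] args) {
        String[] commands = splitCommands("open door;go north ,  look");
        System.out.println(Arrays.toString(commands)); // prints "[open door, go north, look]"
    }
}
```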


+2  A: 

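Sun itself recommends staying away from StringTokenizer (its Javadoc calls it a legacy class retained for compatibility reasons) and using the String.split method or the java.util.regex package instead.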

You'll also want to look at the Pattern class.
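One reason to reach for Pattern directly is that a compiled Pattern can be reused and can describe what a token looks like, rather than just what separates tokens. The sketch below (hypothetical class name) treats a double-quoted phrase as a single token, which also gets around the "no spaces in arguments" limitation of plain splitting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTokenizer {
    // Compiled once and reused: either a double-quoted phrase (group 1) or a run of non-spaces (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("say \"hello there\" to guard"));
        // prints "[say, hello there, to, guard]"
    }
}
```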

+6  A: 

Parsing manually is a lot of fun... at the beginning:)

In practice, if commands aren't very sophisticated, you can treat them the same way as those used in command line interpreters. There's a list of libraries that you can use: I think you can start with Apache Commons CLI or args4j (which uses annotations). They are well documented and really simple to use. They handle parsing automatically, and the only thing you need to do is read particular fields in an object.

If you have more sophisticated commands, then creating a formal grammar may be a better idea. There is a very good library with a graphical editor, debugger and interpreter for grammars. It's called ANTLR (the editor is ANTLRWorks) and it's free :) There are also some example grammars and tutorials.
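ANTLR generates the parser for you from a grammar file. To show the idea of parsing against a formal grammar without the ANTLR toolchain, here is a small hand-written recursive-descent parser instead (a different technique from what ANTLR generates), with one method per grammar rule. The grammar, vocabulary and class name are all assumptions made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TinyCommandParser {
    // Hypothetical grammar (EBNF-style):
    //   command := VERB object [ "with" object ]
    //   object  := [ "the" ] WORD
    //   VERB    := "open" | "hit" | "take"
    private static final List<String> VERBS = Arrays.asList("open", "hit", "take");

    private final String[] tokens;
    private int pos = 0;

    private TinyCommandParser(String input) {
        this.tokens = input.trim().toLowerCase(Locale.ROOT).split("\\s+");
    }

    /** Parses one command; returns "verb object [instrument]" or throws on a syntax error. */
    static String parse(String input) {
        TinyCommandParser p = new TinyCommandParser(input);
        String result = p.command();
        if (p.pos != p.tokens.length) {
            throw new IllegalArgumentException("unexpected '" + p.tokens[p.pos] + "'");
        }
        return result;
    }

    // command := VERB object [ "with" object ]
    private String command() {
        String verb = next();
        if (!VERBS.contains(verb)) {
            throw new IllegalArgumentException("unknown verb '" + verb + "'");
        }
        String target = object();
        if (peek("with")) {
            pos++;
            return verb + " " + target + " " + object();
        }
        return verb + " " + target;
    }

    // object := [ "the" ] WORD
    private String object() {
        if (peek("the")) {
            pos++;
        }
        return next();
    }

    private boolean peek(String word) {
        return pos < tokens.length && tokens[pos].equals(word);
    }

    private String next() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("unexpected end of command");
        }
        return tokens[pos++];
    }

    public static void main(String[] args) {
        System.out.println(parse("Open the door with the key")); // prints "open door key"
        System.out.println(parse("hit troll"));                  // prints "hit troll"
    }
}
```

Once the grammar grows beyond a handful of rules, keeping these methods in sync with it by hand is exactly the work a generator like ANTLR takes over.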

+1  A: 

If this is to parse command lines, I would suggest using Commons CLI.

The Apache Commons CLI library provides an API for processing command line interfaces.

+3  A: 

Another vote for ANTLR/ANTLRWorks. If you create two versions of the file, one with the Java code for actually executing the commands, and one without (with just the grammar), then you have an executable specification of the language, which is great for testing, a boon for documentation, and a big timesaver if you ever decide to port it.

John the Statistician
+1  A: 

Try JavaCC, a parser generator for Java.

It has a lot of features for interpreting languages, and it's well supported on Eclipse.


If the language is dead simple like just


then splitting by hand works well.

If it's more complex, you should really look into a tool like ANTLR or JavaCC.

I've got a tutorial on ANTLR (v2) at which will give you an idea of how it works.

Scott Stanchfield

JCommander seems quite good, although I have yet to test it.

Pierre Gardin