I'm trying to parse a section of a large file with Java's Scanner class, but I'm having a hard time deciding on the best way to parse this text:

SECTOR 199
FLAGS 0x1000
AMBIENT LIGHT 0.67
EXTRA LIGHT 0.00
COLORMAP 0
TINT 0.00 0.00 0.00
BOUNDBOX 7.399998 8.200002 6.199998 9.399998 8.500000 7.099998
COLLIDEBOX 7.605121 8.230770 6.200000 9.399994 8.469233 7.007693
CENTER 8.399998 8.350001 6.649998
RADIUS 1.106797
VERTICES 12
0: 1810
1: 1976
2: 1977
3: 1812
4: 1978
5: 1979
6: 1820
7: 1980
8: 1821
9: 1981
10: 1982
11: 1811
SURFACES 1893 8

It has some optional fields (SOUND, COLLIDEBOX), so I can't parse it in a fixed order like I've been doing with the previous part of the file. I'm unsure how to do this without making it terribly inefficient. At the moment I'm thinking about reading each line and splitting it with String.split("\\s+") to get the values, but I'm curious what other options I have. :\
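
As a rough sketch of that split-each-line idea (not a finished parser): because the optional fields may be missing, one option is to collect every line into a map keyed by its keyword first, and only then look fields up by name, so their order stops mattering. The names SectorBlock and parseFields are made up for illustration, and the sketch assumes a keyword is the run of leading all-upper-case words (so AMBIENT LIGHT is one key) and that no value is itself an all-upper-case word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class SectorBlock {

    /**
     * Reads "KEYWORD value ..." lines into a map keyed by keyword.
     * Assumption: a keyword is the run of leading all-upper-case words.
     * Lines without a keyword (e.g. the "0: 1810" vertex lines) are stored
     * as extra rows under the most recent keyword.
     */
    static Map<String, List<String[]>> parseFields(Scanner in) {
        Map<String, List<String[]>> fields = new LinkedHashMap<>();
        String current = null;
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = line.split("\\s+");
            int k = 0;
            while (k < tokens.length && tokens[k].matches("[A-Z]+")) {
                k++; // count the upper-case words that make up the keyword
            }
            if (k > 0) {
                current = String.join(" ", Arrays.copyOfRange(tokens, 0, k));
                fields.computeIfAbsent(current, key -> new ArrayList<>())
                      .add(Arrays.copyOfRange(tokens, k, tokens.length));
            } else if (current != null) {
                fields.get(current).add(tokens); // continuation row, e.g. a vertex
            } else {
                throw new IllegalArgumentException("Value line before any keyword: " + line);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = "SECTOR 199\nAMBIENT LIGHT 0.67\nVERTICES 2\n0: 1810\n1: 1976\n";
        Map<String, List<String[]>> fields = parseFields(new Scanner(text));
        System.out.println(fields.get("SECTOR").get(0)[0]);        // 199
        System.out.println(fields.get("AMBIENT LIGHT").get(0)[0]); // 0.67
        System.out.println(fields.containsKey("COLLIDEBOX"));      // false
        System.out.println(fields.get("VERTICES").size());         // 3 (count row plus two vertex rows)
    }
}
```

Optional fields then become a simple containsKey check instead of a special case in the reading loop.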

+2  A: 

The input looks complex enough to warrant a full-blown parser. I would recommend using a library such as ANTLR ( http://www.antlr.org/ ).

Arne
May have to take that route, though I don't know if I want to have to "re-write" my code. :\ Already invested a lot of time in it, but thanks for the suggestion. :3
Unrealomega
+1  A: 

I'd first define an enum with the keywords, like:

public enum Keyword {SECTOR, FLAGS, AMBIENT, EXTRA, COLORMAP, TINT, SOUND,
    BOUNDBOX, COLLIDEBOX, CENTER, RADIUS, VERTICES, SURFACES}

Parsing can be done line by line, splitting at whitespace chars. Then I'd convert the first element to an enum from the Keyword class and use a simple switch construct to handle the values:

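public Model parse(List<String> lines) {

   Model model = new Model();

   Iterator<String> it = lines.iterator();
   while (it.hasNext()) {
      String line = it.next().trim();
      if (line.isEmpty()) {
         continue;                       // skip blank lines
      }
      // "\\s+" in Java source is the regex \s+ (one or more whitespace chars)
      String[] elements = line.split("\\s+");

      Keyword keyword;
      try {
         keyword = Keyword.valueOf(elements[0]);
      } catch (IllegalArgumentException e) {
         // handle malformed line: valueOf throws for anything that is not a keyword
         continue;
      }

      switch (keyword) {
        case SECTOR: model.addSector(elements[1]); break;
        case FLAGS: model.addFlags(elements[1]); break;
        // ...
        case VERTICES:
          int numberOfVertices = Integer.parseInt(elements[1]);
          for (int i = 0; i < numberOfVertices; i++) {
             // vertex lines look like "0: 1810"; keep the value after the index
             elements = it.next().trim().split("\\s+");
             model.addVertex(i, elements[1]);
          }
          break;
        default:
          // known keyword without a handler yet
          break;
      }
   }
   return model;
}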
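
For reference, a self-contained, runnable version of the same enum-plus-switch idea might look like this. Sector stands in for the hypothetical Model, only a handful of keywords actually store their values, and SOUND is in the enum as an assumption because the question lists it as an optional field. Keyword.valueOf throws IllegalArgumentException for an unknown first word, so malformed lines are caught there rather than in the default branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class KeywordParser {

    enum Keyword { SECTOR, FLAGS, AMBIENT, EXTRA, COLORMAP, TINT, SOUND,
                   BOUNDBOX, COLLIDEBOX, CENTER, RADIUS, VERTICES, SURFACES }

    /** Hypothetical data holder; only a few of the fields are kept here. */
    static class Sector {
        int id;
        int flags;
        double ambientLight;
        double radius;
        final List<Integer> vertices = new ArrayList<>();
    }

    static Sector parse(List<String> lines) {
        Sector sector = new Sector();
        Iterator<String> it = lines.iterator();
        while (it.hasNext()) {
            String line = it.next().trim();
            if (line.isEmpty()) {
                continue;                                   // ignore blank lines
            }
            String[] e = line.split("\\s+");
            Keyword keyword;
            try {
                keyword = Keyword.valueOf(e[0]);
            } catch (IllegalArgumentException ex) {        // not one of the keywords
                throw new IllegalArgumentException("Malformed line: " + line, ex);
            }
            switch (keyword) {
                case SECTOR:  sector.id = Integer.parseInt(e[1]); break;
                case FLAGS:   sector.flags = Integer.decode(e[1]); break;           // accepts "0x1000"
                case AMBIENT: sector.ambientLight = Double.parseDouble(e[2]); break; // e[1] is "LIGHT"
                case RADIUS:  sector.radius = Double.parseDouble(e[1]); break;
                case VERTICES:
                    int count = Integer.parseInt(e[1]);
                    for (int i = 0; i < count; i++) {
                        String[] v = it.next().trim().split("\\s+"); // e.g. "0: 1810"
                        sector.vertices.add(Integer.parseInt(v[1]));
                    }
                    break;
                default:
                    break;                                  // known keyword, value not kept
            }
        }
        return sector;
    }

    public static void main(String[] args) {
        Sector s = parse(Arrays.asList(
                "SECTOR 199", "FLAGS 0x1000", "AMBIENT LIGHT 0.67",
                "VERTICES 2", "0: 1810", "1: 1976", "RADIUS 1.106797"));
        System.out.println(s.id + " " + s.flags + " " + s.vertices); // 199 4096 [1810, 1976]
    }
}
```

Integer.decode is used for FLAGS because it understands the 0x prefix, whereas Integer.parseInt("0x1000") throws NumberFormatException.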
Andreas_D
I like the look of this one. Clean, easy, and already checks for malformed files. I may use this for now, for testing purposes.
Unrealomega
+1  A: 

How about this approach:

find next command (SECTOR, FLAGS, AMBIENT LIGHT, EXTRA LIGHT, etc)
no command found? -> output error and stop
map to command implementation 
execute command (pass it the scanner and your state holder)
command impl handles specific reading of arguments
rinse, repeat,...

You will have to create a Command interface:

import java.util.Scanner;

public interface Command {
    String getName();
    void execute(Scanner in, ReadState state);
}

and a separate implementation of it for each type of command you can encounter:

public class SectorCommand implements Command {
    public String getName() {
        return "SECTOR";
    }
    public void execute(Scanner in, ReadState state) {
        state.setSector(in.nextInt());
        in.nextLine(); // finish this line so the next findInLine starts on a new one
    }
}

and of some sort of factory to find commands:

import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;

public class CommandFactory {

    private Map<String, Command> commands;
    public CommandFactory() {
        commands = new HashMap<String, Command>();
        addCommand(new SectorCommand());
        // add other commands
    }
    public Command findCommand(Scanner in) {
        for (Map.Entry<String, Command> entry : commands.entrySet()) {
            // findInLine returns the matched text, or null (leaving the scanner
            // where it was) if the name does not occur in the rest of the line
            if (in.findInLine(Pattern.quote(entry.getKey())) != null) {
                return entry.getValue();
            }
        }
        throw new IllegalArgumentException("No command found");
    }
    private void addCommand(Command command) {
        commands.put(command.getName(), command);
    }
}

(this code may not compile)
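
Putting the steps together, a minimal self-contained sketch of the same command idea could look like this. It departs from the code above in a few deliberate ways: commands are keyed by the first word of their line and looked up from in.next() instead of being searched for with findInLine, Command is reduced to a single execute method so lambdas can implement it, and CommandReader, ReadState's fields and the skip helper are hypothetical. Because the Scanner's default delimiter includes newlines, each command only has to consume exactly its own arguments. The scanner is switched to Locale.ROOT so nextDouble() expects "." as the decimal separator regardless of the machine's default locale.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

public class CommandReader {

    /** Hypothetical state holder; only a few fields are kept in this sketch. */
    static class ReadState {
        int sector;
        double ambientLight;
        double[] collideBox;                 // stays null if COLLIDEBOX is absent
        final List<Integer> vertices = new ArrayList<>();
    }

    /** One implementation per keyword; it consumes exactly its own arguments. */
    interface Command {
        void execute(Scanner in, ReadState state);
    }

    static final Map<String, Command> COMMANDS = new HashMap<>();

    /** A command that just discards the next {@code count} tokens. */
    static Command skip(int count) {
        return (in, state) -> {
            for (int i = 0; i < count; i++) {
                in.next();
            }
        };
    }

    static {
        COMMANDS.put("SECTOR", (in, s) -> s.sector = in.nextInt());
        COMMANDS.put("AMBIENT", (in, s) -> {
            in.next("LIGHT");                // second word of "AMBIENT LIGHT"
            s.ambientLight = in.nextDouble();
        });
        COMMANDS.put("COLLIDEBOX", (in, s) -> {
            s.collideBox = new double[6];
            for (int i = 0; i < 6; i++) {
                s.collideBox[i] = in.nextDouble();
            }
        });
        COMMANDS.put("VERTICES", (in, s) -> {
            int n = in.nextInt();
            for (int i = 0; i < n; i++) {
                in.next();                   // the "0:" index label
                s.vertices.add(in.nextInt());
            }
        });
        // Keywords whose values this sketch does not keep:
        COMMANDS.put("FLAGS", skip(1));
        COMMANDS.put("EXTRA", skip(2));      // "LIGHT 0.00"
        COMMANDS.put("COLORMAP", skip(1));
        COMMANDS.put("TINT", skip(3));
        COMMANDS.put("BOUNDBOX", skip(6));
        COMMANDS.put("CENTER", skip(3));
        COMMANDS.put("RADIUS", skip(1));
        COMMANDS.put("SURFACES", skip(2));
    }

    /** Find the next command, execute it, repeat; stop with an error on an unknown keyword. */
    static ReadState read(Scanner in) {
        ReadState state = new ReadState();
        while (in.hasNext()) {
            String name = in.next();
            Command command = COMMANDS.get(name);
            if (command == null) {
                throw new IllegalArgumentException("No command found for: " + name);
            }
            command.execute(in, state);
        }
        return state;
    }

    static ReadState read(String text) {
        return read(new Scanner(text).useLocale(Locale.ROOT));
    }
}
```

An optional keyword such as SOUND would be registered the same way once its argument layout is known; if it never appears, its command simply never runs.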

Adriaan Koster
A: 

If the file is very big, I suggest using java.io.RandomAccessFile: it can seek past any region you don't want to parse, and it's very fast. If you map the whole file into memory, it may slow down your application.
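
A minimal sketch of the RandomAccessFile idea, assuming you already know the byte offset where a sector block starts (for example, recorded during an earlier pass over the file). SectorSeek and readSectorAt are hypothetical names. Keep in mind that RandomAccessFile.readLine reads one byte at a time without buffering and treats each byte as one character, so it suits short reads of ASCII text like this rather than scanning an entire large file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class SectorSeek {

    /**
     * Seeks straight to byte {@code offset} and returns the lines of the block
     * that starts there, stopping at the next "SECTOR " header or end of file.
     */
    static List<String> readSectorAt(String path, long offset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset);                     // bytes before offset are never read
            String line = file.readLine();         // the block's own "SECTOR n" line
            while (line != null) {
                lines.add(line);
                line = file.readLine();
                if (line != null && line.startsWith("SECTOR ")) {
                    break;                         // next block begins; stop here
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String first = "SECTOR 1\nRADIUS 1.0\n";
        Path tmp = Files.createTempFile("sectors", ".txt");
        Files.write(tmp, (first + "SECTOR 2\nRADIUS 2.0\n").getBytes(StandardCharsets.US_ASCII));
        // In ASCII every char is one byte, so the second block starts at first.length()
        System.out.println(readSectorAt(tmp.toString(), first.length())); // [SECTOR 2, RADIUS 2.0]
        Files.delete(tmp);
    }
}
```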

An alternative is java.util.StringTokenizer for splitting on simple delimiters, for example whitespace or commas. It is faster than splitting with a regular expression.
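
A small sketch of the StringTokenizer alternative (TokenizeLine and tokens are hypothetical names). With its default delimiter set (space, tab, newline, carriage return, form feed) it never returns empty tokens, whereas "  SURFACES 1893 8".split("\\s+") produces an empty first element because of the leading spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeLine {

    /** Splits a line on whitespace using StringTokenizer instead of a regex. */
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line); // default delimiters: " \t\n\r\f"
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("TINT 0.00  0.00\t0.00")); // [TINT, 0.00, 0.00, 0.00]
        System.out.println(tokens("  SURFACES 1893 8"));     // [SURFACES, 1893, 8]
    }
}
```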

Mercy