ansaurus

Question

Parsing plain text to some structured object

Answer 1

A:

You can use an Interpreter and a Builder.

The Interpreter parses the source and identifies the keys and values. It passes them to the Builder, which constructs whatever data structure you need.
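A minimal sketch of how the two roles might be split, in Java. The line format (a key followed by values separated by spaces or commas) is an assumption, and the names RecordBuilder, MapBuilder and LineInterpreter are made up for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builder: receives key/value events and assembles the target structure.
interface RecordBuilder<T> {
    void startKey(String key);
    void value(String value);
    T build();
}

// One possible Builder: collects the values of each key into a map.
class MapBuilder implements RecordBuilder<Map<String, List<String>>> {
    private final Map<String, List<String>> map = new LinkedHashMap<>();
    private List<String> current;

    public void startKey(String key) {
        // A repeated key keeps appending to the same list.
        current = map.computeIfAbsent(key, k -> new ArrayList<>());
    }

    public void value(String value) {
        current.add(value);
    }

    public Map<String, List<String>> build() {
        return map;
    }
}

// Interpreter: knows the (assumed) text format, but not what gets built.
class LineInterpreter {
    static <T> T interpret(String source, RecordBuilder<T> builder) {
        for (String line : source.split("\\R")) {
            String[] tokens = line.trim().split("[\\s,]+");
            if (tokens[0].isEmpty()) {
                continue; // blank line, or a line starting with a separator
            }
            builder.startKey(tokens[0]);
            for (int i = 1; i < tokens.length; i++) {
                builder.value(tokens[i]);
            }
        }
        return builder.build();
    }
}
```

Swapping MapBuilder for another RecordBuilder implementation would produce a different structure from the same Interpreter, which is the point of separating the two.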

2010-04-26 12:26:56

Answer 2

+2 A:

If you are familiar with formal languages, tokenization, grammars and so on, you could use a parser generator such as JavaCC. JavaCC takes the grammar file that you write and generates Java code that parses the text file into a series of tokens or a syntax tree. There are plugins for Maven and Ant that help integrate this additional source into your build.

For a runtime-only solution, there is RunCC, which I've used with good results. (I suspect it is not as fast as JavaCC, but for my case the performance was fine.)

There is also Chaperon, which converts plain text to XML, using a grammar file.

An alternative to these is to use an ad hoc mix of regex and StringTokenizer.
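As a rough illustration of the ad hoc route, the sketch below matches a line with a regex and then splits the value part with StringTokenizer. The "key = value, value" line format is purely an assumption, since the actual format is not known, and AdHocLineParser is a made-up name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdHocLineParser {
    // Hypothetical line format: a word-like key, then '=', then the values.
    private static final Pattern LINE = Pattern.compile("^\\s*(\\w+)\\s*=\\s*(.*)$");

    // Returns [key, value1, value2, ...], or an empty list if the line does not match.
    static List<String> parseLine(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return tokens; // uninteresting line, e.g. a comment
        }
        tokens.add(m.group(1));
        // Split the value part at spaces and commas.
        StringTokenizer st = new StringTokenizer(m.group(2), " ,");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }
}
```

The regex decides whether a line is interesting at all, and the tokenizer only ever sees the value part, so each piece stays simple.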

With a parser generator or a set of regexes at hand, the general approach is then like this:

1. Write a grammar for your plain text file. Some details are missing about your plain text format, but you may simply be able to use BufferedReader.readLine() to read the lines of the file, and StringTokenizer to split each line into substrings at spaces and commas.
2. Take the strings you get from the parser: use the first string as the key and add the subsequent strings to a Map as its values. E.g. in pseudocode:

   Map<String,List<String>> map = new HashMap<String,List<String>>();
   for each line {
      List<String> tokens = ...; // result of splitting the line
      String key = tokens.get(0);
      map.put(key, tokens.subList(1, tokens.size()));
   }

   It does not matter if the parser lets uninteresting text through at this point; it will be filtered out later.
3. Build a parser with one of the above projects to parse the map file format. Again, you may be able to build a simple parser with regexes and StringTokenizer. Use the parser to build a map. The map has the same signature as above, i.e. Map<String,List<String>>.
4. Finally, filter the input map against the allowed values map.
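The parsing step described above could look roughly like this when written out with BufferedReader.readLine() and StringTokenizer. This is only a sketch: it assumes one key per line followed by its values, separated by spaces, tabs or commas, and the class name KeyValueFileParser is made up. Unlike the pseudocode, it appends to the existing list when a key occurs on more than one line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

class KeyValueFileParser {
    // Reads "key value value ..." lines into a map of key -> values.
    static Map<String, List<String>> parse(Reader source) {
        Map<String, List<String>> map = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                StringTokenizer st = new StringTokenizer(line, " ,\t");
                if (!st.hasMoreTokens()) {
                    continue; // blank line
                }
                String key = st.nextToken();
                // Assumption: values of a repeated key are merged, not overwritten.
                List<String> values = map.computeIfAbsent(key, k -> new ArrayList<>());
                while (st.hasMoreTokens()) {
                    values.add(st.nextToken());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }
}
```

Called as KeyValueFileParser.parse(new FileReader("input.txt")), with a file name of your own, it yields the input map. If the map file happens to use the same layout, the same method can produce the allowed values map as well.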

Something like this.

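   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   Map<String,List<String>> input = ...;   // parsed from the plain text file
   Map<String,List<String>> allowed = ...; // parsed from the map file
   Map<String,List<String>> result = new HashMap<String,List<String>>(); // the final map
   for (String key : input.keySet()) {
      // skip keys that do not appear in the allowed map at all
      if (allowed.containsKey(key)) {
         List<String> outputValues = new ArrayList<String>();
         List<String> allowedValues = allowed.get(key);
         List<String> inputValues = input.get(key);
         // keep only the values that are allowed for this key
         for (String value : inputValues) {
            if (allowedValues.contains(value))
                outputValues.add(value);
         }
         // drop keys that end up with no values
         if (!outputValues.isEmpty())
            result.put(key, outputValues);
      }
   }
   // the final result is in result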
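For comparison, the same filtering can be written a little more compactly with List.retainAll, which is a swapped-in technique rather than what the loop above does literally. A sketch, with AllowedValueFilter as a made-up name:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AllowedValueFilter {
    // Keeps only the keys and values of input that also appear in allowed.
    static Map<String, List<String>> filter(Map<String, List<String>> input,
                                            Map<String, List<String>> allowed) {
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : input.entrySet()) {
            List<String> allowedValues = allowed.get(entry.getKey());
            if (allowedValues == null) {
                continue; // key is not allowed at all
            }
            // Work on a copy so the input map is left untouched.
            List<String> kept = new ArrayList<>(entry.getValue());
            kept.retainAll(allowedValues);
            if (!kept.isEmpty()) {
                result.put(entry.getKey(), kept);
            }
        }
        return result;
    }
}
```

With an input of color -> [red, pink], size -> [10] and an allowed map of color -> [red, green], size -> [20], only color -> [red] survives: pink is not an allowed colour, and size is left with no values.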

mdma 2010-04-29 00:31:52
