
I'm in a Data Structures class (in Java) this semester, but we're doing a lot of parsing on text files to populate the structures we design. The focus is on the structures themselves, not on parsing algorithms. I feel sort of weak in the area and was wondering if anyone could point me to a book or site on the subject. Design patterns, libraries, styles, etc. Thanks!

+1  A: 

Check out ANTLR

Todd Stout
that seems too advanced for his level ...
Toader Mihai Claudiu
+1  A: 

You can do basic text parsing with the StringTokenizer class, the String.split() methods, and the Pattern and Matcher classes for regular expressions.
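A minimal sketch of the first two tools, assuming a hypothetical comma-separated line such as `Alice, 42 ,Boston` (the class and method names are illustrative only). Note that `String.split()` takes a regular expression, while `StringTokenizer` treats each character of its delimiter string as a separator; the Javadoc describes `StringTokenizer` as a legacy class whose use is discouraged in new code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeDemo {
    // String.split takes a regular expression; "\\s*,\\s*" splits on commas
    // and swallows any whitespace around them.
    static String[] splitFields(String line) {
        return line.trim().split("\\s*,\\s*");
    }

    // StringTokenizer is the older, non-regex alternative: every character in
    // the delimiter string is a separator, and empty tokens are skipped.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, " \t,");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String field : splitFields("Alice, 42 ,Boston")) {
            System.out.println("[" + field + "]"); // [Alice], [42], [Boston]
        }
        System.out.println(tokenize("Alice  42,,Boston")); // [Alice, 42, Boston]
    }
}
```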

Vanessa MacDougal
+2  A: 

The book "Design Patterns" describes the structure of a recursive-descent parser.
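To give a feel for the technique, here is a minimal recursive-descent sketch (not the book's code) for a small, hypothetical arithmetic grammar. Each grammar rule becomes one method, and the methods call each other the same way the rules refer to each other:

```java
public class ExprParser {
    private final String input;
    private int pos = 0;

    private ExprParser(String text) {
        // Simplification: strip all spaces up front (so "1 2" would read as 12).
        this.input = text.replace(" ", "");
    }

    // Entry point: parse the whole string and reject trailing characters.
    public static int evaluate(String text) {
        ExprParser p = new ExprParser(text);
        int value = p.expr();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("Unexpected character at " + p.pos);
        }
        return value;
    }

    // expr := term (('+' | '-') term)*
    private int expr() {
        int value = term();
        while (pos < input.length()) {
            char op = input.charAt(pos);
            if (op != '+' && op != '-') break;
            pos++; // consume the operator
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term := factor (('*' | '/') factor)*
    private int term() {
        int value = factor();
        while (pos < input.length()) {
            char op = input.charAt(pos);
            if (op != '*' && op != '/') break;
            pos++; // consume the operator
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs; // integer division
        }
        return value;
    }

    // factor := NUMBER | '(' expr ')'
    private int factor() {
        if (pos < input.length() && input.charAt(pos) == '(') {
            pos++; // consume '('
            int value = expr();
            if (pos >= input.length() || input.charAt(pos) != ')') {
                throw new IllegalArgumentException("Expected ')' at " + pos);
            }
            pos++; // consume ')'
            return value;
        }
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected number at " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 + 3 * (4 - 1)")); // 11
    }
}
```

Putting `*` and `/` in a lower-level rule than `+` and `-` is what gives multiplication higher precedence, and the `while` loops make the operators left-associative (`10 - 4 - 3` is 3).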

The javacc compiler-compiler can be used to generate parsers in Java.

Steve Emmerson
This is great for more advanced parsing such as defining a simple grammar. We used it in my firm to define a SQL-like domain-specific query language.
Adamski
@Adamski: Indeed, I've used javacc several times. As an old yacc(1) user, I was impressed by its simplicity and power.
Steve Emmerson
+3  A: 

For parsing basic text files in Java, I would start by examining the Scanner class (java.util.Scanner).
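A minimal Scanner sketch, assuming a hypothetical input where integers are mixed with other words. A `Scanner` can also be built from a `File` or an `InputStream`; a `String` is used here only to keep the example self-contained.

```java
import java.util.Scanner;

public class ScannerDemo {
    // Sums every integer token in the text, skipping tokens that are not integers.
    // By default Scanner splits its input on whitespace.
    static int sumInts(String text) {
        int sum = 0;
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    sum += sc.nextInt();
                } else {
                    sc.next(); // discard a non-integer token
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("apples 3 pears 4 plums x 5")); // 12
    }
}
```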

For any text parsing, a basic knowledge of regular expressions (java.util.regex) is a good thing to have.
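As a sketch of how regular expressions look in Java, the example below uses `Pattern` and `Matcher` two ways: `matches()` to validate a whole line and pull out capture groups, and `find()` to scan for every occurrence. The `key = value` line format is a hypothetical example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    // Hypothetical "key = value" format: group 1 is the key, group 2 the value.
    // Compiling once and reusing the Pattern avoids recompiling for every line.
    private static final Pattern KEY_VALUE =
            Pattern.compile("^\\s*(\\w+)\\s*=\\s*(.*?)\\s*$");

    // Returns {key, value}, or null if the whole line does not match.
    static String[] parseKeyValue(String line) {
        Matcher m = KEY_VALUE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // find() locates each occurrence of the pattern anywhere in the text.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("-?\\d+").matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] kv = parseKeyValue("  name = Alice Smith  ");
        System.out.println(kv[0] + " -> " + kv[1]); // name -> Alice Smith
        System.out.println(findNumbers("x=-3, y=17, z=0")); // [-3, 17, 0]
    }
}
```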

If Scanner isn't doing the job, you can always parse through a text file line-by-line with a BufferedReader backed by a FileReader.

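import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// try-with-resources closes the file; the enclosing method must handle IOException
try (BufferedReader reader = new BufferedReader(new FileReader("/path/to/file.txt"))) {
    for (String line = reader.readLine(); line != null; line = reader.readLine()) {
        // process your line here
    }
}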

Scanner may again be useful here, and you could also look into String.split() or the Java Pattern API.
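Putting the line-by-line loop together with `String.split()` to fill a data structure, here is a small sketch that builds a word-frequency map. The class name and the choice of a `TreeMap` are illustrative assumptions; the method takes a `Reader` so it works with a `FileReader` in real use or a `StringReader` for quick testing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

public class WordCountDemo {
    // Reads the source line by line and counts how often each (lower-cased) word occurs.
    static Map<String, Integer> countWords(Reader source) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps keys sorted
        try (BufferedReader reader = new BufferedReader(source)) {
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // splitting "" would yield a single empty token
                }
                for (String word : line.toLowerCase().split("\\s+")) {
                    counts.merge(word, 1, Integer::sum); // add 1, starting from 0
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // rethrow as unchecked for simpler callers
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "the cat\n\nThe  dog\n";
        System.out.println(countWords(new StringReader(text))); // {cat=1, dog=1, the=2}
    }
}
```

Swapping the `Map` for whatever structure you are building in class (a list, a tree, a graph) keeps the same read-split-insert shape.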

Files can come in many formats, however. For advice on the best way to parse a file in a given well-defined format, Google will be your friend. Or you can always post a more specific question here with the format that is giving you trouble.

Jon Quarfoth