How would you parse, in Java, a structure similar to this:


\\Header (name)\\\
1JohnRide  2MarySwanson
 1 password1
 2 password2
\\\1 block of data name\\\
  1.ABCD
  2.FEGH
  3.ZEY
\\\2-nd block of data name\\\
1. 123232aDDF dkfjd ksksd
2. dfdfsf dkfjd
....
etc

Suppose it comes from a text buffer (a plain file).

Each line of text is terminated by "\n", and words are separated by spaces.

The structure is more or less fixed, though there is some ambiguity: the number of fields on each line may differ, a block of data may sometimes be missing, and the number of lines in each block may vary as well.

The question is: how can this be done most effectively?

The first solution that comes to mind is to use regular expressions.

But are there other, more problem-oriented solutions? Maybe a Java library that already does this?

A: 

If the fields are fixed-length, you could use a DataInputStream to read your file. Or, since your format is line-based, you could use a BufferedReader to read lines and write a state machine that knows what kind of line to expect next, given what it has already seen. Once you have each line as a string, you just need to split the data appropriately.
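A minimal sketch of the BufferedReader-plus-state-machine idea, assuming that every line beginning with a backslash is a section header (as in the sample above) and that all other non-blank lines belong to the most recent header. `SectionReader`, `readSections` and the `State` enum are hypothetical names chosen for illustration; splitting each collected line into fields is left to a second step.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups lines under the header that precedes them.
class SectionReader {

    /** The two states this sketch needs: before any header, and inside a section. */
    enum State { BEFORE_FIRST_HEADER, IN_SECTION }

    /** Maps each section name (header text without backslashes) to its trimmed lines, in input order. */
    static Map<String, List<String>> readSections(Reader source) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        State state = State.BEFORE_FIRST_HEADER;
        List<String> current = null;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // blank lines carry no data
                }
                if (line.startsWith("\\")) {
                    // Assumed header format: strip leading and trailing backslashes to get the name.
                    String name = line.replaceAll("^\\\\+|\\\\+$", "").trim();
                    current = new ArrayList<>();
                    sections.put(name, current);
                    state = State.IN_SECTION;
                } else if (state == State.IN_SECTION) {
                    current.add(line.trim());
                }
                // Lines before the first header are ignored in this sketch.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sections;
    }
}
```

It could be called as `SectionReader.readSections(Files.newBufferedReader(path))`. A block that is absent from the input simply has no entry in the returned map, which covers the "sometimes there may not be some block of data" case from the question.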

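For example, the password can be extracted from a password line like this (the line is trimmed first, because password lines start with a space):

final String trimmed = line.trim(); // e.g. " 1 password1" -> "1 password1"
final int pos = trimmed.indexOf(' ');
String passwd = trimmed.substring(pos + 1);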
uckelman
Yes, there's no problem reading each line. But then we need to extract the meaningful information from those lines in order to store it in an object.
EugeneP
You need to use your knowledge of the structure of the lines to extract data from them. I gave you an example for your password field.
uckelman
A: 

From what you have posted, it looks like the data is delimited by whitespace. One idea is to use a Scanner or a StringTokenizer to get one token at a time. You can then check the first character of each token to see whether it is a digit (in which case the part of the token after the digit(s) is the data, if there is any).
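A rough sketch of the token-by-token approach with `java.util.Scanner`, whose default delimiter is whitespace. It assumes a numbered token is either digits glued to data, as in `1JohnRide`, or bare digits followed by a separate data token, as in ` 1 password1`. `TokenSplitter` and `Entry` are illustrative names, not library types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper: splits a line into (number, data) pairs, one token at a time.
class TokenSplitter {

    static final class Entry {
        final int number;
        final String data;

        Entry(int number, String data) {
            this.number = number;
            this.data = data;
        }
    }

    static List<Entry> numberedTokens(String line) {
        List<Entry> entries = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) { // default delimiter: whitespace
            while (scanner.hasNext()) {
                String token = scanner.next();
                if (!Character.isDigit(token.charAt(0))) {
                    continue; // a token not preceded by a number is skipped in this sketch
                }
                // Count the leading digits; the rest of the token (if any) is the data.
                int i = 0;
                while (i < token.length() && Character.isDigit(token.charAt(i))) {
                    i++;
                }
                String data = token.substring(i);
                // Bare number such as "1" in " 1 password1": take the next non-numeric token as data.
                if (data.isEmpty() && scanner.hasNext("\\D.*")) {
                    data = scanner.next();
                }
                entries.add(new Entry(Integer.parseInt(token.substring(0, i)), data));
            }
        }
        return entries;
    }
}
```

This heuristic fits the header and password lines. In the second data block, however, a token such as `123232aDDF` would be read as number 123232 with data `aDDF`, so those lines need a different rule.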

MAK
A: 

This sounds like a homework problem, so I'm going to try to answer it in a way that guides you rather than giving you the final solution.

First, you need to consider each data object you're reading. Is it a number followed by a text field? A number followed by three text fields? A variable mix of numbers and text fields?
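As an illustration of "a number, then a variable number of text fields", here is a sketch that models the numbered lines from the question's data blocks (`2.FEGH`, `1. 123232aDDF dkfjd ksksd`). It assumes such a line is digits, a dot, then zero or more space-separated fields; `NumberedItem` is a hypothetical class, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical data object: a line number plus however many fields follow it.
class NumberedItem {
    final int number;
    final List<String> fields;

    NumberedItem(int number, List<String> fields) {
        this.number = number;
        this.fields = fields;
    }

    /** Parses "<digits>.<fields...>"; returns null when the line does not have that shape. */
    static NumberedItem parse(String line) {
        String trimmed = line.trim();
        int dot = trimmed.indexOf('.');
        if (dot <= 0) {
            return null; // no dot, or nothing before it
        }
        String prefix = trimmed.substring(0, dot);
        for (char c : prefix.toCharArray()) {
            if (!Character.isDigit(c)) {
                return null; // something other than digits before the dot
            }
        }
        // Everything after the dot is split on runs of whitespace; the field count may vary.
        String rest = trimmed.substring(dot + 1).trim();
        List<String> fields = rest.isEmpty()
                ? new ArrayList<>()
                : new ArrayList<>(Arrays.asList(rest.split("\\s+")));
        return new NumberedItem(Integer.parseInt(prefix), fields);
    }
}
```

Returning `null` for lines that don't fit lets the caller decide whether such a line starts a new kind of object or is an error.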

After that, you need to determine what delimits each field and each object. For example, many files use something like a semicolon between fields and a newline at the end of each object. From what you said, it sounds like yours is different.

If an object can span multiple lines, you'll need to bear that in mind (don't stop partway through an object).

Hopefully that helps. If you research this and are still having problems, post the code you have so far and some sample data, and I'll help you solve them (I'll teach you to fish... not give you fish :-) ).

SOA Nerd
EugeneP
I mean libraries like this: first we define the structure of the input in an XML file, then a function parses the input file according to that XML and tries to extract the information.
EugeneP
A: 

As no one recommended a library, my suggestion would be: use regular expressions.
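A minimal sketch of the regex route: one `java.util.regex.Pattern` per kind of line, tried in order, so that adapting to format changes mostly means editing patterns rather than control flow. The patterns encode assumptions drawn only from the sample (backslash-wrapped headers, `<n>.<text>` items, indented `<n> <password>` lines); `LineClassifier` and `Kind` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical classifier: decides which extraction rule applies to a line.
class LineClassifier {

    enum Kind { HEADER, ITEM, PASSWORD, OTHER }

    // Assumed shapes, taken from the sample data only.
    static final Pattern HEADER = Pattern.compile("^\\\\+(.*?)\\\\+$");       // \\\name\\\
    static final Pattern ITEM = Pattern.compile("^\\s*(\\d+)\\.\\s*(.*)$");    // 1.ABCD
    static final Pattern PASSWORD = Pattern.compile("^\\s+(\\d+)\\s+(\\S+)$"); //  1 password1

    /** Returns the kind of the first pattern that matches the whole line. */
    static Kind classify(String line) {
        if (HEADER.matcher(line).matches()) return Kind.HEADER;
        if (ITEM.matcher(line).matches()) return Kind.ITEM;
        if (PASSWORD.matcher(line).matches()) return Kind.PASSWORD;
        return Kind.OTHER;
    }

    /** Returns the header name without backslashes, or null if the line is not a header. */
    static String headerName(String line) {
        Matcher m = HEADER.matcher(line);
        return m.matches() ? m.group(1).trim() : null;
    }
}
```

Combined with a line-by-line reader, `classify` tells the parser which extraction rule to apply to each line. Anything classified as `OTHER` (such as the user line `1JohnRide  2MarySwanson`) still needs its own rule or pattern.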

EugeneP