For my data structures class, the first project requires a text file of songs to be parsed.

An example of input is:
ARTIST="unknown"
TITLE="Rockabye Baby"
LYRICS="Rockabye baby in the treetops
When the wind blows your cradle will rock
When the bow breaks your cradle will fall
Down will come baby cradle and all
"

I'm wondering what the best way is to extract the artist, title, and lyrics into their respective String fields in a Song class. My first reaction was to use a Scanner, read the first character, and, based on that letter, use skip() to advance past the field name and read the text between the quotation marks.

If I do this, I lose out on buffering the input. The full song text file has over 422K lines of text. Can the Scanner handle this even without buffering?
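On the buffering concern, a Scanner can be wrapped around a BufferedReader, and it also keeps an internal buffer of its own. The following is only a rough sketch of that idea, not a definitive solution: the class name SongScannerSketch, the method readFields, and the field pattern are all made up for illustration, and the pattern assumes values never contain an embedded double quote.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question.
class SongScannerSketch {
    // KEY="value"; the value may span several lines.
    // Assumption: no double quotes inside a value.
    private static final Pattern FIELD =
            Pattern.compile("(ARTIST|TITLE|LYRICS)=\"([^\"]*)\"");

    // Returns {key, value} pairs in the order they appear in the source.
    static List<String[]> readFields(Reader source) {
        List<String[]> fields = new ArrayList<>();
        try (Scanner in = new Scanner(new BufferedReader(source))) {
            // A horizon of 0 means "search without a bound"; the scanner
            // pulls in more input as needed until the pattern matches.
            while (in.findWithinHorizon(FIELD, 0) != null) {
                MatchResult m = in.match();
                fields.add(new String[] {m.group(1), m.group(2)});
            }
        }
        return fields;
    }
}
```

For a real file, the Reader could be a FileReader; each group of three consecutive pairs would then fill one Song.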

+3  A: 

For something like this, you should probably just use regular expressions. The Matcher class supports buffered input.

The find method can take a starting offset, so you can resume matching from wherever the previous field ended.

http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html

Regex is a whole world unto itself. If you've never used it before, start here: http://download.oracle.com/javase/tutorial/essential/regex/ and be prepared. The effort is so very worth the time required.
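As a minimal sketch of the regex approach (not a definitive implementation): the names SongRegexSketch, Song, and parse are hypothetical, the pattern assumes no double quotes inside a value, and each record is assumed to end with its LYRICS field. Matcher works on a CharSequence, so this version assumes the text has already been read into memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names throughout; only Pattern and Matcher come from the JDK.
class SongRegexSketch {
    static final class Song {
        final String artist, title, lyrics;
        Song(String artist, String title, String lyrics) {
            this.artist = artist;
            this.title = title;
            this.lyrics = lyrics;
        }
    }

    // KEY="value"; [^"]* also matches newlines, so multi-line lyrics work.
    // Assumption: values never contain a double quote.
    private static final Pattern FIELD =
            Pattern.compile("(ARTIST|TITLE|LYRICS)=\"([^\"]*)\"");

    static List<Song> parse(CharSequence text) {
        List<Song> songs = new ArrayList<>();
        String artist = null;
        String title = null;
        Matcher m = FIELD.matcher(text);
        // Each find() continues from the end of the previous match.
        while (m.find()) {
            String value = m.group(2);
            switch (m.group(1)) {
                case "ARTIST": artist = value; break;
                case "TITLE":  title = value;  break;
                default:       // LYRICS: assumed to close the record
                    songs.add(new Song(artist, title, value));
                    artist = null;
                    title = null;
            }
        }
        return songs;
    }
}
```

Calling parse on the sample input from the question should give a single Song whose lyrics keep their line breaks.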

Joshua Smith
Regular Expression is the solution.
mohammad shamsi
Jason
Since this is just for a data structure course and the input is quite simple, regex is probably overkill.
MAK
I'm marking this up as the accepted answer because you gave the best solution, even though it is overkill and more than I really need to use.
Jason
+1  A: 

If the source data can be parsed using one-token lookahead, StreamTokenizer may be a choice. Here is an example that compares StreamTokenizer and Scanner.
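A rough sketch of how StreamTokenizer might be set up for this format follows; it is not the linked example. The class name SongTokenizerSketch and the method parse are hypothetical. One caveat drives the design: a string recognized through quoteChar ends at a line terminator, so the multi-line LYRICS value cannot be read that way. This sketch instead treats '=' and '"' as ordinary single-character tokens and everything else as word characters.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; only StreamTokenizer itself comes from the JDK.
class SongTokenizerSketch {
    // Returns {key, value} pairs in the order they appear in the source.
    static List<String[]> parse(Reader in) {
        StreamTokenizer st = new StreamTokenizer(in);
        st.resetSyntax();          // drop the default number/comment/quote rules
        st.wordChars(0, 255);      // treat everything as word text...
        st.ordinaryChar('=');      // ...except the two structural characters
        st.ordinaryChar('"');

        List<String[]> fields = new ArrayList<>();
        String key = null;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    // Text outside quotes: a field name, possibly with a
                    // leading newline left over from the previous line.
                    key = st.sval.trim();
                } else if (st.ttype == '"' && key != null) {
                    // Collect everything up to the closing quote.
                    StringBuilder value = new StringBuilder();
                    while (st.nextToken() != '"'
                            && st.ttype != StreamTokenizer.TT_EOF) {
                        if (st.ttype == StreamTokenizer.TT_WORD) {
                            value.append(st.sval);
                        } else {
                            value.append((char) st.ttype); // e.g. '=' in lyrics
                        }
                    }
                    fields.add(new String[] {key, value.toString()});
                    key = null;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }
}
```

Wrapping the source in a BufferedReader before passing it in keeps the reads buffered.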

trashgod
+1  A: 

In this case, you could use a CSV reader, with the field separator '=' and the field delimiter '"' (double quote). It's not perfect, as you get one row each for ARTIST, TITLE, and LYRICS rather than one row per song.

Thomas Mueller
This would also solve problems with escape characters (double quotes within the LYRICS). There are other CSV reader tools, by the way; I just linked the one I know (and wrote myself).
Thomas Mueller