I have text in the following format:

section name 1:

this text goes into the first section

section name 2:

this text goes into the second section

etc.

Section names are arbitrary phrases, and each section's content is free text that never contains a section name. I need to split this text into pairs of the form (section name, section text).

Is there an effective RegEx or other recommended way of doing this?

Thanks. -Raj

A: 

You'll need a structure or a fixed, identifiable delimiter to decide whether a line contains a section name or a section body.

If you have a rule such as "a text line that ends with a colon is a section name", then read the document line by line and check the last character of each line: if it is a colon, treat the line as a section head; otherwise, treat it as part of a section body.
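
As a rough sketch of that rule, assuming the whole text is already in a String and that no body line ever ends with a colon, something like the following could work. The class name SectionSplitter and the method split are made up for illustration, and the pairs are represented with the standard Map.Entry type rather than a custom class.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any library.
final class SectionSplitter {

    // Assumed rule: a line whose last non-whitespace character is ':' is a
    // section name; every other line belongs to the most recent section.
    static List<Map.Entry<String, String>> split(String input) {
        List<Map.Entry<String, String>> sections = new ArrayList<>();
        String name = null;                       // current section name, if any
        StringBuilder body = new StringBuilder(); // text collected for that section

        for (String line : input.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.endsWith(":")) {
                // A new head closes the previous section, if there was one.
                if (name != null) {
                    sections.add(new AbstractMap.SimpleImmutableEntry<>(name, body.toString().trim()));
                }
                name = trimmed.substring(0, trimmed.length() - 1); // drop the colon
                body.setLength(0);
            } else if (name != null) {
                // Lines before the first section head are silently ignored.
                body.append(line).append('\n');
            }
        }
        // Close the last section.
        if (name != null) {
            sections.add(new AbstractMap.SimpleImmutableEntry<>(name, body.toString().trim()));
        }
        return sections;
    }
}
```

Calling SectionSplitter.split(text) on the sample from the question should give two pairs, one per section, with the trailing colon removed from each name. If a body line can legitimately end with a colon, this rule misreads it as a head, so a stricter delimiter would be needed.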

Andreas_D
A: 

Well, it depends on the structure of your document. For example, does each section have an empty line? If so, it is easy: scan line by line and construct your objects as you go.

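// Assumes br is an already-open java.io.BufferedReader over the text,
// and that each section is one name line followed by one text line.
List<Section> sections = new ArrayList<Section>();
String temp = null;   // most recently read section name
String line = null;
int lineNumber = 0;   // counts non-blank lines only

while ((line = br.readLine()) != null) {
  if (line.trim().isEmpty()) {
    continue; // skip empty separator lines so the odd/even count stays aligned
  }
  lineNumber++;
  if (lineNumber % 2 == 0) {
    // Section Text
    sections.add(new Section(temp, line));
  }
  else {
    // Section Name
    temp = line;
  }
}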

Then your Section might be:

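public class Section {
  private final String name;
  private final String text;
  public Section(String name, String text) {
    this.name = name;
    this.text = text;
  }
  // Accessors so callers can read the pair back out
  public String getName() {
    return name;
  }
  public String getText() {
    return text;
  }
}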
Mohamed Mansour