Hello,

How can I read a text file (or XML file) into a single string and search for (or match) a particular string in it?

I have created a BufferedReader object:

BufferedReader input = new BufferedReader(new FileReader(aFile));

Then I tried to use the Scanner class, with its option to specify different delimiters, like this:

//Scanner scantext = new Scanner(input);
//Scanner scantext = new Scanner(input).useDelimiter("");
Scanner scantext = new Scanner(input).useDelimiter("\n");
while (scantext.hasNext()) {  ... }

Using the Scanner class like this, I can read the text either line by line or word by word, but that doesn't help, because sometimes the text I want to process contains

</review><review>

and I would like to say: if you find "<review>" anywhere in the text, process the text that follows until you find "</review>". The problem is that <review> and </review> appear at different places in the text and are sometimes glued to other text, so using whitespace as the delimiter doesn't help.

I have thought that I might use the regular expression API in Java (the Pattern and Matcher classes), but they seem to match a particular string or line, and I want to have the text as one continuous string (at least, that was my impression from what I have read about them). Could you tell me which structures, methods, or classes I should use in this case? Thank you.

+3  A: 

Don't try to parse XML with regular expressions; it leads only to pain. There are a lot of very nice existing XML APIs in Java already; why try to reinvent them?

Anyway, to search for a string in a text file, you should:

  1. Load the file as a string (example)
  2. Create a Pattern to search for
  3. Use a Matcher to iterate through any matches
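The three steps above might look like this in plain Java. This is a minimal sketch, not the answerer's code: it assumes the file is UTF-8 and that <review> elements are never nested, and the class name ReviewRegexSearch, its methods, and the file name reviews.xml are hypothetical. Pattern.DOTALL is what lets a single match run across line breaks, which addresses the "one continuous string" concern in the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReviewRegexSearch {

    // Step 2: the pattern to search for. DOTALL lets '.' also match line
    // terminators, so one match can span several lines; the reluctant ".*?"
    // stops at the nearest "</review>" instead of the last one in the file.
    // Nested <review> elements are NOT handled.
    private static final Pattern REVIEW =
            Pattern.compile("<review>(.*?)</review>", Pattern.DOTALL);

    // Step 1: load the whole file as one String (UTF-8 is an assumption).
    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Step 3: walk through every match and collect the text between the tags.
    static List<String> findReviews(String text) {
        List<String> reviews = new ArrayList<>();
        Matcher matcher = REVIEW.matcher(text);
        while (matcher.find()) {
            reviews.add(matcher.group(1));
        }
        return reviews;
    }

    public static void main(String[] args) throws IOException {
        // "reviews.xml" is a hypothetical default file name.
        Path file = Paths.get(args.length > 0 ? args[0] : "reviews.xml");
        for (String review : findReviews(load(file))) {
            System.out.println("Found review: " + review.trim());
        }
    }
}
```

For anything beyond a quick tag search (attributes, nesting, comments, CDATA sections), the advice at the top of this answer to use a real XML API still applies.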
Michael Myers
XOM (http://www.xom.nu/) is my favorite.
lucas
+1  A: 

It looks to me as though you are trying to work with a structured XML file, so I would suggest that you look into javax.xml.parsers.DocumentBuilder or other built-in APIs to parse the document.
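A minimal DOM sketch with javax.xml.parsers.DocumentBuilder, assuming the file is well-formed XML (a single root element wrapping the <review> elements); ReviewDomReader, its methods, and reviews.xml are hypothetical names. Because the parser works on elements rather than lines or words, it does not matter where <review> and </review> fall in the file or what they are glued to.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReviewDomReader {

    // Parses a complete, well-formed XML document into an in-memory DOM tree.
    static Document parse(InputSource source) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(source);
    }

    // Collects the text content of every <review> element at any depth,
    // regardless of line breaks or neighbouring tags in the source file.
    static List<String> reviewTexts(Document doc) {
        NodeList reviews = doc.getElementsByTagName("review");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < reviews.getLength(); i++) {
            texts.add(reviews.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // "reviews.xml" is a hypothetical default file name.
        File file = new File(args.length > 0 ? args[0] : "reviews.xml");
        Document doc = parse(new InputSource(file.toURI().toString()));
        for (String text : reviewTexts(doc)) {
            System.out.println(text.trim());
        }
    }
}
```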

Rich Kroll
+1  A: 

Use an XML parser.

Or use XPath, as in this example.
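A minimal XPath sketch using the JDK's javax.xml.xpath package, again assuming a well-formed document; ReviewXPathReader and reviews.xml are hypothetical names, and the expression //review is just one way to select every review element wherever it appears.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReviewXPathReader {

    // "//review" selects every <review> element at any depth in the document.
    static List<String> reviewTexts(InputSource source) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//review", source, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // "reviews.xml" is a hypothetical default file name.
        File file = new File(args.length > 0 ? args[0] : "reviews.xml");
        for (String text : reviewTexts(new InputSource(file.toURI().toString()))) {
            System.out.println(text.trim());
        }
    }
}
```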

A_M
+1  A: 

I have thought that I might use the regular expression API in Java (the Pattern and Matcher classes), but they seem to match a particular string or line, and I want to have the text as one continuous string

Um, does something prevent you from reading the XML file into a String, and then operating on that, using the regular expression API?

You can easily read a file into a String using e.g. FileUtils from Apache Commons IO: see readFileToString(File file, String encoding).
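If you would rather not add the Commons IO dependency, one JDK-only alternative (not what this answer proposes) is the Scanner class the question already uses, with "\\A" as the delimiter. A minimal sketch; WholeFileScanner, readWholeFile, and reviews.xml are hypothetical names, and the caller supplies the charset name much as with readFileToString:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class WholeFileScanner {

    // Reads an entire file into one String using only the JDK.
    // "\\A" matches only at the very beginning of the input, so no delimiter
    // is found after the first token and next() returns everything that follows.
    static String readWholeFile(File file, String charsetName) throws FileNotFoundException {
        try (Scanner scanner = new Scanner(file, charsetName).useDelimiter("\\A")) {
            // An empty file yields no token at all, hence the hasNext() guard.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "reviews.xml" is a hypothetical default file name.
        String text = readWholeFile(new File(args.length > 0 ? args[0] : "reviews.xml"), "UTF-8");
        System.out.println("Read " + text.length() + " characters");
    }
}
```

The resulting String can then be searched with Pattern and Matcher as suggested above. On Java 11 and later, Files.readString(Path) is another JDK option.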

Jonik
+1  A: 

I would also recommend using an XML parsing API. But since you only want to do something when you encounter the "review" tag, SAX might suit you better than DOM.
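A minimal SAX sketch, assuming a well-formed document in which <review> elements are not nested; ReviewSaxHandler and reviews.xml are hypothetical names. The handler only buffers text while it is inside a review element, so the document is never built into a full in-memory tree, which is the usual reason to prefer SAX over DOM for this kind of task.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ReviewSaxHandler extends DefaultHandler {

    final List<String> reviews = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a <review> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("review".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // characters() may be called several times for one text node,
        // so append rather than assign.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("review".equals(qName) && current != null) {
            reviews.add(current.toString()); // "do something" with the review here
            current = null;
        }
    }

    static List<String> parse(InputSource source) throws Exception {
        ReviewSaxHandler handler = new ReviewSaxHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.reviews;
    }

    public static void main(String[] args) throws Exception {
        // "reviews.xml" is a hypothetical default file name.
        File file = new File(args.length > 0 ? args[0] : "reviews.xml");
        for (String review : parse(new InputSource(file.toURI().toString()))) {
            System.out.println(review.trim());
        }
    }
}
```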

Valentin Rocher
A: 

I think we can copy each line of the text file into a String and then try to match the substring (the search string) against that String (the line).

But errors occur when the search string contains regular-expression metacharacters, such as ( or [.
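Errors with metacharacters typically occur when the search string is passed straight to a regex method such as String.matches or Pattern.compile. A minimal sketch of the line-by-line approach that treats the search string literally via Pattern.quote; LiteralLineSearch and matchingLines are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class LiteralLineSearch {

    // Returns the 1-based numbers of the lines that contain 'needle' literally.
    // Pattern.quote() escapes any regex metacharacters in the search string,
    // so input such as "(b" no longer throws PatternSyntaxException.
    static List<Integer> matchingLines(BufferedReader reader, String needle) throws IOException {
        Pattern literal = Pattern.compile(Pattern.quote(needle));
        List<Integer> hits = new ArrayList<>();
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            if (literal.matcher(line).find()) {
                hits.add(lineNo);
            }
        }
        return hits;
    }
}
```

For a purely literal search, line.contains(needle) avoids regular expressions altogether. Either way, a line-by-line search cannot find a review element whose opening and closing tags sit on different lines, which is the situation described in the question.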