I have a simple XML file and I want to remove everything before the first <item> tag.

<sometag>
  <something>
   .....
  </something>
  <item>item1
  </item>
  ....
</sometag>

The following Java code is not working:

String cleanxml = rawxml.replace("^[\\s\\S]+<item>", "");

What is the correct way to do this? And how do I address the non-greedy issue? Sorry, I'm a C# programmer.

+1  A: 

Use replaceAll or replaceFirst. Plain replace only looks for literal string matches. HTH

ring bearer
It works. Thanks! But why is the above regex not working?
Yang
replace() does not accept a regular expression. It interprets its arguments as literal strings.
Sean Owen
+4  A: 

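Well, if you want to use regex, then you can use replaceAll. This solution uses a reluctant quantifier and a reference to the captured group ($1) in the replacement. The (?s) flag is needed because . does not match line terminators by default:

String cleanxml = rawxml.replaceAll("(?s).*?(<item>.*)", "$1");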

Alternatively, you can use replaceFirst. This solution uses a positive lookahead, with (?s) so that . can span lines:

String cleanxml = rawxml.replaceFirst("(?s).*?(?=<item>)", "");
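To make the two regex variants concrete, here is a minimal sketch. The class and method names are made up for illustration, and the sample input is an assumption modeled on the XML in the question. In Java, . does not match line terminators unless the DOTALL flag ((?s)) is set, so both patterns set it in order to work on multi-line input.

```java
class StripBeforeItem {
    // Reluctant match up to (but not including) the first <item>.
    // (?s) turns on DOTALL so '.' can cross line breaks.
    static String withLookahead(String raw) {
        return raw.replaceFirst("(?s).*?(?=<item>)", "");
    }

    // Same idea with a capturing group: keep everything from the first <item> onward.
    static String withGroup(String raw) {
        return raw.replaceFirst("(?s).*?(<item>.*)", "$1");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, shaped like the XML in the question.
        String raw = "<sometag>\n  <something>\n  </something>\n"
                   + "  <item>item1\n  </item>\n</sometag>";
        System.out.println(withLookahead(raw));
        System.out.println(withGroup(raw));
    }
}
```

If the input contains no <item> at all, neither pattern matches and the string comes back unchanged.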

It makes more sense to just use indexOf and substring, though.

String cleanxml = rawxml.substring(rawxml.indexOf("<item>"));
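One caveat with the one-liner: indexOf returns -1 when the tag is absent, and substring(-1) then throws a StringIndexOutOfBoundsException. A guarded sketch might look like the following. The helper name is hypothetical, and returning the input unchanged when there is no <item> is an assumption about what you would want.

```java
class SubstringFromItem {
    // Returns the text starting at the first <item>,
    // or the input unchanged if no <item> is present (assumed fallback).
    static String fromFirstItem(String raw) {
        int start = raw.indexOf("<item>");
        return start >= 0 ? raw.substring(start) : raw;
    }

    public static void main(String[] args) {
        System.out.println(fromFirstItem("<sometag><something/><item>item1</item></sometag>"));
    }
}
```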

The reason replace doesn't work is that neither its char overload nor its CharSequence overload is regex-based. Both do simple character (sequence) replacement.
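A small sketch of the difference, using the pattern from the question (the class and method names are made up for illustration). With replace the pattern text is searched for literally, so nothing changes. With replaceFirst it is compiled as a regex, but the greedy + runs to the last <item> and the match also swallows the tag itself.

```java
class ReplaceDemo {
    // Literal replacement: the first argument is plain text, not a pattern.
    static String literal(String raw) {
        return raw.replace("^[\\s\\S]+<item>", "");
    }

    // Regex replacement: the same text is now interpreted as a regular expression.
    static String regex(String raw) {
        return raw.replaceFirst("^[\\s\\S]+<item>", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample input with two items.
        String raw = "<a>\n<item>x</item>\n<item>y</item>\n</a>";
        System.out.println(literal(raw).equals(raw)); // the literal text never occurs in raw
        System.out.println(regex(raw));               // everything through the LAST <item> is gone
    }
}
```

That greedy behaviour is the "non-greedy issue" from the question: a reluctant quantifier (+? or *?) stops at the first <item> instead.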


Also, as others are warning you, unless you're processing simple XML, you shouldn't use regex. You should use an actual XML parser instead.

polygenelubricants
+3  A: 

... What is the correct way to do this? ...

Since you asked about the correct way: the correct way to do this is to parse the XML, remove the nodes, and re-serialize to a String. You should never use regular expressions for manipulating XML or any other structured document that has parsers available (JSON, YAML, etc.).
For small XML documents I would suggest JDOM.
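As a rough sketch of the parse, remove, re-serialize idea, the following uses the DOM and transformer classes bundled with the JDK rather than JDOM, so it runs without extra dependencies. The class and method names are hypothetical, and it assumes the <item> elements are direct children of the root element, as in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DropBeforeFirstItem {
    // Removes every child of the root that precedes the first <item> child,
    // then serializes the document back to a String.
    static String clean(String rawXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rawXml)));
            Element root = doc.getDocumentElement();

            // Walk the root's children (elements, text, comments) until <item> is reached.
            Node child = root.getFirstChild();
            while (child != null && !"item".equals(child.getNodeName())) {
                Node next = child.getNextSibling();
                root.removeChild(child);
                child = next;
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Parsing or serialization failed, e.g. the input was not well-formed XML.
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(clean(
            "<sometag><something>x</something><item>item1</item></sometag>"));
    }
}
```

Unlike the string-based approaches, this keeps the root element, so the result is still well-formed XML.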

fuzzy lollipop