I am trying to write a regular expression which extracts the data from a string like

<B Att="text">Test</B><C>Test1</C>

The extracted output needs to be Test and Test1. This is what I have done till now:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {
    public static void main(String[] args)
    {
        String s = "<B>Test</B>";
        String reg = "<.*?>(.*)<\\/.*?>";
        Pattern p = Pattern.compile(reg);
        Matcher m = p.matcher(s);
        while(m.find())
        {
            String s1 = m.group();
            System.out.println(s1);
        }
    }
}

But this is producing the result <B>Test</B>. Can anybody point out what I am doing wrong?

+2  A: 

Three problems:

  • Your test string is incorrect.
  • You need a non-greedy modifier in the group.
  • You need to specify which group you want (group 1).

Try this:

String s = "<B Att=\"text\">Test</B><C>Test1</C>"; // <-- Fix 1
String reg = "<.*?>(.*?)</.*?>";                   // <-- Fix 2
// ...
String s1 = m.group(1);                            // <-- Fix 3

You also don't need to escape a forward slash, so I removed that.

See it running on ideone.

(Also, don't use regular expressions to parse HTML - use an HTML parser.)
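For reference, the three fixes can be put together into one complete program. This is only a minimal sketch: the class name `TagTextExtractor` and the helper `extractTagText` are made up for illustration, and it assumes the simple, non-nested input shown in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTextExtractor {
    // The lazy (.*?) stops at the first closing tag instead of the last one.
    private static final Pattern TAG_TEXT = Pattern.compile("<.*?>(.*?)</.*?>");

    // Hypothetical helper: collects the text captured by group 1 of every match.
    static List<String> extractTagText(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG_TEXT.matcher(input);
        while (m.find()) {
            // m.group() and m.group(0) return the whole match, tags included;
            // m.group(1) returns only what the parentheses captured.
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "<B Att=\"text\">Test</B><C>Test1</C>";
        for (String text : extractTagText(s)) {
            System.out.println(text); // prints Test, then Test1
        }
    }
}
```

With the greedy `(.*)` from the question, the same input yields a single match whose group 1 is `Test</B><C>Test1`, which is why the non-greedy modifier matters as soon as the string contains more than one tag.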

Mark Byers
Thanks, but this produces the output `<B Att="text">Test</B>` for the first iteration and `<C>Test1</C>` during the second iteration. But I want only `Test` and `Test1` as output.
Asha
@Asha: String s1 = m.group(**1**);
Mark Byers
Working fine now. I had tried it before but gave the index as 0. I didn't realize that capture groups start from 1.
Asha
@Asha: Group 0 means the entire match.
Mark Byers
+1  A: 

It almost looks like you're trying to use regex on XML and/or HTML. I'd suggest not using regex and instead creating a parser or lexer to handle this type of arrangement.

wheaties
+1  A: 

I think the best way to handle and get the value of XML nodes is just to treat the input as XML.
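A rough sketch of what treating it as XML could look like with the JDK's built-in DOM parser. The class name `XmlTextExtractor`, the helper `extractElementText` and the synthetic `<root>` wrapper are assumptions for illustration: the sample string has two top-level elements, so it is wrapped in one root element to make it a well-formed document. This only works when the input really is well-formed XML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlTextExtractor {
    // Hypothetical helper: returns the text content of each top-level element.
    static List<String> extractElementText(String fragment) {
        // Assumption: the fragment has no single root, so add a synthetic one.
        String wrapped = "<root>" + fragment + "</root>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(wrapped)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            List<String> result = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip anything that is not an element (e.g. whitespace text nodes).
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.add(child.getTextContent());
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String s = "<B Att=\"text\">Test</B><C>Test1</C>";
        System.out.println(extractElementText(s)); // prints [Test, Test1]
    }
}
```

Unlike the regex, this copes with nested elements and with `>` characters inside attribute values, at the cost of rejecting input that is not valid XML.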

If you really want to stick to regex try:

<B[^>]*>(.+?)</B\s*>

keeping in mind that this will only ever give you the value of the B tag.

Or, if you want the value of any tag, you can use something like:

<.*?>(.*?)</.*?>
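To compare the two patterns side by side, here is a small sketch run against the string from the question. The class `TagPatterns` and the helper `captures` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagPatterns {
    // Hypothetical helper: returns group 1 of every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "<B Att=\"text\">Test</B><C>Test1</C>";
        // Tag-specific pattern: only the B element is captured.
        System.out.println(captures("<B[^>]*>(.+?)</B\\s*>", s)); // prints [Test]
        // Generic pattern: every element is captured.
        System.out.println(captures("<.*?>(.*?)</.*?>", s));      // prints [Test, Test1]
    }
}
```

Note that in a Java string literal the `\s` has to be written as `\\s`. Also, `<B[^>]*>` accepts any opening tag whose name starts with B (for example `<BODY>`), so a stricter pattern may be needed if such tags can occur in the input.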
Garis Suero
+2  A: 

If you are using Eclipse, there is a nice plugin that lets you check your regular expression without writing a class to test it. Here is the link: http://regex-util.sourceforge.net/update/ You will need to show the view by choosing Window -> Show View -> Other, and then Regex Util.

I hope it helps you in your fight with regular expressions.

Marek