ansaurus

Question

Answer 1

+1 A:

I think each call to find() advances through your matches. Calling m1.find() inside your condition moves the matcher to a place where there is no longer a valid match, which causes m1.start() to throw (I'm guessing) an IllegalStateException. Calling find() once per iteration and storing the result in a flag avoids this problem.

boolean m1Matched = m1.find();
boolean m2Matched = m2.find();
while (m1Matched || m2Matched) {

    if (m1Matched) {
        ...
    }

    // Advance each matcher exactly once per iteration.
    m1Matched = m1.find();
    m2Matched = m2.find();
}
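
To make the behaviour concrete, here is a minimal, self-contained sketch. The class name, method names and the sample input "ab--ab" are made up for illustration and are not from the question. It shows that calling find() once per iteration visits every match, and that start() throws an IllegalStateException once find() has returned false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative only.
class FindAdvancesDemo {

    // Collects the start offset of every match, calling find() exactly once per iteration.
    static List<Integer> startsOncePerIteration(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        boolean matched = m.find();
        while (matched) {
            starts.add(m.start());
            matched = m.find(); // advance to the next match, or false when done
        }
        return starts;
    }

    // Returns true if start() throws after the last find() call has failed.
    static boolean startThrowsAfterFailedFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            // consume every match so that the final find() returns false
        }
        try {
            m.start();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(startsOncePerIteration("ab", "ab--ab"));     // prints [0, 4]
        System.out.println(startThrowsAfterFailedFind("ab", "ab--ab")); // prints true
    }
}
```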

butterchicken 2009-07-03 09:51:38

thnx, i will look into that :)

doro 2009-07-03 10:02:32

Answer 2

+3 A:

I know that I am broadening your question, but I think that using a dedicated library for parsing HTML documents (such as http://htmlparser.sourceforge.net/) will be much easier and more accurate than regexps.

Itay 2009-07-03 09:52:47

i bet there are some really cool solutions that would take some of the work away as well, but i am supposed to do that from scratch ... thnx, i will look into it anyway ;)

doro 2009-07-03 10:03:24

Answer 3

+1 A:

Here is an example of what you're trying to do, adapted from one of my notes:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {

        String tag = "thetag";
        String id = "foo";

        String content = "<tag1>\n"+
                "<thetag name=\"Tag Name\" id=\"foo\">Some text</thetag>\n" +
                "<thetag name=\"AnotherTag\" id=\"foo\">Some more text</thetag>\n" +
                "</tag1>";

        String patternString = "<" + tag + ".*?name=\"(.*?)\".*?id=\"" + id + "\".*?>";

        System.out.println("Content:\n" + content);
        System.out.println("Pattern: " + patternString);

        Pattern pattern = Pattern.compile(patternString);

        Matcher matcher = pattern.matcher(content);

        boolean found = false;
        while (matcher.find()) {
            System.out.format("I found the text \"%s\" starting at " +
                    "index %d and ending at index %d.%n",
                    matcher.group(), matcher.start(), matcher.end());
            System.out.println("Name: " + matcher.group(1));
            found = true;
        }
        if (!found) {
            System.out.println("No match found.");
        }
    }
}

You'll notice that the pattern string becomes something like <thetag.*?name="(.*?)".*?id="foo".*?> which searches for tags named thetag where the id attribute is set to "foo". As written, it only matches when the name attribute appears before the id attribute.

Note the following:

It uses .*? to reluctantly (non-greedily) match zero or more of any character (if you don't understand, try removing the ? to see what I mean).
It uses a capturing group between parentheses (the name="(.*?)" part) to extract the contents of the name attribute (as an example).
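
As a rough illustration of the first note, the sketch below runs the name="(.*?)" fragment and its greedy counterpart against one of the sample lines from the code above. The class and helper names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; firstGroup is an illustrative helper, not part of the answer's code.
class QuantifierDemo {

    // Returns capturing group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "<thetag name=\"Tag Name\" id=\"foo\">Some text</thetag>";

        // Reluctant: stops at the first closing quote.
        System.out.println(firstGroup("name=\"(.*?)\"", input)); // prints Tag Name

        // Greedy: runs on to the last quote on the line.
        System.out.println(firstGroup("name=\"(.*)\"", input));  // prints Tag Name" id="foo
    }
}
```

With the ? removed, the group swallows the id attribute as well, which is why the pattern above uses .*? throughout.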

iWerner 2009-07-03 10:13:33

thnx for the code :) awesome

doro 2009-07-03 10:23:00

Answer 4

A:

Awesome example. Really helpful :) :)

2009-09-08 11:04:09

pattern match java: does not work
