I have some HTML that I need to extract a part number from. The HTML looks like this:

javascript:selectItem('ABC123          1', '.....

I need to get the ABC123 from the above.

My code snippet:

Pattern p = Pattern.compile("?????");
Matcher m = p.matcher(html);

if(m.find())
  partNumber = m.group(1).trim();

BTW, in the pattern, how do I escape the ( character?

I know that for quotes I use \"

Thanks a lot!

+1  A: 

You escape ( by putting a \ before it. Because the pattern is written as a Java string literal, the \ itself has to be escaped too, so the sequence in your source code is \\(. This should parse that snippet:

Pattern p = Pattern.compile("javascript:selectItem\\('(\\w+)");
Matcher m = p.matcher(html);
if (m.find()) {
  String partNumber = m.group(1);
}

I've assumed the part number is one or more word characters (meaning digits, letters or underscore).
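
For anyone who wants to try this outside a larger program, here is a minimal self-contained sketch of the same pattern. The class name PartNumberExtractor and the extract helper are made up for illustration, and the input is just the fragment quoted in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PartNumberExtractor {
    // \\( in the Java source becomes \( in the regex, which matches a literal "(".
    // (\\w+) captures the run of word characters right after the opening quote.
    private static final Pattern PART_NUMBER =
            Pattern.compile("javascript:selectItem\\('(\\w+)");

    // Hypothetical helper: returns the part number, or null if the pattern is not found.
    static String extract(String html) {
        Matcher m = PART_NUMBER.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "javascript:selectItem('ABC123          1', '.....";
        System.out.println(extract(html)); // prints ABC123
    }
}
```

Because \w+ stops at the first non-word character, the trailing spaces are never captured and no trim() call is needed.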

cletus
How would I match any character but the < character?
Blankman
@Blankman `[^<]` matches any character but `<`.
cletus
So then I would do ([^<]) to make it a capture group?
Blankman
@Blankman that would capture one character. You want `([^<]*)` to capture zero or more characters that aren't `<`. Since you expect at least one I would do `([^<]+)` to capture one or more.
cletus
yes ok that makes sense, thanks!
Blankman
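
The difference between `([^<])`, `([^<]*)` and `([^<]+)` discussed in the comments can be checked with a small sketch like the one below. The firstGroup helper, the class name, and the `<td>` sample strings are assumptions made up for this illustration; they do not come from the original HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String cell = "<td>ABC123</td>"; // made-up sample input

        // No quantifier: the group holds exactly one character that is not "<".
        System.out.println(firstGroup(">([^<])", cell));   // A

        // "+" : one or more characters that are not "<".
        System.out.println(firstGroup(">([^<]+)", cell));  // ABC123

        // "*" : zero or more, so an empty capture still counts as a match.
        System.out.println(firstGroup(">([^<]*)", "<td></td>").isEmpty()); // true

        // "+" on the same empty cell finds nothing to capture.
        System.out.println(firstGroup(">([^<]+)", "<td></td>"));           // null
    }
}
```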
A: 

You could use this:

Pattern regex = Pattern.compile("(?<=selectItem\\(')\\S*", Pattern.CASE_INSENSITIVE);
Matcher regexMatcher = regex.matcher(subjectString);
if (regexMatcher.find()) {
    // The lookbehind is not a capturing group, so read the whole match with group().
    String resultString = regexMatcher.group();
}
Hun1Ahpu
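
As a rough sketch of how the lookbehind approach behaves, the snippet below runs it against the fragment from the question. The class and method names are invented for the example. Note that a lookbehind such as `(?<=...)` is zero-width and does not create a capturing group, so the result is read with group() (the whole match); calling group(1) on this pattern would throw an IndexOutOfBoundsException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // Matches the non-whitespace run that is immediately preceded by selectItem('
    private static final Pattern REGEX =
            Pattern.compile("(?<=selectItem\\(')\\S*", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the whole match, or null if there is none.
    static String extract(String subject) {
        Matcher m = REGEX.matcher(subject);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "javascript:selectItem('ABC123          1', '.....";
        System.out.println(extract(html));                       // prints ABC123
        System.out.println(REGEX.matcher(html).groupCount());    // prints 0 (no capturing groups)
    }
}
```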