Hello,

I want to parse a string of 12 floats in a row (often all different), preceded by some irrelevant text that marks the field, and I want each float to end up in its own capturing group so I can collect them from the matcher one at a time. I've noticed I can do it by writing the following:

Pattern lastYearIncomePattern = Pattern.compile("(.+\\{\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)([0-9]{1,2}\\.[0-9]{3}\\s)");

which is an exhausting mass of duplicated code. The part ([0-9]{1,2}\\.[0-9]{3}\\s) will occur 12 times.

Is there a better way to do this? I've considered defining that sub-pattern once and then building the regex with a loop and a StringBuilder, but that seems like overkill. Backreferences obviously don't work either, since the 12 values are different.
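For what it's worth, the loop-and-StringBuilder idea can shrink to one expression: `String.repeat` (Java 11+) generates the twelve groups from a single definition, so the float sub-pattern is written once and each float still gets its own capturing group (groups 2 through 13). This is a sketch; the class name, `FLOAT_GROUP`, `extract`, and the sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BuildPatternDemo {
    // One float of the form d.ddd or dd.ddd plus one whitespace character, in its own group.
    static final String FLOAT_GROUP = "([0-9]{1,2}\\.[0-9]{3}\\s)";

    // Prefix group followed by 12 copies of FLOAT_GROUP (String.repeat requires Java 11+).
    static final Pattern LAST_YEAR_INCOME =
            Pattern.compile("(.+\\{\\s)" + FLOAT_GROUP.repeat(12));

    // Returns the 12 floats (trimmed) from groups 2..13, or null if the input does not match.
    static String[] extract(String input) {
        Matcher m = LAST_YEAR_INCOME.matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] values = new String[12];
        for (int i = 0; i < 12; i++) {
            values[i] = m.group(i + 2).trim();
        }
        return values;
    }

    public static void main(String[] args) {
        String input = "income { 1.100 2.200 3.300 4.400 5.500 6.600 "
                + "7.700 8.800 9.900 10.000 11.100 12.200 ";
        System.out.println(String.join(",", extract(input)));
    }
}
```

The generated pattern is the same string as the hand-written one, so matching behaviour doesn't change; only the source gets shorter.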

My first approach was to write ([0-9]{1,2}\\.[0-9]{3}\\s){12}, but a repeated group is still a single group: the matcher only keeps its last repetition, and the match as a whole just gives me all 12 floats in one long string. That's not what I want, as I'd then need another Pattern to pick off the floats one by one, and at that point the duplicate-frenzy solution is preferable.
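A quick way to see what `{12}` does to a capturing group in Java: the pattern still has only two groups, and group 2 reports just the text of its final repetition. The class name and sample input below are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The float sub-pattern wrapped in ONE capturing group and quantified with {12}.
    static final Pattern REPEATED =
            Pattern.compile("(.+\\{\\s)([0-9]{1,2}\\.[0-9]{3}\\s){12}");

    public static void main(String[] args) {
        String input = "income { 1.100 2.200 3.300 4.400 5.500 6.600 "
                + "7.700 8.800 9.900 10.000 11.100 12.200 ";
        Matcher m = REPEATED.matcher(input);
        if (m.find()) {
            // Still only two groups: the {12} quantifier does not create new groups.
            System.out.println("groupCount = " + m.groupCount()); // prints "groupCount = 2"
            // Group 2 keeps only the text of its last repetition.
            System.out.println("group(2) = [" + m.group(2) + "]"); // prints "group(2) = [12.200 ]"
            // The whole match (group 0) holds the prefix plus all twelve floats in one string.
            System.out.println("group(0) = [" + m.group() + "]");
        }
    }
}
```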

Thanks

+1  A: 

You could write the regexp to match a single float, and then use Matcher.find(int) to iterate through the occurrences.

vanza
This does work, but it means I have to use a constant to advance the index for each new float I want to read, since find(int) resets the matcher on every call, like so `for (int i = 1; i < 13; ++i) { if (matcher.find(C + 5*i)) { String value = matcher.group(); } }`, and I'm reluctant to use constants that way. Or am I mistaken about how to use the method?
Mats_SX
You can use the find() method with no arguments, or use the value returned by Matcher.end() as the argument. You probably want a while loop ("while (m.find())").
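A sketch of the `while (m.find())` loop suggested above. Two assumptions not in the thread: the field marker is the first `{` in the string (the matcher's region starts just after it, so numbers in the prefix are skipped), and `\\b` word boundaries are added so the single-float pattern doesn't match inside longer numbers. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoopDemo {
    // A single float such as 1.234 or 12.345; \b stops it matching inside longer digit runs.
    static final Pattern FLOAT = Pattern.compile("\\b[0-9]{1,2}\\.[0-9]{3}\\b");

    // Collects every float that appears after the first '{' in the input.
    static List<String> floatsAfterBrace(String input) {
        List<String> values = new ArrayList<>();
        int brace = input.indexOf('{');
        if (brace < 0) {
            return values; // no field marker, nothing to read
        }
        Matcher m = FLOAT.matcher(input);
        m.region(brace + 1, input.length()); // ignore the prefix text
        // find() with no argument continues after the previous match: no index arithmetic.
        while (m.find()) {
            values.add(m.group());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(floatsAfterBrace("income 9.999 { 1.100 2.200 12.200 "));
    }
}
```

Unlike the twelve-group pattern, this loop does not check that exactly twelve floats are present; the caller would have to verify `values.size()` if that matters.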
vanza
+3  A: 

How about this:

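import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern lastYearIncomePattern = Pattern.compile("(.+\\{\\s)(([0-9]{1,2}\\.[0-9]{3}\\s){12})");

Matcher matcher = lastYearIncomePattern.matcher(input);
boolean found = matcher.find();
if (found) {
  // group(2) holds all twelve floats, each followed by whitespace; split them apart
  String[] values = matcher.group(2).split("\\s");
}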

It works. Would be interested to see if it can be done in one op like you were hoping.

aar
+1, but you should *check* the value returned by `find()` before calling `group()`.
Alan Moore
Thanks for catching that - updated.
aar
Yes, this works quite well. I definitely consider it better than my own solution. Still, it's not the perfect alternative I was looking for (a backreference that accepts different values).
Mats_SX
@Mats_SX: Your perfect alternative doesn't exist. In Perl you can assign each capture to its own variable in the process of matching, but that means using twelve capture groups, like you did. In .NET you can use one capture group repeated twelve times and it will hold onto all the intermediate captures, but you still have to go through a CaptureCollection to access them; this `split()` approach seems handier to me. You can always encapsulate the result in a dedicated class if you think it's worth the effort.
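One possible shape for the "dedicated class" idea: a hypothetical `LastYearIncome` value class that runs the `find()` + `split()` approach once and exposes the twelve numbers by index. It assumes each token is a decimal float, as the question describes them (`"1.100"` becomes `1.1`); if the dot were actually a thousands separator, the conversion step would need to change.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class wrapping the twelve numbers of the "last year income" field.
final class LastYearIncome {
    // Same pattern as in the answer above: prefix in group 1, all twelve floats in group 2.
    private static final Pattern PATTERN =
            Pattern.compile("(.+\\{\\s)(([0-9]{1,2}\\.[0-9]{3}\\s){12})");

    private final double[] values;

    private LastYearIncome(double[] values) {
        this.values = values;
    }

    // Returns Optional.empty() when the input does not contain the field.
    static Optional<LastYearIncome> parse(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return Optional.empty();
        }
        // Assumption: each token is a decimal float, so Double.parseDouble applies.
        double[] values = Arrays.stream(m.group(2).trim().split("\\s+"))
                .mapToDouble(Double::parseDouble)
                .toArray();
        return Optional.of(new LastYearIncome(values));
    }

    // 0-based access to one of the twelve values.
    double get(int index) {
        return values[index];
    }

    int size() {
        return values.length;
    }

    public static void main(String[] args) {
        LastYearIncome income = LastYearIncome.parse("income { 1.100 2.200 3.300 4.400 5.500 "
                + "6.600 7.700 8.800 9.900 10.000 11.100 12.200 ").get();
        System.out.println(income.size() + " values, first = " + income.get(0));
    }
}
```

Callers then deal with `get(i)` instead of group numbers or split arrays, and the regex lives in one place.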
Alan Moore
Yes, I'll proceed with this approach. Thanks for the help, everyone!
Mats_SX