In Java, I'm attempting to parse data from an ASCII output file. A sample of the data is shown below. The values are formatted with precision 5 and scale 3 (five digits in total, three of them after the decimal point), and there are no spaces between the values.

80.234 <- 1 value
71.01663.129 <- 2 values ...
67.09159.25353.997
56.02759.77859.25057.749
55.86558.46958.64861.72855.969

What regular expression pattern can I use to match the number values and split them into groups? The pattern (\d+\.\d{1,3}) matches a single value. However, when I add a quantifier for the number of values on the line, it does not give the expected result. For example, I expected the following to find 10 groups.

String testPattern = "68.65761.25659.01057.67657.14857.06457.41658.77861.16268.641";

// create a pattern to match the output
Pattern p = Pattern.compile("(\\d+\\.\\d{1,3}){10}");

Matcher m = p.matcher(testPattern);

if (m.find())
{
    String group = m.group();
}
+4  A: 

If they're all identically formatted, perhaps it would be easier to just read in 6 characters as a string, then use Double.parseDouble to parse that from string to Double?
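
A minimal sketch of that idea, assuming every value is exactly six characters wide (two digits, a dot, three digits) and the line contains nothing else. The class and method names here are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class FixedWidthParser {
    // Assumption: each value occupies exactly 6 characters, e.g. "67.091".
    static final int FIELD_WIDTH = 6;

    // Cuts the line into 6-character chunks and parses each chunk as a double.
    static List<Double> parseLine(String line) {
        if (line.length() % FIELD_WIDTH != 0) {
            throw new IllegalArgumentException(
                "Line length is not a multiple of " + FIELD_WIDTH + ": " + line);
        }
        List<Double> values = new ArrayList<>();
        for (int i = 0; i < line.length(); i += FIELD_WIDTH) {
            values.add(Double.parseDouble(line.substring(i, i + FIELD_WIDTH)));
        }
        return values;
    }

    public static void main(String[] args) {
        // prints [67.091, 59.253, 53.997]
        System.out.println(parseLine("67.09159.25353.997"));
    }
}
```

The length check is a design choice: it fails loudly on a line that does not fit the fixed-width assumption instead of silently producing shifted values.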

Brian Knoblauch
+2  A: 

Your regex contains only one capturing group. Use a while loop to enumerate all of the matches (see http://www.ideone.com/FNRsz):

String testPattern = "68.65761.25659.01057.67657.14857.06457.41658.77861.16268.641";
Pattern p = Pattern.compile("\\d+\\.\\d{1,3}");
Matcher m = p.matcher(testPattern);

while(m.find())   // <---
   System.out.println(m.group());
KennyTM
This will work, even though his regexp is wrong. I'm guessing that it should be "\\d\\d\\.\\d\\d\\d". The variable-length quantifiers on both ends only work because the numbers are always at maximum length, so greediness does the right thing.
Darron
+1  A: 

Using Guava, a fixed-length Splitter would work well here.

Iterable<String> numbers = Splitter.fixedLength(6).split(testPattern);

If you were to create a Function<String, Double> (called, say, Numbers.doubleParser()), you could even convert the data to numbers easily. (Obviously you could use BigDecimal or whatever rather than Double depending on your needs.)

private static final Splitter SPLITTER = Splitter.fixedLength(6);

...

public void someMethod(String stringToParse) {
  for(Double value : Iterables.transform(SPLITTER.split(stringToParse),
                                         Numbers.doubleParser())) {
    ...
  }
}
ColinD
+2  A: 

You're expecting it to somehow break out the individual numbers because that's how you matched them, but it doesn't work that way. What your regex does is capture one number at a time and place it into group #1. Ten times it does this, each time overwriting the contents of group #1 with the new value. When it's done, group() returns the whole string as you discovered, while group(1) would return only the tenth number, 68.641.
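
The overwriting behaviour can be checked with a small sketch that runs the pattern from the question and inspects what the matcher holds afterwards. The class name and the `inspect` helper are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
final class RepeatedGroupDemo {
    static final String INPUT =
        "68.65761.25659.01057.67657.14857.06457.41658.77861.16268.641";

    // Returns { groupCount, group(), group(1) } for the pattern from the question.
    static String[] inspect(String input) {
        Matcher m = Pattern.compile("(\\d+\\.\\d{1,3}){10}").matcher(input);
        if (!m.find()) {
            throw new IllegalStateException("no match");
        }
        return new String[] {
            String.valueOf(m.groupCount()), // number of capturing groups in the pattern
            m.group(),                      // the entire match
            m.group(1)                      // last value captured by group #1
        };
    }

    public static void main(String[] args) {
        String[] r = inspect(INPUT);
        System.out.println(r[0]); // prints 1
        System.out.println(r[1]); // prints the whole 60-character input
        System.out.println(r[2]); // prints 68.641
    }
}
```

The quantifier `{10}` repeats the group ten times but does not create ten groups, so `groupCount()` stays at 1.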

This is a common error, probably due to Java's lack of a built-in "find all matches" mechanism. .NET has its Matches() methods, PHP has preg_match_all(), Python has re.findall(), Perl and JavaScript have the /g modifier... every major flavor has a mechanism to return either an array of all matches or an iterator over the matches, or both. But in Java you're expected to call find() in a while loop, as @KennyTM demonstrated.
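
If the loop gets repeated a lot, it can be wrapped once in a small utility. This is only a sketch; `RegexUtil` and `findAll` are hypothetical names, not part of the JDK:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical utility class, not part of any library.
final class RegexUtil {
    // Collects every non-overlapping match of the pattern, in order of appearance.
    static List<String> findAll(Pattern pattern, CharSequence input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern value = Pattern.compile("\\d{2}\\.\\d{3}");
        // prints [71.016, 63.129]
        System.out.println(findAll(value, "71.01663.129"));
    }
}
```

On Java 9 and later, `Matcher.results()` returns a stream of match results, which covers much the same ground without a hand-written loop.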

It's an annoying omission, but not really a surprising one, for Java. Its effect is to force us to write more verbose, less idiomatic code, which has been a Java hallmark from the very beginning. But if you really want to reduce this task to a one-liner, there's the old "split on a lookaround" trick:

String[] result = source.split("(?=\\B\\d{2}\\.\\d{3})");

...or:

String[] result = source.split("(?<=\\G\\d{2}\\.\\d{3})");
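
For completeness, here is a rough sketch of the lookahead variant carried through to numbers. It assumes every value has the form dd.ddd; the class and method names are made up for illustration:

```java
// Hypothetical demo class, not part of any library.
final class SplitDemo {
    // Splits before every "dd.ddd" that is not at a word boundary, then parses each piece.
    // The \B keeps the lookahead from matching at position 0, before the first value.
    static double[] parse(String source) {
        String[] parts = source.split("(?=\\B\\d{2}\\.\\d{3})");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        double[] values = parse("55.86558.46958.64861.72855.969");
        System.out.println(values.length); // prints 5
        System.out.println(values[0]);     // prints 55.865
    }
}
```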
Alan Moore
+1 Awesome split() trick!
Helper Method