+1  Q: 

java regex line

In Java I would like to read a file line by line and print each line to the output. I want to solve this with regular expressions.

private static java.util.regex.Pattern line = java.util.regex.Pattern.compile(".*\\n");

while (...)
{
  System.out.print(scanner.next(line));
}

The regex in the code is not correct, as I get an InputMismatchException. I have been working on this regex for 2 hours. Please help with it.

With regex powertoy I see that ".*\n" is correct, but my program still fails.

The whole source is:

/**
 * Extracts the points in the standard input in off file format to the standard output in ascii points format.
 */

 import java.util.regex.Pattern;
 import java.util.Scanner;

class off_to_ascii_points 
{
    private static Scanner scanner = new Scanner(System.in); 
    private static Pattern fat_word_pattern = Pattern.compile("\\s*\\S*\\s*");
    private static Pattern line = Pattern.compile(".*\\n", Pattern.MULTILINE);

    public static void main(String[] args) 
    {
     try
     {
      scanner.useLocale(java.util.Locale.US);

                    /* skip to the number of points */
      scanner.skip(fat_word_pattern);

      int n_points = scanner.nextInt();

                    /* skip the rest of the second line */
      scanner.skip(fat_word_pattern); scanner.skip(fat_word_pattern);

      for (int i = 0; i < n_points; ++i)
      {
              System.out.print(scanner.next(line));
                      /*
                      Here is my mistake.
                      next() reads only up to the next delimiter,
                      which by default is any sequence of whitespace.
                      That is, next() does not read to the end of the line,
                      which is what I wanted.

                      Changing "next(line)" to "nextLine()" solves the problem.
                      Setting the delimiter to the line separator
                      right before the loop solves the problem too.
                      */
      }

     }
     catch(java.lang.Exception e)
     {
      System.err.println("exception");
      e.printStackTrace();
     }
    }
}

The beginning of an example input is:

OFF
4999996 10000000 0
-28.6663 -11.3788 -58.8252 
-28.5917 -11.329 -58.8287 
-28.5103 -11.4786 -58.8651 
-28.8888 -11.7784 -58.9071 
-29.6105 -11.2297 -58.6101 
-29.1189 -11.429 -58.7828 
-29.4967 -11.7289 -58.787 
-29.1581 -11.8285 -58.8766 
-30.0735 -11.6798 -58.5941 
-29.9395 -11.2302 -58.4986 
-29.7318 -11.5794 -58.6753 
-29.0862 -11.1293 -58.7048 
-30.2359 -11.6801 -58.5331 
-30.2021 -11.3805 -58.4527 
-30.3594 -11.3808 -58.3798

I first skip to the number 4999996, which is the number of lines containing point coordinates. These are the lines that I am trying to write to the output.

A: 

You have to switch the Pattern into multiline mode.

line = Pattern.compile("^.*$", Pattern.MULTILINE);
System.out.println(scanner.next(line));
Bombe
MULTILINE is not working either. The $ character is not enough for me, as I want the newline character to be included in the matched string.
Zoli
+3  A: 

I suggest using

private static Pattern line = Pattern.compile(".*");

scanner.useDelimiter("[\\r\\n]+"); // Insert right before the for-loop

System.out.println(scanner.next(line)); //Replace print with println
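
As a rough illustration of the suggestion above, here is a minimal sketch that applies it to an in-memory string instead of System.in. The class name LineDelimiterSketch and the helper readLines are made up for this example. Note that with the delimiter "[\\r\\n]+" a run of empty lines is treated as a single delimiter, so blank lines are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class LineDelimiterSketch {
    // Matches a whole token; the token no longer contains line breaks.
    static final Pattern LINE = Pattern.compile(".*");

    // Reads 'count' lines from 'input' using a line-break delimiter.
    static List<String> readLines(String input, int count) {
        Scanner scanner = new Scanner(input);
        // Tokens are now separated by line breaks instead of any whitespace.
        scanner.useDelimiter("[\\r\\n]+");
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; ++i) {
            lines.add(scanner.next(LINE));
        }
        return lines;
    }

    public static void main(String[] args) {
        String input = "-28.6663 -11.3788 -58.8252 \n-28.5917 -11.329 -58.8287 \n";
        for (String l : readLines(input, 2)) {
            System.out.println(l);
        }
    }
}
```

The spaces inside each line, including the trailing one, stay part of the token because they are no longer delimiters.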


Why your code doesn't work as expected:

This has to do with the Scanner class you use and how that class works.

The javadoc states:

A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace.

That means that when you call one of the Scanner's next* methods, the scanner reads the input only until the next delimiter is encountered.

So your first call to scanner.next(line) starts reading the following line

-28.6663 -11.3788 -58.8252

It stops at the space after -28.6663. It then checks whether the token (-28.6663) matches your pattern (.*\n). It does not, because the token contains no newline, so next(line) throws the InputMismatchException you saw.
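
The behaviour described above can be reproduced with a small sketch. It assumes the first point line from the question as input; the class name DefaultDelimiterDemo and the helper tryNext are invented for this illustration.

```java
import java.util.InputMismatchException;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class DefaultDelimiterDemo {
    // The pattern from the question: any characters followed by a newline.
    static final Pattern LINE = Pattern.compile(".*\\n");

    // Describes what next(LINE) does on 'input' with the default delimiter.
    static String tryNext(String input) {
        Scanner scanner = new Scanner(input);
        try {
            return "matched: " + scanner.next(LINE);
        } catch (InputMismatchException e) {
            // The rejected token is not consumed, so plain next() still returns it.
            return "mismatch on token: " + scanner.next();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryNext("-28.6663 -11.3788 -58.8252 \n"));
    }
}
```

The token handed to the pattern is only the text up to the first whitespace, so a pattern that requires a trailing newline cannot match it.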

jitter
Added a section to explain why your code fails.
jitter
Perfect answer, thanks.
Zoli
A: 

By default the scanner uses whitespace as its delimiter. After the initial skips, you must change the delimiter to the line separator before you read the lines. To do so, insert the following line before the for loop:

scanner.useDelimiter(Pattern.compile(System.getProperty("line.separator")));

and update the Pattern variable line as follows:

private static Pattern line = Pattern.compile(".*", Pattern.MULTILINE);

ccyu
The "line.separator" property is not to be relied on. Any given file may use any style of line separator, or even a mix of two or more styles. Scanner's hasNextLine() and nextLine() methods take that into account.
Alan Moore
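
To illustrate the comment above, here is a minimal sketch of reading lines with hasNextLine() and nextLine() from input that mixes line-separator styles. The class name NextLineSketch, the helper readAllLines and the sample input are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name, used only for this illustration.
class NextLineSketch {
    // Collects every line of 'input', whatever line separator ends it.
    static List<String> readAllLines(String input) {
        List<String> lines = new ArrayList<>();
        Scanner scanner = new Scanner(input);
        while (scanner.hasNextLine()) {
            // nextLine() returns the line without its separator.
            lines.add(scanner.nextLine());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Windows (\r\n), Unix (\n) and old Mac (\r) separators in one input.
        String mixed = "a b\r\nc d\ne f\rg h";
        System.out.println(readAllLines(mixed));
    }
}
```

Because nextLine() strips the separator, print each line with println if the line break should appear in the output.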
+1  A: 

If you only want to print the file to standard out, why do you want to use regexps? If you know that you always want to skip the first two lines, there are simpler ways to accomplish it.

import java.util.Scanner;
import java.io.File;

public class TestClass {
    public static void main(String[] args) throws Exception {
        Scanner in=new Scanner(new File("test.txt"));
        in.useDelimiter("\n"); // Or whatever line delimiter is appropriate
        in.next(); in.next(); // Skip first two lines
        while(in.hasNext())
            System.out.println(in.next());
    }
}
carlpett
I have to read in the number of lines, which is the first word on the second line.
Zoli
A: 

Thanks, everybody, for the help.

Now I understand my mistake:

The API documentation states that every nextT() method of the Scanner class first skips the delimiter pattern and then tries to read a T value. However, it does not make clear that each next...() method reads only up to the first occurrence of the delimiter!

Zoli
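
For reference, the nextLine() fix mentioned in the question's code comment could look roughly like the sketch below. It reads from a string rather than System.in; the class name OffPointsSketch, the helper extractPoints and the shortened sample input are assumptions made for this illustration.

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical class name, used only for this illustration.
class OffPointsSketch {
    // Returns the point lines of an OFF-formatted text, one per line.
    static String extractPoints(String off) {
        Scanner scanner = new Scanner(off);
        scanner.useLocale(Locale.US);
        scanner.next();                  // skip the "OFF" header word
        int nPoints = scanner.nextInt(); // first number on the second line
        scanner.nextLine();              // discard the rest of the second line
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < nPoints && scanner.hasNextLine(); ++i) {
            // nextLine() ignores the token delimiter and reads to the line end.
            out.append(scanner.nextLine()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String off = "OFF\n2 1 0\n"
                + "-28.6663 -11.3788 -58.8252 \n"
                + "-28.5917 -11.329 -58.8287 \n"
                + "1 2 3\n";
        System.out.print(extractPoints(off));
    }
}
```

Only the first nPoints lines after the header are copied; anything after them in the input is left unread.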