I'm working with a file of about 2 GB. I want to read the file line by line to find some specific terms. Which class is the better choice: FileReader or FileInputStream? And how can I find the specific words efficiently? I'm currently using the split() method, but maybe I could use the java.util.regex.Pattern class in combination with java.util.regex.Matcher.

So the questions are: Which class should I use, FileReader or FileInputStream? And should I use the split() method or the regex classes?

Does someone have an answer to these questions? Thanks.

A: 

You'll want to use a Reader (probably wrapped in a BufferedReader), since you're working with String data, as opposed to binary. You should pre-compile your pattern (Pattern.compile). Beyond that, it's unclear from your description if you should use Pattern.split, or if using a Matcher would be more appropriate.

Note that str.split(regex, limit) is equivalent to Pattern.compile(regex).split(str, limit), so calling str.split with a non-trivial regex on every line recompiles the pattern each time.
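As a rough sketch of the pre-compiling advice, the class below compiles its patterns once and reuses them for every line, showing both a Matcher loop and Pattern.split. The class name TermMatcher, the method names, and the search terms "error" and "timeout" are placeholders invented for illustration, not anything from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermMatcher {
    // Compiled once and reused for every line; the terms are placeholders.
    private static final Pattern TERMS = Pattern.compile("\\b(?:error|timeout)\\b");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns every occurrence of a term in the line, in order of appearance.
    public static List<String> findTerms(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = TERMS.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Splits on runs of whitespace using the precompiled pattern.
    public static String[] tokens(String line) {
        return WHITESPACE.split(line);
    }

    public static void main(String[] args) {
        System.out.println(findTerms("timeout after error: retry"));
        System.out.println(tokens("a b  c").length);
    }
}
```

If you only need to know where the terms occur, the Matcher loop is probably the better fit; split makes more sense when you actually need the pieces between the delimiters.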

Matthew Flaschen
thanks for the answer
+3  A: 

The best option would be to use a BufferedReader (for its readLine() method) wrapping an InputStreamReader (for its ability to specify the encoding) wrapping a FileInputStream (for actually reading the file):

BufferedReader br = new BufferedReader(new InputStreamReader(
    new FileInputStream(name), encoding));

FileReader uses the platform default encoding, which is usually a bad idea, making the class mainly a trap for developers who are not aware of the potential for problems.

If you just want to find substrings in the lines, String.indexOf() is the most efficient way; using regexes is better if you're actually looking for specific patterns.
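Putting the two suggestions together, a minimal sketch might look like the following. It assumes the file is UTF-8 and that you are looking for a literal term; LineSearch and countLinesContaining are made-up names for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LineSearch {
    // Counts the lines containing the literal term, holding one line in memory at a time.
    public static long countLinesContaining(String fileName, Charset encoding, String term)
            throws IOException {
        long count = 0;
        // try-with-resources closes the whole reader chain, including the file stream.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(fileName), encoding))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.indexOf(term) >= 0) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumes UTF-8; pass the charset your file was actually written in.
        System.out.println(countLinesContaining(args[0], StandardCharsets.UTF_8, args[1]));
    }
}
```

Because only the current line is kept in memory, the 2 GB file size should not be a problem as long as no single line is enormous.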

Michael Borgwardt
+1 best practice
leonbloy
thanks for the answer
A: 

BufferedReader has a readLine() method that can be used to read a file line by line. The Reader (and Writer) classes are meant for character data such as Strings, whereas InputStream (and OutputStream) should be used for binary data (byte arrays).

// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // Do something with the line
    }
}
Marc
thanks for the answer