ansaurus

Question

How can I filter with the best performance? (Java)

Answer 1

+3 A:

Simple text filtering is probably easier to write in Perl (my choice because I've been using it for years) or Python (what I recommend to new people because it's a more modern language).

Paul Tomblin 2010-01-18 03:33:44

+1 Wow, a Perl hacker recommending Python... Seriously though, Python lets you hit the ground running.

Hamish Grubijan 2010-01-18 03:37:15

Thanks but unfortunately I'm very weak in Perl and I don't know Python :P

Mike Redford 2010-01-18 06:16:52

Answer 2

A:

Several solutions to a similar problem using Java Scanner or StreamTokenizer were recently discussed here.

trashgod 2010-01-18 03:54:51

Thanks for your reply. Your source code returns all the data, but I want it to return only specific information, such as what is shown at the top of the page...

Mike Redford 2010-01-18 06:15:15

Yes, you'll have to filter inside the parsing loop. I've updated that example to compare Scanner and StreamTokenizer. The latter appears to be faster in that context.

trashgod 2010-01-18 12:05:30
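
As an illustration of filtering inside the parsing loop with StreamTokenizer, here is a minimal sketch. It is not the example referred to above; the class name TokenizerFilterSketch, the filter method, and the choice of ';', '|', ',' and whitespace as separators are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;

class TokenizerFilterSketch {

    // Hypothetical helper: returns every token that starts with the given prefix.
    static List<String> filter(Reader in, String prefix) throws IOException {
        StreamTokenizer st = new StreamTokenizer(in);
        st.resetSyntax();               // drop the default number and comment handling
        st.wordChars('!', '~');         // treat all printable ASCII as word characters
        st.whitespaceChars(0, ' ');     // control characters and space separate tokens
        st.whitespaceChars(';', ';');   // assumed field separators
        st.whitespaceChars('|', '|');
        st.whitespaceChars(',', ',');

        List<String> matches = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            // The filter runs inside the loop, so unwanted tokens are never stored.
            if (st.ttype == StreamTokenizer.TT_WORD && st.sval.startsWith(prefix)) {
                matches.add(st.sval);
            }
        }
        return matches;
    }
}
```

Wrapping the file in a BufferedReader before passing it to filter would likely matter more for speed than the choice of tokenizer, but that should be measured on the real input.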

Answer 3

+3 A:

Since your code has performance issues, you first need to find the bottleneck. You can profile it with the profiler available in the IDE you use.

However, your code is not computation-heavy but I/O-intensive, both in reading the file and in writing output with System.out.print, so I would suggest improving the file I/O first.
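
Short of a full profiler, a rough way to see where the time goes is to time the suspected sections by hand. The sketch below is only illustrative: the class name TimingSketch and the timeMillis helper are hypothetical, and a single wall-clock measurement like this is noisy, so treat the numbers as a hint rather than a benchmark.

```java
class TimingSketch {

    // Hypothetical helper: runs the task once and returns the elapsed time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();

        // Time the in-memory work on its own.
        long buildTime = timeMillis(() -> {
            for (int i = 0; i < 10_000; i++) {
                sink.append("CELLID:").append(i).append('\n');
            }
        });

        // Time the output separately, so the two costs can be compared.
        long printTime = timeMillis(() -> System.out.print(sink));

        System.err.println("build: " + buildTime + " ms, print: " + printTime + " ms");
    }
}
```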

Replace this line of code:

Scanner scanner = new Scanner(new File("i:\\1\\2.txt"));

with these lines of code:

File file = new File("i:\\1\\2.txt");
BufferedReader br = new BufferedReader( new FileReader(file)  );
Scanner scanner = new Scanner(br);

Let us know if this helps.

Since the previous solution did not help much, I made a few more changes to improve your code. You may have to correct errors in parsing, if any. I was able to display the output of parsing 392832 lines in approximately 5 seconds. The original solution takes more than 50 seconds.

The changes are as follows:

Use of StringTokenizer instead of Scanner
Use of BufferedReader for reading file
Use of StringBuilder to buffer output

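import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class FileParse {

    private static final int FLUSH_LIMIT = 1024 * 1024;
    private static StringBuilder outputBuffer = new StringBuilder(
            FLUSH_LIMIT + 1024);
    // Not final: incremented once per CELLID token in processToken()
    private static long countCellId = 0;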

    public static void main(String[] args) throws IOException {
        long start = System.currentTimeMillis();
        String fileName = "i:\\1\\2.txt";
        File file = new File(fileName);
        // try-with-resources closes the reader even if parsing fails
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                StringTokenizer st = new StringTokenizer(line, ";|, ");
                while (st.hasMoreTokens()) {
                    String token = st.nextToken();
                    processToken(token);
                }
            }
        }
        flushOutputBuffer();
        System.out.println("----------------------------");
        System.out.println("CELLID Count: " + countCellId);
        long end = System.currentTimeMillis();
        System.out.println("Time: " + (end - start));
    }

    private static void processToken(String token) {
        if (token.startsWith("CELLID=")) {
            String value = getTokenValue(token);
            outputBuffer.append("CELLID:").append(value).append("\n");
            countCellId++;
        } else if (token.startsWith("ENSUP=")) {
            String value = getTokenValue(token);
            outputBuffer.append("ENSUP:").append(value).append("\n");
        } else if (token.startsWith("ENCHO=")) {
            String value = getTokenValue(token);
            outputBuffer.append("ENCHO:").append(value).append("\n");
        }
        if (outputBuffer.length() > FLUSH_LIMIT) {
            flushOutputBuffer();
        }
    }

    private static String getTokenValue(String token) {
        int start = token.indexOf('=') + 1;
        int end = token.length();
        String value = token.substring(start, end);
        return value;
    }

    private static void flushOutputBuffer() {
        System.out.print(outputBuffer);
        outputBuffer = new StringBuilder(FLUSH_LIMIT + 1024);
    }

}

Update on ENSUP and MSLH:

To me it looks like you have switched ENSUP and MSLH in the if statement, as shown below. Hence you see the "MSLH" value for "ENSUP" and vice versa.

} else if (token.startsWith("MSLH=")) {
    String value = getTokenValue(token);
    outputBuffer.append("ENSUP:").append(value).append("\n");
} else if (token.startsWith("ENSUP=")) {
    String value = getTokenValue(token);
    outputBuffer.append("MSLH:").append(value).append("\n");
}
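
For comparison, a corrected version would pair each prefix with its own label. The following is only a sketch: the class name LabelSketch and the label method are hypothetical, and it assumes tokens of the form KEY=value.

```java
class LabelSketch {

    // Returns the text after the first '=' in the token.
    static String getTokenValue(String token) {
        return token.substring(token.indexOf('=') + 1);
    }

    // Returns "KEY:value" for the two keys in question, or null for any other token.
    static String label(String token) {
        if (token.startsWith("MSLH=")) {
            return "MSLH:" + getTokenValue(token);   // MSLH prefix, MSLH label
        } else if (token.startsWith("ENSUP=")) {
            return "ENSUP:" + getTokenValue(token);  // ENSUP prefix, ENSUP label
        }
        return null;
    }
}
```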

Gladwin Burboz 2010-01-18 04:49:05

Dear Gladwin Burboz, thanks for your reply, but it's still very slow.

Mike Redford 2010-01-18 06:13:56

Mike, let me know if above solution is faster.

Gladwin Burboz 2010-01-19 00:22:25

The issue with this one is that if outputBuffer gets too big, it can cause an OutOfMemoryError. The solution would be to flush the output from time to time. Performance could be improved further by flushing the output in a separate thread. I will update the above solution further if I get some time.

Gladwin Burboz 2010-01-19 00:27:52
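
One possible shape for flushing in a separate thread is sketched below. It is not the code from the answer: the class name AsyncFlushSketch and its methods are hypothetical, and it assumes a single producer thread. Full buffers are handed to a single-threaded executor, which keeps the output in order. The executor's queue is unbounded, so if printing is much slower than parsing, memory can still grow.

```java
import java.io.PrintStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncFlushSketch {

    private final PrintStream out;
    private final int flushLimit;
    // A single worker thread, so buffers are printed in the order they were submitted.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private StringBuilder buffer = new StringBuilder();

    AsyncFlushSketch(PrintStream out, int flushLimit) {
        this.out = out;
        this.flushLimit = flushLimit;
    }

    // Called from the parsing thread only.
    void append(String text) {
        buffer.append(text);
        if (buffer.length() > flushLimit) {
            flush();
        }
    }

    // Hands the full buffer to the writer thread and starts a fresh one.
    void flush() {
        final StringBuilder full = buffer;
        buffer = new StringBuilder();
        writer.execute(() -> out.print(full));
    }

    // Flushes what is left and waits for the writer thread to finish.
    void close() {
        flush();
        writer.shutdown();
        try {
            writer.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        out.flush();
    }
}
```

Whether this beats the single-threaded version depends on how slow the output device is, so it is worth timing both before adopting it.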

Updated code to flush outputBuffer each time it exceeds FLUSH_LIMIT.

Gladwin Burboz 2010-01-19 02:47:16

Dear Gladwin, good job, and thank you very much indeed. It is much faster and it has the best performance...

Mike Redford 2010-01-19 07:43:35

And could you please tell me about the count: how can I get the count? Thanks

Mike Redford 2010-01-19 07:45:27

I am glad that it worked fine... I am not sure what kind of count you want, but I updated the code to print the total count of occurrences of CELLID. Please see the variable "countCellId" added to the code. Hope that is what you need.

Gladwin Burboz 2010-01-19 16:37:42
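
If a count is wanted for every key rather than for CELLID alone, a map from key to counter is one option. This is a sketch under that assumption; the class name KeyCountSketch and its methods are hypothetical, and tokens without an '=' are simply ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyCountSketch {

    // Insertion-ordered, so keys are reported in the order they first appear.
    private final Map<String, Long> counts = new LinkedHashMap<>();

    // Counts the token under the text before its first '='.
    void count(String token) {
        int eq = token.indexOf('=');
        if (eq > 0) {
            counts.merge(token.substring(0, eq), 1L, Long::sum);
        }
    }

    // Returns how many tokens were seen for the key, or 0 if none.
    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }
}
```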

Thanks a lot ; Have a nice life :)

Mike Redford 2010-01-19 20:06:16

Thanks to all those who voted for the answer. I am new to this forum and more votes/reputation helps.

Gladwin Burboz 2010-01-20 15:46:00
