Hello everyone. Let me preface this post with a single caution. I am a total beginner when it comes to Java. I have been programming PHP on and off for a while, but I was ready to make a desktop application, so I decided to go with Java for various reasons.

The application I am working on is in its beginning stages (fewer than 5 classes), and I need to read bytes from a local file. The files are currently smaller than 512 KB (but may get larger in the future). Right now I am using a FileInputStream to read the file into three byte arrays, which satisfies my requirements. However, I have seen BufferedInputStream mentioned, and I was wondering whether my current approach is best or whether I should use a BufferedInputStream as well.

I have done some research and read a few questions here on Stack Overflow, but I am still having trouble understanding when to use a BufferedInputStream and when not to. In my situation, the first array I read bytes into is only a few bytes long (fewer than 20). If the data in those bytes is good, I read the rest of the file into two more byte arrays of varying size.

I have also heard many people mention profiling to see which is more efficient in each specific case; however, I have no profiling experience and I'm not really sure where to start. I would love some suggestions on this as well.

I'm sorry for such a long post, but I really want to learn and understand the best way to do these things. I always have a bad habit of second guessing my decisions, so I would love some feedback. Thanks!

+1  A: 

BufferedInputStream reads more of the file than you need, in advance. As I understand it, it does more work up front: one big continuous disk read instead of many small ones in a tight loop.

As far as profiling - I like the profiler that's built into netbeans. It's really easy to get started with. :-)

jskaggz
Thanks for the suggestions. I heard someone mention the profiler in NetBeans. I started out using NetBeans; however, I have switched to a plain text editor for the time being. I feel that I learn more about the language that way. Do you have any other suggestions?
Jason Watkins
Text editors are great, but it's kind of like pedaling a dump truck if you're billing clients. You might try hprof if you want to avoid doing the profiling in an IDE: http://java.sun.com/developer/technicalArticles/Programming/HPROF.html
jskaggz
Thanks @jskaggz. I will check out hprof. BTW, I am making this application for myself, so I am not really on a timetable, but I agree that if it were for a client, I would definitely use an IDE to speed things along.
Jason Watkins
+1  A: 

I can't speak to the profiling, but in my experience developing Java applications, using any of the buffer classes (BufferedInputStream, StringBuffer) makes my applications noticeably faster. Because of this, I use them even for the smallest files or string operations.

Jason McCreary
When you use the BufferedInputStream, do you usually specify a particular size chunk for it to buffer, or do you let it automatically decide?
Jason Watkins
This depends. As Stephen C said above, if this number doesn't coincide well with the data page size used in the syscalls (say 4k), then you have just shot yourself in the foot by creating a bottleneck. Think of it like filling a sandbag with a shovel: if you scoop too much or too little sand onto the shovel, you decrease efficiency. As a side note, I am an advocate of writing good code, but if you are just starting out, there is nothing wrong with getting it to work and then optimizing later. These things can be rabbit holes.
Jason McCreary
@Jason McCreary Sound advice, thanks!
Jason Watkins
+2  A: 

If you are using relatively large arrays to read the data a chunk at a time, then BufferedInputStream will just introduce a wasteful copy. (Remember, read does not necessarily fill the whole array; you might want DataInputStream.readFully.) Where BufferedInputStream wins is when making lots of small reads.
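To illustrate the point about read not necessarily filling the array, here is a minimal sketch. The class and method names (`ReadFullyDemo`, `readExactly`) are made up for this example; only `DataInputStream.readFully` comes from the answer above.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    /**
     * Hypothetical helper: reads exactly `length` bytes from the stream.
     * A plain in.read(data) may return after filling only part of the array;
     * readFully keeps reading until the array is full and throws
     * EOFException if the stream ends first.
     */
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        // DataInputStream adds no buffering of its own, so wrapping is cheap.
        new DataInputStream(in).readFully(data);
        return data;
    }
}
```

With a FileInputStream you would call something like `readExactly(fileIn, 20)` for the small first array, and the same helper again for the larger ones.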

Tom Hawtin - tackline
I think I understand what you are saying. Let me ask you another question. I see a constructor for FileInputStream that takes a byte[] as a parameter. Currently, I am using a for loop to read the desired bytes; I assume using this parameter instead would be more efficient? I also assume that using a for loop to constantly call read on the FileInputStream is what you mean by lots of small reads? I'm sorry to sound so noobish, but I am having a hard time completely grasping this for some reason. Thanks for your answer!
Jason Watkins
@mastermosaj You might be seeing the constructor for `ByteArrayInputStream`, which is an `InputStream` that reads through a `byte[]` and so does no actual I/O. If you are reading through your `byte[]` byte by byte, then you will probably find that using a `BufferedInputStream` or `ByteArrayInputStream` simplifies your code at some performance cost. (Note: don't mix using a `BufferedInputStream` with using the underlying stream directly, because the former buffers.)
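A small sketch of why mixing the two is risky. It uses an in-memory `ByteArrayInputStream` as a stand-in for a file, and the class and method names are hypothetical:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MixedReadsDemo {
    /** Reports how many bytes the raw stream still has after ONE buffered single-byte read. */
    static int rawBytesLeftAfterOneBufferedRead(byte[] data) throws IOException {
        InputStream raw = new ByteArrayInputStream(data);
        InputStream buffered = new BufferedInputStream(raw);
        buffered.read();        // the caller asks for a single byte...
        return raw.available(); // ...but the buffer has already pulled more from raw
    }
}
```

For a small input (smaller than the internal buffer), the raw stream should be left with nothing to read, so any later read on `raw` would silently skip the data sitting in the buffer.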
Tom Hawtin - tackline
@Tom, thanks again, I will keep this in mind!
Jason Watkins
+4  A: 

If you are consistently doing small reads, then a BufferedInputStream will give you significantly better performance. Each read request on an unbuffered stream typically results in a system call to the operating system to read the requested number of bytes. The overhead of a system call may be thousands of machine instructions per syscall. A buffered stream reduces this by doing one large read of (say) up to 8k bytes into an internal buffer, and then handing out bytes from that buffer. This can drastically reduce the number of system calls.

However, if you are consistently doing large reads (e.g. 8k or more), then a BufferedInputStream slows things down. You typically don't reduce the number of syscalls, and the buffering introduces an extra data copying step.
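One rough way to see this effect without a profiler is to count how often the underlying stream is asked for data. The sketch below is only an illustration: `CountingInputStream` and `countUnderlyingReads` are hypothetical names, and an in-memory array stands in for the file, so the counts approximate syscalls rather than measure them.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SmallReadsDemo {
    /** Hypothetical wrapper that counts read requests reaching the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads the data one byte at a time and returns how many requests hit the source. */
    static int countUnderlyingReads(byte[] data, boolean buffered) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(data));
        InputStream in = buffered ? new BufferedInputStream(counter) : counter;
        while (in.read() != -1) {
            // discard the byte; only the number of underlying requests matters here
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        System.out.println("unbuffered requests: " + countUnderlyingReads(data, false));
        System.out.println("buffered requests:   " + countUnderlyingReads(data, true));
    }
}
```

The unbuffered count grows with the number of bytes read, while the buffered count grows only with the number of buffer refills.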

In your use-case (where you read a 20 byte chunk first then lots of large chunks) I'd say that using a BufferedInputStream is more likely to reduce performance than increase it. But ultimately, it depends on the actual read patterns.
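For the read pattern described in the question, an unbuffered version might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the 16-byte header length, the "magic" first byte used as the validity check, and the idea that the caller already knows the length of the first large section. It also assumes the file is small enough that its size fits in an `int`.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeaderThenBodyDemo {
    static final int HEADER_LEN = 16; // assumption: the question only says "fewer than 20"
    static final byte MAGIC = 0x4A;   // assumption: stand-in for the real validity check

    /** Returns { header, first section, remainder of file } using three bulk reads. */
    static byte[][] readSections(Path file, int firstLen) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(file.toFile()))) {
            byte[] header = new byte[HEADER_LEN];
            in.readFully(header); // throws EOFException if the file is too short
            if (header[0] != MAGIC) {
                throw new IOException("unexpected header");
            }
            long remaining = Files.size(file) - HEADER_LEN;
            if (firstLen < 0 || firstLen > remaining) {
                throw new IOException("first section length out of range");
            }
            byte[] first = new byte[firstLen];
            byte[] second = new byte[(int) (remaining - firstLen)];
            in.readFully(first);  // one large read straight into the target array
            in.readFully(second);
            return new byte[][] { header, first, second };
        }
    }
}
```

Because each array is filled with a single bulk request, there are only a handful of reads in total, which is the situation where an extra buffer layer has little to offer.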

Stephen C
Thanks for your insightful answer. I think I am starting to understand now.
Jason Watkins
Well said, Stephen C.
jskaggz
Two people liked this answer and no upvotes!? +1, for bringing the read size into the picture.
Moron
I would upvote this, but I don't have enough reputation to do so.
Jason Watkins