Hello, I have a large binary file to parse, and I am not sure which language to use to get the best performance. Initially, I was going to use C# WPF for the GUI and a C DLL to do the parsing, but my target PC is a 64-bit machine and I had trouble setting up a C DLL project in VS 2008. So I am wondering whether I should move to C++ or C# to do the parsing. I am just not sure about the file reading speed of C++/C#, since my file is pretty big, and speed is crucial. Could anyone give me some suggestions? Thanks.
Pick whatever language you're writing the rest of the program in. Fire up a file stream and read the sucker.
Regardless of the code you use, it's still gonna be waiting on the disk to get around to wherever the data is.
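A minimal C++ sketch of "fire up a stream and read it", assuming a plain std::ifstream and a 1 MiB chunk size (an arbitrary choice, not a tuned value). It reads big chunks sequentially so each disk request is large; the function name read_in_chunks is hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads the whole file sequentially in large chunks and returns the number
// of bytes read. The default 1 MiB chunk size is an assumption, not tuned.
std::size_t read_in_chunks(const std::string& path, std::size_t chunk_size = 1 << 20) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::vector<char> buffer(chunk_size);
    std::size_t total = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        std::streamsize got = in.gcount();  // may be short on the last chunk
        if (got <= 0) break;
        total += static_cast<std::size_t>(got);
        // ... hand buffer[0, got) to the parser here ...
    }
    return total;
}
```

Whatever the chunk size, the loop spends nearly all its time inside in.read, which is the point made above: the disk, not the language, sets the pace.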
Rather than focus on language (which, as others have mentioned, will have little effect), focus on the approach.
Generally, I recommend using file mapping (available in .NET 4.0 in the new MemoryMappedFile class). This is good unless you are doing a single-pass, forward-only scan, which can be done using a regular stream.
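The suggestion above is .NET's MemoryMappedFile; the native Win32 counterparts are CreateFileMapping and MapViewOfFile. To keep this sketch runnable outside Windows, it swaps in the POSIX mmap call instead, so treat it as an illustration of the idea rather than the Windows code path. The MappedFile class and its u32_at helper are hypothetical names. Once the file is mapped, any offset can be read directly without seeking, which is where mapping pays off over a forward-only stream.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only mapping of a whole file (POSIX mmap, standing in for
// CreateFileMapping/MapViewOfFile). The mapping is released in the destructor.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);
        struct stat st{};
        if (::fstat(fd, &st) != 0) { ::close(fd); throw std::runtime_error("fstat failed"); }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) { ::close(fd); throw std::runtime_error("mmap failed"); }
            data_ = static_cast<const unsigned char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    std::size_t size() const { return size_; }

    // Random access: copy a 32-bit value (host byte order) from any byte offset.
    std::uint32_t u32_at(std::size_t offset) const {
        if (offset > size_ || size_ - offset < sizeof(std::uint32_t))
            throw std::out_of_range("offset past end of file");
        std::uint32_t v;
        std::memcpy(&v, data_ + offset, sizeof v);  // memcpy avoids misaligned reads
        return v;
    }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

For a single forward pass, the chunked stream read is just as good and simpler; mapping earns its keep when the parser jumps around the file.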
There are a few hints that unmanaged code can pass to the file open routines (specifically, telling the cache manager that you're going to access the file randomly or sequentially). FileStream does expose these as FileOptions.RandomAccess and FileOptions.SequentialScan, but not every .NET file API does. Either way, leaving them out probably won't have a noticeable performance impact.
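On Windows, those hints are the FILE_FLAG_SEQUENTIAL_SCAN and FILE_FLAG_RANDOM_ACCESS flags passed to CreateFile. This sketch shows the same kind of hint with the POSIX posix_fadvise call instead, so it assumes a platform that provides it (Linux with glibc, for example). The helper names open_with_hint and scan_sequential are hypothetical. The hint is advisory: the kernel is free to ignore it, which fits the point that leaving it out rarely costs much.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <unistd.h>
#include <vector>

enum class AccessPattern { Sequential, Random };

// Opens a file read-only and tells the page cache how it will be accessed.
// A failed hint is not treated as an error because the hint is only advisory.
int open_with_hint(const std::string& path, AccessPattern pattern) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    int advice = pattern == AccessPattern::Sequential ? POSIX_FADV_SEQUENTIAL
                                                      : POSIX_FADV_RANDOM;
    (void)::posix_fadvise(fd, 0, 0, advice);  // offset 0, len 0 = whole file
    return fd;
}

// Scans a file front to back after giving the sequential hint; returns bytes read.
std::size_t scan_sequential(const std::string& path) {
    int fd = open_with_hint(path, AccessPattern::Sequential);
    std::vector<char> buf(1 << 20);
    std::size_t total = 0;
    for (;;) {
        ssize_t n = ::read(fd, buf.data(), buf.size());
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            ::close(fd);
            throw std::runtime_error("read failed");
        }
        if (n == 0) break;  // end of file
        total += static_cast<std::size_t>(n);
    }
    ::close(fd);
    return total;
}
```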
Since you are on Windows, life is a bit easier than on some other platforms thanks to the excellent overlapped I/O API. This is what you want to use if you are truly trying to squeeze out performance. Overlapped I/O lets reads run asynchronously, so several can be in flight at once and complete out of order while your code keeps working. FileStream actually uses overlapped I/O under the hood when it is opened for asynchronous access; if you can work within its limitations, just use that. Otherwise, create a managed C++ wrapper that does the reading for you using ReadFile.
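Native overlapped I/O means calling ReadFile with an OVERLAPPED structure on a handle opened with FILE_FLAG_OVERLAPPED. The sketch below does not use that API; it swaps in std::async to get the same overlap portably: the next chunk is being read on another thread while the current one is processed, so the disk isn't left idle waiting on the parser. Summing bytes stands in for real parsing, and sum_bytes_overlapped and read_chunk are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <functional>
#include <future>
#include <stdexcept>
#include <string>
#include <vector>

// Reads up to n bytes from the stream; an empty vector means end of file.
static std::vector<unsigned char> read_chunk(std::ifstream& in, std::size_t n) {
    if (!in) return {};  // already at EOF (or failed): nothing more to read
    std::vector<unsigned char> buf(n);
    in.read(reinterpret_cast<char*>(buf.data()), static_cast<std::streamsize>(n));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}

// Double buffering: while chunk k is processed on this thread, chunk k+1 is
// being read asynchronously. Only one read touches the stream at a time.
std::uint64_t sum_bytes_overlapped(const std::string& path, std::size_t chunk_size = 1 << 20) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::uint64_t sum = 0;
    auto pending = std::async(std::launch::async, read_chunk, std::ref(in), chunk_size);
    for (;;) {
        std::vector<unsigned char> current = pending.get();  // wait for the read
        if (current.empty()) break;
        // Start the next read before touching the current chunk.
        pending = std::async(std::launch::async, read_chunk, std::ref(in), chunk_size);
        for (unsigned char b : current) sum += b;  // stand-in for real parsing
    }
    return sum;
}
```

With a single outstanding read this only hides parse time behind disk time; real overlapped I/O can also keep several requests queued at the drive.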
This is the right approach because disk I/O should be the slowest part of the program. With overlapped I/O, and nothing else accessing the disk, you should be able to get close to the disk's practical throughput limit. Decoding the data into a data structure should be trivial by comparison; if it is not, you should re-examine how you are parsing the data.
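To illustrate why decoding should be cheap next to the disk, here's a sketch that assumes a hypothetical record layout: a 4-byte little-endian id followed by a 4-byte little-endian value. The real file's format isn't given in the question, so Record, load_le32 and decode_records are all illustrative names. Each record costs a few shifts and ORs.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical fixed-size record: 4-byte LE id, then 4-byte LE value.
struct Record {
    std::uint32_t id;
    std::uint32_t value;
};

// Assembles a little-endian 32-bit value byte by byte, independent of host order.
static std::uint32_t load_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           static_cast<std::uint32_t>(p[1]) << 8 |
           static_cast<std::uint32_t>(p[2]) << 16 |
           static_cast<std::uint32_t>(p[3]) << 24;
}

// Decodes back-to-back records; a trailing partial record is reported as an error.
std::vector<Record> decode_records(const unsigned char* data, std::size_t size) {
    constexpr std::size_t kRecordSize = 8;
    if (size % kRecordSize != 0) throw std::invalid_argument("truncated record");
    std::vector<Record> out;
    out.reserve(size / kRecordSize);
    for (std::size_t off = 0; off < size; off += kRecordSize) {
        out.push_back({load_le32(data + off), load_le32(data + off + 4)});
    }
    return out;
}
```

Decoding with shifts instead of casting the buffer to a struct sidesteps alignment, padding and byte-order surprises at no meaningful cost. In a chunked reader, a record that straddles two chunks would need to be carried over to the next buffer.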