
I have a puzzling situation and would like an expert opinion on the cause of the phenomenon explained below. A couple of weeks ago, I conducted a session titled "An overview of .NET for Java developers", and as part of it I wrote a quick C# class (.NET Framework 3.5) that reads from one file and writes to another file line by line (in a loop). As my audience were Java developers, I had the same code in a Java class for side-by-side comparison. However, when I ran these classes on the same machine, to my surprise the Java code consistently ran twice as fast as the C# code. I have tried many optimizations in the C# code to narrow the gap but could not succeed. There has to be an explanation, and I am looking for somebody who can explain the cause. I am attaching the source code of both classes for your reference.


Java class

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.text.DateFormat;
    import java.text.SimpleDateFormat;
    import java.util.Date;

    public class ReadWriteTextFile {

        static public String getContents(File aFile, String OutPutFileName) {
            StringBuilder contents = new StringBuilder();

            try {
                BufferedReader input = new BufferedReader(new FileReader(aFile));
                try {
                    String line = null;
                    // Each line is appended to the target file by reopening it (see setContents)
                    while ((line = input.readLine()) != null) {
                        setContents(OutPutFileName, line + System.getProperty("line.separator"));
                    }
                }
                finally {
                    input.close();
                }
            }
            catch (IOException ex) {
                ex.printStackTrace();
            }

            return contents.toString();
        }

        static public void setContents(String FileName, String aContents)
                throws FileNotFoundException, IOException {
            try {
                // Open the target in append mode, write one line, close it again
                FileWriter fstream = new FileWriter(FileName, true);
                BufferedWriter out = new BufferedWriter(fstream);
                out.write(aContents);
                out.close();
            } catch (Exception xe) {
                xe.printStackTrace();
            }
        }

        public static void main(String[] aArguments) throws IOException {
            System.out.println(getDateTime() + ": Started");
            File testFile = new File("C:\\temp\\blah.txt");
            String testFile2 = "C:\\temp\\blahblah.txt";

            for (int i = 0; i < 100; i++) {
                getContents(testFile, testFile2);
            }

            System.out.println(getDateTime() + ": Ended");
        }

        private synchronized static String getDateTime() {
            DateFormat dateFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
            Date date = new Date();
            return dateFormat.format(date);
        }
    }


C# class

    using System;
    using System.IO;

    class ReadWriteTextFile
    {
        static void Main(string[] args)
        {
            System.Diagnostics.Trace.WriteLine(getDateTime() + ": Started");
            String testFile = "C:\\temp\\blah.txt";
            String testFile2 = "C:\\temp\\blahblah.txt";
            for (int i = 0; i < 100; i++) {
                getContents(testFile, testFile2);
            }
            System.Diagnostics.Trace.WriteLine(getDateTime() + ": Ended");
        }

        static public void getContents(String sourceFile, String targetFile) {
            try {
                using (StreamReader r = File.OpenText(sourceFile))
                {
                    String line;
                    while ((line = r.ReadLine()) != null)
                    {
                        setContents(targetFile, line);
                    }
                    r.Close();
                }
            }
            catch (IOException ex) {
                Console.WriteLine(ex.StackTrace);
            }
        }

        static public void setContents(String targetFile, String aContents)
        {
            try {
                //FileStream fsO = new FileStream(targetFile, FileMode.Append);
                //StreamWriter w = new StreamWriter(fsO);
                FileStream fs = new FileStream(targetFile, FileMode.Append,
                                               FileAccess.Write, FileShare.None);
                using (StreamWriter w = new StreamWriter(fs))
                {
                    w.WriteLine(aContents + "\n");
                }
            } catch (Exception xe) {
                Console.WriteLine(xe.StackTrace);
            }
        }

        private static String getDateTime() {
            DateTime dt = DateTime.Now;
            return dt.ToString("yyyy/MM/dd HH:mm:ss");
        }
    }


+8  A: 

For one thing: in Java you're using the platform's default encoding. That may well be a fixed "single byte per character" encoding, which is clearly going to be simpler to handle than UTF-8, which .NET uses by default.
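To illustrate the difference between a fixed single-byte encoding and UTF-8, here is a small Java sketch (the class name `EncodingWidth` and the sample string are made up for illustration). It assumes the JDK ships the `windows-1252` charset, as standard desktop JDKs do. In that charset every character costs exactly one byte, while in UTF-8 the byte count depends on the content, so the encoder has more work to do.

```java
import java.nio.charset.Charset;

public class EncodingWidth {
    // Number of bytes needed to encode the text in the named charset.
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café": 4 characters, one of them non-ASCII
        // windows-1252 (Java's "Cp1252") maps each of these characters to one byte
        System.out.println("windows-1252: " + encodedLength(text, "windows-1252") + " bytes");
        // UTF-8 needs two bytes for U+00E9, so the total depends on the content
        System.out.println("UTF-8:        " + encodedLength(text, "UTF-8") + " bytes");
    }
}
```

For pure ASCII text both encodings produce the same bytes; the difference only shows up in the byte count once non-ASCII characters appear, although the UTF-8 encoder still has to check every character.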

In addition, you're writing two newlines in .NET, and only one in Java.
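A Java analogue of the double newline, as a sketch (hypothetical class `DoubleNewline`; `PrintWriter.println` stands in for C#'s `StreamWriter.WriteLine`, and try-with-resources needs Java 7 or later). Appending `"\n"` to the text and then calling a *line* method produces two line terminators per input line, so the .NET version writes more bytes than the Java version.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class DoubleNewline {
    // Mimics w.WriteLine(aContents + "\n"): println appends the platform line
    // separator after text that already ends with "\n".
    static String writeLikeQuestion(String line) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            out.println(line + "\n");
        }
        return buffer.toString();
    }

    // Mimics the Java version: exactly one line separator per line.
    static String writeOnce(String line) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            out.println(line);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeLikeQuestion("abc").length() > writeOnce("abc").length());
    }
}
```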

One thing to check is whether you're CPU-bound or IO-bound. I'd expect this to be IO-bound, but I've certainly been surprised before now.
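One way to check this from inside Java is to compare the thread's CPU time with the elapsed wall-clock time. The sketch below (hypothetical class `CpuVsWall`; Java 8 or later for the lambda) uses `java.lang.management.ThreadMXBean`. Note that the CPU time reported includes kernel time, so code that spends its time in system calls (opening, writing and closing files) can look CPU-bound even though the work is I/O-related; `getCurrentThreadUserTime()` lets you split user time from kernel time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVsWall {
    // Runs the task on the current thread and returns thread CPU time (user + kernel)
    // divided by elapsed wall-clock time. Close to 1.0: the thread was on the CPU the
    // whole time. Well below 1.0: it spent time waiting, e.g. for the disk.
    static double cpuToWallRatio(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time is not supported on this JVM");
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true);
        }
        long cpuStart = threads.getCurrentThreadCpuTime();
        long userStart = threads.getCurrentThreadUserTime();
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        long user = threads.getCurrentThreadUserTime() - userStart;
        long kernel = Math.max(0, cpu - user);
        System.out.printf("wall %d ms, cpu %d ms (user %d ms, kernel %d ms)%n",
                wall / 1_000_000, cpu / 1_000_000, user / 1_000_000, kernel / 1_000_000);
        return wall == 0 ? 0.0 : (double) cpu / wall;
    }

    public static void main(String[] args) {
        // Replace this task with the code you want to classify, e.g. the 100-iteration copy loop.
        double ratio = cpuToWallRatio(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i;
            }
            System.out.println("sum = " + sum); // use the result so the loop is not optimised away
        });
        System.out.printf("CPU/wall ratio: %.2f%n", ratio);
    }
}
```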

Finally, you should run each test after a reboot to try to remove disk caches from the equation as far as possible.

Jon Skeet
How do I turn off the UTF-8 encoding in .NET, and why do you say I am writing two newlines in the .NET version? How do I check whether I am CPU-bound or IO-bound?
JS Facts: Jon Skeet left MS and suddenly all C# programs ran twice as slow as before :)
kd304
You're calling WriteLine *and* adding "\n". Specify the encoding by using the constructor overload for StreamWriter which takes one. Use Encoding.Default to make it work like the Java version. Create a new StreamReader instead of calling File.OpenText - that way you can specify the encoding there too.
Jon Skeet
@kd304: I know the gist of your remark was a joke, but in case you were serious at all: I've never worked for MS.
Jon Skeet
Just joking. Sorry for that.
kd304
No problem - I just don't want anyone to be under the impression that I work for MS when I don't.
Jon Skeet
You can verify which encoding Java is using with: System.out.println(System.getProperty("file.encoding"));
ScArcher2
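A slightly longer Java sketch along the same lines (hypothetical class `ShowEncoding`). `FileReader` and `FileWriter` without an explicit charset use the JVM's default charset, and `Cp1252` is an alias of the `windows-1252` charset. Be aware that from JDK 18 onward the default charset is UTF-8 regardless of the Windows code page, so on a modern JDK this particular difference between the two programs would disappear.

```java
import java.nio.charset.Charset;

public class ShowEncoding {
    // Returns the canonical name the JDK uses for a charset name or alias.
    static String canonicalName(String charsetName) {
        return Charset.forName(charsetName).name();
    }

    public static void main(String[] args) {
        // The property suggested in the comment, e.g. "Cp1252" on many Windows setups
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // What FileReader/FileWriter use when no charset is passed in
        System.out.println("default charset = " + Charset.defaultCharset().name());
        // "Cp1252" is an alias; this prints "Cp1252 resolves to windows-1252"
        System.out.println("Cp1252 resolves to " + canonicalName("Cp1252"));
    }
}
```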
I have made the following changes: I removed the "\n" and used a StreamReader with Encoding.Default, but the timings are still the same.
System.out.println(System.getProperty("file.encoding")); printed Cp1252.
Right. And have you discovered whether it's CPU-bound or not? How much of the processor is it taking with Java, and how much with .NET?
Jon Skeet
With both processes, CPU consumption was about the same, around 48%. How can I force .NET to use the cp1252 encoding?
Is this a dual core machine? If so, that suggests it's somewhat CPU bound. Using Encoding.Default is likely to pick the cp1252 as well - or you could use Encoding.GetEncoding(1252). Btw, you don't need to call r.Close() when r is already in a using statement.
Jon Skeet
@Jon, @Sai: wouldn't it be better to first read the whole source file into a buffer and then write it to the other file from that buffer? That way, in my opinion, we would save the time spent in the file open/close/append calls, because the current implementation does that for every line of the source file. That is a lot of I/O. Thanks
SilverHorse
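A minimal Java sketch of SilverHorse's suggestion (hypothetical class `BufferedCopy`; it uses the NIO.2 `Files` API, so Java 7 or later). It assumes the source file fits comfortably in memory. Because it copies raw bytes, it skips decoding and re-encoding entirely and keeps the source's line endings unchanged, so it is not an exact behavioural match for the line-by-line code in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class BufferedCopy {
    // Reads the whole source file into memory with one call and appends it to the
    // target with one call, so the target is opened and closed once per copy
    // instead of once per line.
    static void appendWholeFile(Path source, Path target) throws IOException {
        byte[] data = Files.readAllBytes(source);
        Files.write(target, data, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get(args[0]);
        Path target = Paths.get(args[1]);
        // Same 100 repetitions as the question's main method
        for (int i = 0; i < 100; i++) {
            appendWholeFile(source, target);
        }
    }
}
```

Usage: `java BufferedCopy C:\temp\blah.txt C:\temp\blahblah.txt`.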
@SilverHorse: Yes, that would certainly be a lot faster... but I suspect the point isn't to make it as fast as possible at any cost, but just to find out why basically the same code is faster in Java than .NET. It sounds like it could be that Java is faster at opening files for append, for example.
Jon Skeet
Exactly. I understand that in the real world I would read it into a buffer rather than reading line by line, and would not open the target file so many times. But as Jon Skeet observed, I am racking my brains trying to understand why .NET, whose native platform is Windows, has inferior performance in I/O operations.
+3  A: 

I don't see any code issues here. Some possibilities: perhaps you ran the C# code in debug mode; there was an issue with file caching; or the C# data file sat on a heavily fragmented area of the disk. I wouldn't expect a simple C# program like this to run at half speed.

Edit: I tried both versions on a 10,439-byte blah.txt. The generated file was 1,043,900 bytes long.

C# (CTRL+F5) time was 18 seconds
C# (F5) time was 22 seconds
Java time was 17 seconds.

Both applications ate about 40% CPU time, half of it was kernel time.

Edit2: The CPU load comes from the code constantly opening, closing and writing small chunks of data. This causes a lot of managed-to-native and user-to-kernel mode transitions.
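To see how much of the cost comes from reopening the target file, here is a crude Java timing sketch (hypothetical class `ReopenCost`; Java 7 or later for try-with-resources). Both methods produce the same output: the first follows the question's pattern of opening the target in append mode for every line, the second opens it once. A single run on a freshly started JVM is noisy, so treat the numbers as rough indications only.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

public class ReopenCost {
    // Question's pattern: open the target in append mode, write one line, close it.
    static void copyReopeningPerLine(String source, String target) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(source))) {
            String line;
            while ((line = in.readLine()) != null) {
                try (Writer out = new BufferedWriter(new FileWriter(target, true))) {
                    out.write(line + System.lineSeparator());
                }
            }
        }
    }

    // Same output, but the target is opened once and all lines go through one buffer.
    static void copyWithOneWriter(String source, String target) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader(source));
             BufferedWriter out = new BufferedWriter(new FileWriter(target, true))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine(); // writes System.lineSeparator()
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long t0 = System.nanoTime();
        copyReopeningPerLine(args[0], args[1] + ".reopen");
        long t1 = System.nanoTime();
        copyWithOneWriter(args[0], args[1] + ".single");
        long t2 = System.nanoTime();
        System.out.printf("reopen per line: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("single writer:   %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```

Usage: `java ReopenCost C:\temp\blah.txt C:\temp\out` writes `out.reopen` and `out.single` side by side, so you can confirm they are identical.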

My system spec: Core 2 Duo 2.4GHz, 2 GB 800MHz RAM, WinXP SP3

kd304
I ran the C# code in both debug and release mode with similar timings.
By the comment above, do you mean that you *built* it in debug and release configurations, or that you ran it in the debugger and outside the debugger? The latter makes *much* more difference than the former, in my experience. From Visual Studio, hit Ctrl-F5 instead of F5.
Jon Skeet
Let me be clear: I ran both by hitting F5 (debug) and Ctrl+F5 (run without debugging).
+1 for testing - and basically Java and C# are neck-and-neck when not under a debugger...
Jon Skeet
+2  A: 

The slow part of the benchmarks looks as if it is where a single file is repeatedly opened, decorated, a small write and closed again. Not a useful benchmark. Obvious differences would be how big the buffers are (with a single write, you don't actually need any) and whether the resulting file is synced to disc.
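Both of these points can be sketched in Java (hypothetical class `UnbufferedAppend`; Java 7 or later). When the whole chunk is already in one byte array, wrapping the stream in a `BufferedWriter` adds only an extra copy, and `FileDescriptor.sync()` is one way to force the data to the device, which lets you test whether syncing to disc changes the picture. Expect the synced variant to be considerably slower.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

public class UnbufferedAppend {
    // Appends one chunk of text with a single write() call and no intermediate buffer.
    // If sync is true, the data is forced to the storage device before closing
    // instead of being left in the operating system's cache.
    static void append(String fileName, String text, Charset charset, boolean sync) throws IOException {
        byte[] bytes = text.getBytes(charset);
        try (FileOutputStream out = new FileOutputStream(fileName, true)) {
            out.write(bytes);
            if (sync) {
                out.getFD().sync();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        append(args[0], "hello" + System.lineSeparator(), Charset.forName("windows-1252"), false);
    }
}
```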

Tom Hawtin - tackline
Do you know of any simple I/O benchmark comparing Java and C# that I could use if I need to demonstrate this to people?
Pick something *realistic*. At the moment your bottleneck may well be opening the file to append to it, which isn't typically the bottleneck in real applications.
Jon Skeet