I am seeing a large performance difference between Java with JOGL and C# with Tao.OpenGL, both when loading a PNG from storage into memory and when loading the resulting BufferedImage (Java) or Bitmap (C#) into OpenGL. In both cases the source files are the same PNGs on the hard drive.

The difference is large enough that I assumed I was doing something wrong; however, after quite a lot of searching and trying different loading techniques, I've been unable to reduce it.

With Java, the image is loaded in 248ms and loaded into OpenGL in 728ms. The same on C# takes 54ms to load the image and 34ms to load/create the texture.

The image in question above is a PNG containing transparency, sized 7200x255, used for a 2D animated sprite. I realise the size is really quite ridiculous and am considering cutting up the sprite, however the large difference is still there (and confusing).

On the Java side the code looks like this:

BufferedImage image = ImageIO.read(new File(fileName));
texture = TextureIO.newTexture(image, false);
texture.setTexParameteri(GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR);
texture.setTexParameteri(GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR);

The C# code uses:

Bitmap t = new Bitmap(fileName);

t.RotateFlip(RotateFlipType.RotateNoneFlipY);
Rectangle r = new Rectangle(0, 0, t.Width, t.Height);

BitmapData bd = t.LockBits(r, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);

Gl.glBindTexture(Gl.GL_TEXTURE_2D, tID);
Gl.glTexImage2D(Gl.GL_TEXTURE_2D, 0, Gl.GL_RGBA, t.Width, t.Height, 0, Gl.GL_BGRA, Gl.GL_UNSIGNED_BYTE, bd.Scan0);
Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MIN_FILTER, Gl.GL_LINEAR);
Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MAG_FILTER, Gl.GL_LINEAR);

t.UnlockBits(bd);
t.Dispose();

After quite a lot of testing I can only conclude that either Java/JOGL is just slower here (PNG reading might not be as quick) or I'm still doing something wrong.

Thanks.

Edit2:

I have found that creating a new BufferedImage with format TYPE_INT_ARGB_PRE decreases OpenGL texture load time by almost half - this includes having to create the new BufferedImage, getting the Graphics2D from it and then rendering the previously loaded image to it.

Edit3: Benchmark results for 5 variations. I wrote a small benchmarking tool; the following results come from loading a set of 33 PNGs (most of them very wide) 5 times.

testStart: ImageIO.read(file) -> TextureIO.newTexture(image)  
result: avg = 10250ms, total = 51251ms  
testStart: ImageIO.read(bis) -> TextureIO.newTexture(image)  
result: avg = 10029ms, total = 50147ms  
testStart: ImageIO.read(file) -> TextureIO.newTexture(argbImage)  
result: avg = 5343ms, total = 26717ms  
testStart: ImageIO.read(bis) -> TextureIO.newTexture(argbImage)  
result: avg = 5534ms, total = 27673ms  
testStart: TextureIO.newTexture(file)  
result: avg = 10395ms, total = 51979ms

ImageIO.read(bis) refers to the technique described in James Branigan's answer below. argbImage refers to the technique described in my previous edit:

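BufferedImage img = ImageIO.read(file);
// Copy into a premultiplied ARGB image before handing it to TextureIO
BufferedImage argbImg = new BufferedImage(img.getWidth(), img.getHeight(), BufferedImage.TYPE_INT_ARGB_PRE);
Graphics2D g = argbImg.createGraphics();
g.drawImage(img, 0, 0, null);
g.dispose();
texture = TextureIO.newTexture(argbImg, false);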

Any more methods of loading (either images from file, or images to OpenGL) would be appreciated, I will update these benchmarks.
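
A minimal sketch of how the decode stage and the texture-creation stage could be timed separately, so that the split between the ImageIO call and the TextureIO call becomes visible. The class name SplitTimer and the upload callback are hypothetical. In the JOGL setup above the callback would wrap TextureIO.newTexture(image, false) and would have to run with a current GL context. The sketch uses java.util.function.Consumer, so it assumes Java 8 or later.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.function.Consumer;
import javax.imageio.ImageIO;

final class SplitTimer {
    /** Returns {decodeNanos, uploadNanos} for a single image file. */
    static long[] time(File file, Consumer<BufferedImage> upload) throws IOException {
        long t0 = System.nanoTime();
        BufferedImage image = ImageIO.read(file);   // stage 1: PNG decode
        if (image == null) {
            throw new IOException("No ImageIO reader for " + file);
        }
        long t1 = System.nanoTime();
        upload.accept(image);                       // stage 2: caller-supplied upload step
        long t2 = System.nanoTime();
        return new long[] { t1 - t0, t2 - t1 };
    }

    public static void main(String[] args) throws IOException {
        // Placeholder upload stage; replace it with the real texture creation call.
        long[] t = time(new File(args[0]), image -> { });
        System.out.printf("decode = %d ms, upload = %d ms%n",
                t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

Running each variation several times and discarding the first run should reduce the influence of JIT warm-up on the numbers.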

+1  A: 

I'm not sure that it will completely close the performance gap, but you should be able to use the ImageIO.read method that takes an InputStream and pass in a BufferedInputStream wrapping a FileInputStream. This should greatly reduce the number of native file I/O calls that the JVM has to perform. It would look like this:

File file = new File(fileName);
FileInputStream fis = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fis, 8192); //8K reads
BufferedImage image = ImageIO.read(bis);
James Branigan
Thanks for your comment. In benchmarking tests I found that this approach performed worse than ImageIO.read(fileName). I have edited my post above with benchmark results and the texture load variations.
Edward Cresswell
Edward, thanks for the benchmark info. Can you put a middle timer in as well, so that we can see the % split between the ImageIO call and the TextureIO call? A couple of other questions: What JVM are you using? What are the memory parameters on the JVM? Are you running the JVM in JIT or in interpreted mode? Have you configured the GC, or are you running GC with defaults? What class of processor are you running this on? Any observations of the system being CPU or I/O bound during the benchmark? Does your JVM support AOT compilation of classes?
James Branigan
I'm running java 1.6.0_15 (b03) with Hotspot VM saying 14.1-b02 mixed mode, from Win7 64-bit (but running the 32-bit dist folder). I haven't touched GC settings, and not sure what the JVM is running in - default settings, "mixed mode" I guess implies something but not sure what! One core is 100% whilst benchmarking - here are more detailed benchmark results - http://pastebin.mozilla.org/692710
Edward Cresswell
Edward, Are you able to modify the source of the JOGL java code to add some timings? I think if you look at http://kenai.com/projects/jogl/sources/jogl-git/content/src/jogl/classes/com/sun/opengl/util/texture/awt/AWTTextureData.java?rev=d940028e6ead91da8ad516fdaf3a4671fd179cf8 , you'll find the culprit. There is a suspicious comment at the bottom and I can see some differing code paths based on the TYPE_INT_ARGB_PRE flag.
James Branigan
Also, since that comment ran long: on the GC side, try a couple of the options from this page: http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp Specifically, -XX:+PrintTenuringDistribution and -XX:+PrintGCDetails (the + form switches the option on). These will give an idea of whether the GC is working too hard or is not configured properly for this workload.
James Branigan
A: 

Have you looked into JAI (Java Advanced Imaging) by any chance? It implements native acceleration for tasks such as PNG compression/decompression. The Java implementation of PNG decompression may be the issue here. Which version of the JVM are you using?

I work with applications which load and render thousands of textures; for this I use a pure Java implementation of the DDS format, available with NASA WorldWind. DDS textures load into GL faster since the format is understood by the graphics card.

I appreciate your benchmarking and would like to use your experiments to test out DDS load times. Also tweak the memory available to JAI and JVM to allow loading of more segments and decompression.

whatnick
A: 

Actually, I load my textures in JOGL like this:

TextureData data = TextureIO.newTextureData(stream, false, fileFormat);
Texture2D tex = new Texture2D(...);   // contains glTexImage2D
tex.bind(g);
tex.uploadData(g, 0, data);  // contains glTexSubImage2D

Loading textures this way bypasses the extra work of constructing a BufferedImage and interpreting it. It's pretty fast for me. You can profile it; I'm waiting for your results.

dex
+2  A: 

Short Answer: The JOGL texture classes do quite a bit more than necessary, and I guess that's why they are slow. I ran into the same problem a few days ago, and fixed it by loading the texture with the low-level API (glGenTextures, glBindTexture, glTexParameterf, and glTexImage2D). The loading time decreased from about 1 second to "no noticeable delay", but I haven't done any systematic profiling.

Long Answer: If you look into the documentation and source code of the JOGL TextureIO, TextureData and Texture classes, you notice that they do quite a bit more than just uploading the texture onto the GPU:

  • Handling of different image formats
  • Alpha premultiplication

I'm not sure which one of these is taking more time. But in many cases you know what kind of image data you have available, and don't need to do any premultiplication.

The alpha premultiplication feature is anyway completely misplaced in this class (from a software architecture perspective), and I didn't find any way to disable it. Even though the documentation claims that this is the "mathematically correct way" (I'm actually not convinced about that), there are plenty of cases in which you don't want to use alpha premultiplication, or have done it beforehand (e.g. for performance reasons).
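
For reference, premultiplication just scales each colour channel by the pixel's alpha. A minimal sketch for one packed 0xAARRGGBB pixel follows. The class and method names are hypothetical, and the truncating integer division is an assumption; JOGL and Java2D may round differently.

```java
final class Premultiply {
    /** Premultiplies one non-premultiplied pixel packed as 0xAARRGGBB. */
    static int premultiply(int argb) {
        int a = argb >>> 24;
        // Scale each 8-bit channel by alpha/255 (truncating division).
        int r = ((argb >> 16) & 0xFF) * a / 255;
        int g = ((argb >> 8) & 0xFF) * a / 255;
        int b = (argb & 0xFF) * a / 255;
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        // Half-transparent pure red: the red channel drops from 0xFF to 0x80.
        System.out.println(Integer.toHexString(premultiply(0x80FF0000))); // prints "80800000"
    }
}
```

Texture data that has been premultiplied is normally blended with glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA) rather than glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA), which is why it matters whether the loader has already applied it.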

After all, loading a texture with the low-level API is quite simple unless you need it to handle different image formats. Here is some Scala code which works nicely for all my RGBA texture images:

val textureIDList = new Array[Int](1)
gl.glGenTextures(1, textureIDList, 0)
gl.glBindTexture(GL.GL_TEXTURE_2D, textureIDList(0))
gl.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR)
gl.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR)
val dataBuffer = image.getRaster.getDataBuffer   // image is a java.awt.image.BufferedImage (loaded from a PNG file)
val buffer: Buffer = dataBuffer match {
  case b: DataBufferByte => ByteBuffer.wrap(b.getData)
  case _ => null   // other DataBuffer types (e.g. DataBufferInt) are not handled here
}
gl.glTexImage2D(GL.GL_TEXTURE_2D, 0, GL.GL_RGBA, image.getWidth, image.getHeight, 0, GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, buffer)

...

gl.glDeleteTextures(1, textureIDList, 0)
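
The Scala snippet above only handles images backed by a DataBufferByte. A minimal Java sketch of a type-independent alternative is to go through BufferedImage.getRGB, which returns non-premultiplied ARGB pixels for any image type, and repack them as RGBA bytes. The class and method names are hypothetical, and this path copies every pixel, so it is likely slower than wrapping the raster's byte array directly.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;

final class RgbaBuffers {
    /** Copies the image into a direct buffer laid out as R, G, B, A bytes per pixel. */
    static ByteBuffer toRgba(BufferedImage image) {
        int w = image.getWidth();
        int h = image.getHeight();
        // getRGB converts from whatever the image's internal layout is to packed ARGB.
        int[] argb = image.getRGB(0, 0, w, h, null, 0, w);
        ByteBuffer buf = ByteBuffer.allocateDirect(w * h * 4);
        for (int p : argb) {
            buf.put((byte) (p >> 16));   // R
            buf.put((byte) (p >> 8));    // G
            buf.put((byte) p);           // B
            buf.put((byte) (p >>> 24));  // A
        }
        buf.flip();                      // rewind so the consumer reads from the start
        return buf;
    }
}
```

The resulting buffer matches the GL_RGBA / GL_UNSIGNED_BYTE arguments used in the glTexImage2D call above. Rows are stored top to bottom, as in the BufferedImage, whereas OpenGL treats the first row as the bottom of the texture, so the texture coordinates (or the image) may need flipping, as the C# code in the question does with RotateFlip.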