I was going to write a long-winded post, but I'll boil it down here:

I'm trying to emulate the old-school graphical style of the NES using XNA. However, my frame rate is SLOW: I'm trying to modify 65K pixels per frame. If I just loop through all 65K pixels and set them to some arbitrary color, I get 64 FPS. With the code I wrote to look up which colors should be placed where, I get 1 FPS.

I think it is because of my object-oriented code.

Right now, I have things divided into about six classes, with getters/setters. I'm guessing that I'm calling at least 360K getters per frame, which I think is a lot of overhead. Each class contains 1D and/or 2D arrays of custom enumerations, ints, Colors, Vector2Ds, or bytes.

What if I combined all of the classes into just one, and accessed the contents of each array directly? The code would be a mess and would ditch the concepts of object-oriented programming, but it might be much faster.
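To make the trade-off concrete, here is a minimal sketch, in Java as a stand-in for C#, of the two access styles being compared: a small class with trivial accessors versus reading the raw array directly. `TileMap` and `AccessDemo` are hypothetical names, not the asker's classes. Both produce the same result; whether the accessor version is actually slower depends on whether the JIT inlines it, which only measuring can tell.

```java
// Hypothetical classes for illustration only; not the asker's actual code.
final class TileMap {
    private final byte[] indices;   // one palette index per pixel, row-major
    private final int width;

    TileMap(int width, int height) {
        this.width = width;
        this.indices = new byte[width * height];
    }

    // Trivial accessors: small enough that a JIT will usually inline them.
    byte getIndex(int x, int y) {
        return indices[y * width + x];
    }

    void setIndex(int x, int y, byte value) {
        indices[y * width + x] = value;
    }

    // The "flattened" alternative: hand out the raw array for direct access.
    byte[] rawIndices() {
        return indices;
    }
}

final class AccessDemo {
    // Sums all palette indices through the accessor.
    static long sumViaGetters(TileMap map, int width, int height) {
        long sum = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                sum += map.getIndex(x, y) & 0xFF;   // treat byte as unsigned 0..255
            }
        }
        return sum;
    }

    // Sums the same data by walking the raw array directly.
    static long sumDirect(byte[] indices) {
        long sum = 0;
        for (byte b : indices) {
            sum += b & 0xFF;
        }
        return sum;
    }
}
```

If profiling shows the two loops run at about the same speed, the object-oriented structure can stay and the effort belongs elsewhere.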

I'm also not concerned about access violations, as all gets/sets on the arrays will be done in blocks; e.g., all writing to the arrays will take place before any data is read from them.


As for casting: as I said, I'm using custom enumerations, ints, Colors, Vector2Ds, and bytes. Which data types are fastest to use and access in the .NET Framework, XNA, the Xbox, and C#? I think constant casting might be a cause of the slowdown here.

Also, instead of using math to figure out which indexes data should be placed in, I've used precomputed lookup tables so I don't have to do constant multiplication, addition, subtraction, and division every frame. :)

+1  A: 

Are you specifying a color and such for each pixel? If so, I think you should rethink the architecture. Start using sprites; that will speed things up.

EDIT

Okay, I think your solution could be to load several sprites with different colours (sprites of a few pixels each) and reuse those. Pointing to the same sprite is faster than assigning a different colour to each pixel, because the sprite has already been loaded into memory.

Chino
I'm modifying each pixel of a texture and then drawing the texture to the screen. Compared to drawing 65K sprites, this is much faster. I have asked a question asking this too :P
Jeffrey Kern
Yikes! That's going to be slow. Drawing a large number of sprites can be very fast indeed if you first load them into video memory. The card can then be instructed to do the heavy lifting with a single instruction over your bus. Doing things one pixel at a time will result in enormous bus traffic, no matter how you do it. That's the killer.
Peter Ruderman
You are modifying every pixel of a texture? I think you are doing something fundamentally wrong. A sprite is there for you to use as a standard component; maybe a few filters can be put over it, but if you are modifying each pixel, you are defeating the whole purpose of a sprite...
Chino
How would you go about doing this, Peter? The only experience I have with drawing sprites is using SpriteBatch in XNA.
Jeffrey Kern
@Chino The sprites can only be colored a certain way. In addition, I would like to be able to change the colors of a sprite on the fly. E.g., anything that is currently black in the sprite could be drawn as green without affecting the rest of the sprite. This is known as palette swapping, which XNA does not offer (as far as I am aware).
Jeffrey Kern
I'm afraid I'm not familiar with XNA, so someone else will have to pick up on that one. It will certainly have a mechanism, however. Graphics acceleration works by loading graphic data into video memory and then issuing small numbers of instructions directly to the card to perform the manipulations. If you're going pixel by pixel, then you're either transferring your whole sprite over the bus or sending one instruction per pixel over the bus. Either way, you've defeated the purpose of having a graphics accelerator!
Peter Ruderman
@jeffrey I'm no XNA expert, but based on the little I've played with it, combined with general performance advice, I would suggest loading your sprites onto the card and writing shaders to manipulate them. If you are getting/setting on a bunch of C# objects, that suggests you are hitting your CPU instead of the GPU, which is where you should be. Your GPU can process a very large amount of data every second, but it needs to be in the GPU's memory first.
BioBuckyBall
@Jeffrey Kern: You could probably implement palette swapping on sprites with a pixel shader. Here's a good starting point: http://creators.xna.com/en-us/sample/spriteeffects
Andrew Russell
+4  A: 

Have you profiled your code to determine where the slowdown is? Before you go rewriting your application, you ought to at least know which parts need to be rewritten.

I strongly suspect that the overhead of the accessors and data conversions is trivial. It's much more likely that your algorithms are doing unnecessary work, recomputing values that they could cache, and other things that can be addressed without blowing up your object design.
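Before any rewrite, even crude timing can show which part of a frame is slow. Below is a minimal sketch in Java standing in for C# (where `System.Diagnostics.Stopwatch` serves the same purpose). `FrameTimer` is a hypothetical helper; it only reports how long a block took, not why, so a real profiler gives far more detail.

```java
// Crude wall-clock timing with System.nanoTime; not a substitute for a profiler.
final class FrameTimer {
    /** Runs the task `runs` times and returns the average elapsed nanoseconds. */
    static long averageNanos(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return total / runs;
    }
}
```

Timing the plain pixel-fill loop and the color look-up code separately, e.g. `FrameTimer.averageNanos(() -> java.util.Arrays.fill(pixels, 0xFF000000), 100)`, would show which one accounts for the drop from 64 FPS to 1 FPS. The first few runs include JIT compilation, so discard them or use more iterations.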

JSBangs
Thank you for the comment. However, I only wrote this app in one day as a proof of concept, so rewriting it isn't a big deal. I'm also already using lookup tables to reduce overhead. And I'm calling 300,000 getters per frame; even if each call is "trivial", 300,000 calls is a lot.
Jeffrey Kern
A: 

As with any performance problem, you should profile the application to identify the bottlenecks rather than trying to guess. I seriously doubt that getters and setters are at the root of your problem. The compiler almost always inlines these sorts of functions. I'm also curious what you have against math. Multiplying two integers, for instance, is one of the fastest things the computer can do.
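To illustrate, here is a sketch in Java (standing in for C#) of the two options discussed in the comments for foo = (x*256) + y: computing it directly versus reading it from a precomputed 256x256 table. `IndexMath` is a hypothetical helper. The table version still needs two array loads, each nominally bounds-checked, to fetch a value the arithmetic version produces with one multiply and one add.

```java
// Hypothetical helper comparing the two ways to get foo = (x*256) + y.
final class IndexMath {
    static final int SIZE = 256;   // assumed range: 0 <= x, y < 256

    // Direct computation: one multiply (256 is a power of two, so a shift) and one add.
    static int computed(int x, int y) {
        return x * SIZE + y;
    }

    // Precomputed table: every possible foo stored ahead of time (65,536 ints).
    private static final int[][] TABLE = new int[SIZE][SIZE];
    static {
        for (int x = 0; x < SIZE; x++) {
            for (int y = 0; y < SIZE; y++) {
                TABLE[x][y] = x * SIZE + y;
            }
        }
    }

    // Lookup: load the row reference, then the element, to get what computed() calculates.
    static int lookedUp(int x, int y) {
        return TABLE[x][y];
    }
}
```

The table also occupies 256 KB that competes with the pixel data for cache, which is one more reason cheap arithmetic like this usually wins; profiling is still the way to confirm it.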

Peter Ruderman
It isn't anything against math. However, it makes more sense to me to create a predetermined lookup table of values instead of recomputing them every frame. For instance, let's say I have foo = (x*256) + y. Wouldn't it be faster to look up the predetermined value of foo rather than computing it every time, especially if I know the limits of x and y? And I know that division is a slow process as well.
Jeffrey Kern
Look up, as in array access? That causes more than one instruction to be performed per access. I would only precompute things that take much longer, like trig functions. In your case, foo = (x*256) + y is probably faster (at least by counting CPU instructions) than foo = table[i], but it's hard to say for sure without profiling.
BioBuckyBall
Table lookup used to be a big performance trick, but modern processors can do math much faster while memory access hasn't kept up. So be careful with table lookup: use it only if the function is expensive to compute. And your simple equation would not be expensive to compute, especially using floating point.
bbudge
+3  A: 

There's a terrific presentation from GDC 2008 that is worth reading if you are an XNA developer. It's called Understanding XNA Framework Performance.

For your current architecture - you haven't really described it well enough to give a definite answer - you probably are doing too much unnecessary "stuff" in a tight loop. If I had to guess, I'd suggest that your current method is thrashing the cache - you need to fix your data layout.

In the ideal case you should have a nice big array of small-as-possible value types (structs not classes), and a heavily inlined loop that shoves data into it linearly.
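A minimal sketch of that layout advice, in Java standing in for C#. Java has no user-defined value types, so packed 32-bit ARGB `int`s stand in for the small structs (such as XNA's `Color`) the answer recommends. The hypothetical `PixelObj` layout is included for contrast: each element is a reference to a separate heap object, so every write goes through an extra indirection, while the flat array is written at consecutive addresses.

```java
// Contrast layout: one heap object per pixel, reached through a reference.
final class PixelObj {
    int argb;
    PixelObj(int argb) { this.argb = argb; }
}

// Hypothetical helper showing object-per-pixel vs. a flat array of small values.
final class Layouts {
    static final int WIDTH = 256, HEIGHT = 256;   // 65,536 pixels

    static PixelObj[] objectLayout() {
        PixelObj[] px = new PixelObj[WIDTH * HEIGHT];
        for (int i = 0; i < px.length; i++) px[i] = new PixelObj(0);
        return px;
    }

    // Flat layout: one contiguous int[] of packed ARGB values, row-major.
    static int[] flatLayout() {
        return new int[WIDTH * HEIGHT];
    }

    // Linear fill: consecutive iterations write consecutive array slots.
    static void fillFlat(int[] px, int argb) {
        for (int i = 0; i < px.length; i++) px[i] = argb;
    }

    // Same work, but each pixel costs a reference load before the write.
    static void fillObjects(PixelObj[] px, int argb) {
        for (PixelObj p : px) p.argb = argb;
    }
}
```

The flat version is also what a texture upload wants: a single contiguous buffer that can be handed over in one call.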

(Aside: regarding what is fast: Integer and floating point maths is very fast - in general, you shouldn't use lookup tables. Function calls are pretty fast - to the point that copying large structs when you pass them will be more significant. The JIT will inline simple getters and setters - although you shouldn't depend on it to inline anything else in very tight loops - like your blitter.)

HOWEVER - even if optimised - your current architecture sucks. What you are doing flies in the face of how a modern GPU works. You should be loading your sprites onto your GPU and letting it composite your scene.

If you want to manipulate your sprites at a pixel level (for example, palette swapping, as you have mentioned), then you should be using pixel shaders. The CPU on the 360 (and on PCs) is fast, but the GPU is so much faster when you're doing something like this!
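To show what the per-pixel palette-swap logic amounts to, here is a CPU-side sketch in Java (standing in for C#); on the GPU the same indexed lookup would run in a pixel shader. It assumes, hypothetically, that sprites store palette indices rather than colors, so recoloring means editing a small palette instead of every pixel. `PaletteSprite` and its methods are illustrative names, not an XNA API.

```java
// Sprite that stores palette indices; colors come from whichever palette is passed in.
final class PaletteSprite {
    private final byte[] indices;   // one palette index per pixel, row-major
    final int width, height;

    PaletteSprite(int width, int height, byte[] indices) {
        if (indices.length != width * height) {
            throw new IllegalArgumentException("size mismatch");
        }
        this.width = width;
        this.height = height;
        this.indices = indices.clone();
    }

    /** Resolves each pixel through the palette (packed ARGB ints). */
    int[] render(int[] palette) {
        int[] out = new int[indices.length];
        for (int i = 0; i < indices.length; i++) {
            out[i] = palette[indices[i] & 0xFF];   // index treated as unsigned 0..255
        }
        return out;
    }

    /** Returns a copy of the palette with one entry replaced, e.g. black -> green. */
    static int[] swap(int[] palette, int entry, int newArgb) {
        int[] copy = palette.clone();
        copy[entry] = newArgb;
        return copy;
    }
}
```

Swapping black for green touches one palette entry, not every black pixel, and the original palette stays intact for other sprites.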

The Sprite Effects XNA sample is a good place to get started.

Andrew Russell