I need to rewrite about 4KB of data in reverse order, at bit level (last bit of last byte becoming first bit of first byte), as fast as possible. Are there any clever sniplets to do it?
Rationale: The data is display contents of LCD screen in an embedded device that is usually positioned in a way that the screen is on your shoulders level. The screen has "6 o'clock" orientation, that is to be viewed from below - like lying flat or hanging above your eyes level. This is fixable by rotating the screen 180 degrees, but then I need to reverse the screen data (generated by library), which is 1 bit = 1 pixel, starting with upper left of the screen. The CPU isn't very powerful, and the device has enough work already, plus several frames a second would be desirable so performance is an issue; RAM not so much.
edit: Single core, ARM 9 series. 64MB, (to be scaled down to 32MB later), Linux. The data is pushed from system memory to the LCD driver over 8-bit IO port.
The CPU is 32bit and performs much better at this word size than at byte level.