It seems to me you have probably leveraged FreeImage as far as it can go.
The FreeImage source (http://freeimage.sourceforge.net/download.html, BitmapAccess.cpp::FreeImage_AllocateT) seems to malloc image storage as a one-dimensional array:
unsigned dib_size = FreeImage_GetImageSize(width, height, bpp);
bitmap->data = (BYTE *)FreeImage_Aligned_Malloc(dib_size * sizeof(BYTE), FIBITMAP_ALIGNMENT);
Here, dib_size is "device independent bitmap size", and bpp is "bits per pixel".
I assume you use PixelAccess.cpp::FreeImage_GetScanLine() to get the line to copy:
BYTE * DLL_CALLCONV
FreeImage_GetScanLine(FIBITMAP *dib, int scanline) {
    return (dib) ? CalculateScanLine(FreeImage_GetBits(dib), FreeImage_GetPitch(dib), scanline) : NULL;
}
which calls
inline unsigned char *
CalculateScanLine(unsigned char *bits, unsigned pitch, int scanline) {
    return (bits + (pitch * scanline));
}
which seems to be O(1) for array-based lookup.
Since FreeImage internally stores the image in a single fixed-size, contiguous one-dimensional array, it is not apparent how you could do better than O(n) when growing the image (for example, when inserting copies of rows), where n is the size of the image data in bytes. It seems to me the best FreeImage can do is allocate new storage large enough for the grown image, then copy the image data from the source into it. That copy is O(n).
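To make the cost concrete, here is a minimal sketch of duplicating one row in a flat buffer. It is not FreeImage code: `FlatImage` and `duplicate_row_flat` are hypothetical names, and the struct only imitates the "pitch times height bytes in one array" layout described above. Every byte of the source ends up being copied, which is where the O(n) comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a FreeImage-style bitmap: one contiguous
// buffer holding `height` scanlines of `pitch` bytes each.
struct FlatImage {
    std::size_t pitch;
    std::size_t height;
    std::vector<unsigned char> bits;
};

// Returns a new image in which scanline `row` appears twice in a row.
// All of src.bits is copied, so the cost is linear in the image size.
FlatImage duplicate_row_flat(const FlatImage& src, std::size_t row) {
    assert(src.pitch > 0);
    assert(row < src.height);
    assert(src.bits.size() == src.pitch * src.height);

    FlatImage dst{src.pitch, src.height + 1,
                  std::vector<unsigned char>(src.pitch * (src.height + 1))};

    // Bytes from the start of the image up to and including `row`.
    const std::size_t head = (row + 1) * src.pitch;

    // 1. Everything up to and including the row being duplicated.
    std::memcpy(dst.bits.data(), src.bits.data(), head);
    // 2. The extra copy of that row.
    std::memcpy(dst.bits.data() + head,
                src.bits.data() + row * src.pitch, src.pitch);
    // 3. The remaining rows, shifted down by one scanline.
    std::memcpy(dst.bits.data() + head + src.pitch,
                src.bits.data() + head, src.bits.size() - head);
    return dst;
}
```

For a 2-byte-wide, 3-row image, calling `duplicate_row_flat(img, 1)` yields a 4-row image whose second and third rows are identical.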
Using a different data structure (such as a B-tree) would take some effort but would give you better insertion-time characteristics (O(log(n))). However, the trade-off for the speedup is increased storage space: each pixel would take up more space once the structure's own overhead is counted.
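A full B-tree is too long to show here, so the sketch below uses a simpler stand-in to illustrate the same trade-off: a table of shared row handles. This is my own assumption about how such a structure might look, not something FreeImage or GDI+ provides, and `RowImage` and `duplicate_row_shared` are hypothetical names. Duplicating a row copies one pointer instead of the pixel data, but inserting into the table is still linear in the number of rows rather than O(log(n)), and every row now carries a pointer and reference count on top of its pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Row = std::vector<unsigned char>;

// Hypothetical row-indexed image: each scanline lives in its own buffer,
// and the image is just an ordered table of handles to those buffers.
// Rows are const because a duplicated row is shared, not copied.
struct RowImage {
    std::vector<std::shared_ptr<const Row>> rows;
};

// Inserts a second handle to scanline `row` directly after it.
// No pixel bytes are copied; only the handles after `row` are shifted.
void duplicate_row_shared(RowImage& img, std::size_t row) {
    assert(row < img.rows.size());
    // Copy the handle first so it stays valid while the table is shifted.
    std::shared_ptr<const Row> handle = img.rows[row];
    img.rows.insert(img.rows.begin() + static_cast<std::ptrdiff_t>(row) + 1,
                    handle);
}
```

Replacing the handle table with a balanced tree keyed on row position is what would bring the insertion itself down to logarithmic time, at the price of more bookkeeping per row.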
Since GDI+ is closed-source, I cannot say how it implements bitmaps, but given its performance characteristics, it seems to do worse than FreeImage.