Ultimate and fastest way would be to have an array of textures (normal ones or cubemaps). Then dynamically fetch the texture slice according to an id stored in each cube instance data/ or cube face data (if you want a different texture on a per cube face basis) using GLSL built-in gl_InstanceID
or gl_PrimitiveID
.
With this implementation you would bind your texture array just once.
This would of course required used of gpu_shader4 and texture_array extensions:
I have used this mechanism (using D3D10, but principle applies too) and it worked very well.
I had to map on sprites (3D points of a constant screen size of 9x9 or 15x15 pixels IIRC) differents textures indicating each a different meaning for the user.
Edit:
If you don't feel comfy with all shader stuff, I would simply sort cubes by textures, and don't Z order the geometry. Then measure performances gains.
Also I would try to add a pre-Z pass where you render all your cubes in Z buffer only, then render normal scene, and see if it speed up things (if fragments bound, it could help).