If you are using logic ops, you don't need to render to a texture. Simply use stenciling: draw the first shape into the stencil buffer only (with color writes disabled), then use the stencil test to eliminate masked pixels when you draw the rest of the scene.
(This tutorial targets desktop OpenGL on Windows, but I believe the stencil logic carries over.)
http://nehe.gamedev.net/data/lessons/lesson.asp?lesson=26
If you want to use texturing, I would suggest rendering the first shape into the alpha channel of a texture (either via OES_framebuffer_object or by using glCopyTexImage2D).
Then use glTexEnv with GL_COMBINE to multiply the alpha channel of the real texture or color by the mask texture. With blending enabled, this effectively "cuts out" the masked pixels. (Use the alpha test if you don't want masked pixels to write to the Z buffer; a fully transparent fragment still writes depth unless it is discarded.)
There is one ugly complication in the texture-centric approach: you need to generate texture coordinates in screen space by using the combined modelview and projection matrix (plus a scale and bias of 0.5 to map clip space [-1, 1] onto texture space [0, 1]) as the texture matrix, and then feeding the vertex positions into glTexCoordPointer. This is sort of a "poor man's glTexGen" for OpenGL ES 1.1 hardware, since the ES 1.1 core API has no glTexGen.
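The matrix arithmetic behind that trick can be sketched in plain C. This assumes GL's column-major float[16] layout; the function names are hypothetical helpers, and `project_texcoord` only mimics on the CPU what GL does when it divides s and t by q.

```c
#include <string.h>

/* out = a * b for 4x4 column-major matrices (element (row, col) lives
   at index col * 4 + row, as in glLoadMatrixf). */
static void mat4_mul(float out[16], const float a[16], const float b[16])
{
    float r[16];
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            r[col * 4 + row] = sum;
        }
    memcpy(out, r, sizeof r);
}

/* Texture matrix = bias * projection * modelview, where bias scales by
   0.5 and translates by 0.5 so clip-space [-1, 1] maps to [0, 1]. */
void screen_space_texture_matrix(float out[16],
                                 const float projection[16],
                                 const float modelview[16])
{
    static const float bias[16] = {
        0.5f, 0.0f, 0.0f, 0.0f,
        0.0f, 0.5f, 0.0f, 0.0f,
        0.0f, 0.0f, 0.5f, 0.0f,
        0.5f, 0.5f, 0.5f, 1.0f,
    };
    float pm[16];
    mat4_mul(pm, projection, modelview);
    mat4_mul(out, bias, pm);
}

/* CPU check: apply m to (x, y, z, 1), then divide by q, which is what GL
   does with a 3-component texcoord array (q defaults to 1). */
void project_texcoord(const float m[16], const float v[3], float st[2])
{
    float s = m[0] * v[0] + m[4] * v[1] + m[8]  * v[2] + m[12];
    float t = m[1] * v[0] + m[5] * v[1] + m[9]  * v[2] + m[13];
    float q = m[3] * v[0] + m[7] * v[1] + m[11] * v[2] + m[15];
    st[0] = s / q;
    st[1] = t / q;
}
```

In GL you would load the result with glMatrixMode(GL_TEXTURE) and glLoadMatrixf, then pass your vertex array to glTexCoordPointer(3, GL_FLOAT, ...) on the mask's texture unit. This maps the viewport onto the whole mask texture; if the mask occupies only part of a larger (e.g. power-of-two) texture, you would need an extra scale.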
Of the two approaches, I suggest stenciling if at all possible. Whether you actually get a stencil buffer may depend on your windowing system, though, so you may have to request one explicitly when you create your rendering surface.