NTSC/PAL video is formed by an interlaced series of fields (named Odd and Even). In the NTSC format, a new field is displayed every 16.67 milliseconds (roughly 60 fields per second); in PAL, every 20 milliseconds (50 fields per second). Two fields form a frame.
Frames are displayed every 33 milliseconds in the NTSC format (every 40 milliseconds in PAL), as a succession of Odd/Even fields. Hence an NTSC TV displays 30 frames per second (fps), or 60 fields per second, and a PAL TV displays 25 fps, or 50 fields per second. (Please be kind: all of this was developed with 1940s technology.) Your brain merges the succession of images into a fluid video stream.
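The timing figures above follow directly from the field rates. As a quick check, here is a minimal Python sketch of that arithmetic. It uses the nominal rates quoted in the text (60 and 50 fields per second); the function and constant names are made up for this illustration.

```python
# Nominal field rates quoted in the text (fields per second).
FIELD_RATES = {"NTSC": 60, "PAL": 50}
FIELDS_PER_FRAME = 2  # one Odd field plus one Even field


def field_period_ms(fields_per_second):
    """Time between two successive fields, in milliseconds."""
    return 1000.0 / fields_per_second


def frame_period_ms(fields_per_second):
    """Time between two successive full frames, in milliseconds."""
    return FIELDS_PER_FRAME * field_period_ms(fields_per_second)


def frames_per_second(fields_per_second):
    """Full frames shown per second (two fields make one frame)."""
    return fields_per_second / FIELDS_PER_FRAME


for name, rate in FIELD_RATES.items():
    print(f"{name}: field every {field_period_ms(rate):.2f} ms, "
          f"frame every {frame_period_ms(rate):.2f} ms, "
          f"{frames_per_second(rate):.0f} fps")
```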
Since each video frame comprises two fields, carrying one eye's view in each field is a way to transmit stereo video without doubling the required bandwidth (useful because you can use the same transmission line as a mono system).
Field sequential video is created when a video multiplexer at the capture end (sometimes built into the 3D video camera, but usually a separate piece of equipment) fills the Odd field with the information for one eye (say, from the left sensor) and the Even field with the information for the other eye (say, from the right sensor). At the display end, the display separates the information for each eye again.
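To make the multiplexing step concrete, here is a rough Python sketch, not a description of any real multiplexer. It assumes an image is simply a list of scan lines, that the Odd field holds lines 1, 3, 5, ... (list indices 0, 2, 4, ...), and that the left eye goes into the Odd field as in the example above. The names mux_fields and demux_fields are invented for this illustration.

```python
def mux_fields(left, right):
    """Build one interlaced frame from two same-sized images.

    Odd-field lines (indices 0, 2, 4, ...) come from the left image,
    Even-field lines (indices 1, 3, 5, ...) come from the right image.
    """
    if len(left) != len(right):
        raise ValueError("left and right images must have the same height")
    frame = []
    for i, (left_row, right_row) in enumerate(zip(left, right)):
        frame.append(left_row if i % 2 == 0 else right_row)
    return frame


def demux_fields(frame):
    """Split an interlaced frame back into its two fields.

    Returns (odd_field, even_field); under the assumption above these
    are the left-eye and right-eye lines respectively.
    """
    return frame[0::2], frame[1::2]


# Tiny 6-line test images: every pixel is tagged with its eye and line.
left = [[f"L{r}"] * 4 for r in range(6)]
right = [[f"R{r}"] * 4 for r in range(6)]

frame = mux_fields(left, right)
left_field, right_field = demux_fields(frame)
```

Note that the frame has the same number of lines as a single mono image, which is why the bandwidth does not grow. In this sketch each eye only gets every other line of its original image.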