The accelerometers will be registering a constant acceleration due to gravity, plus any acceleration the device is subjected to by the user, plus noise.
You will need to low pass filter the samples to get rid of as much irrelevant noise as you can. The worst of the noise will generally be higher frequency than any possible human-induced acceleration.
Realise that when the device is not being accelerated by the user, the only force is due to gravity, and therefore you can deduce its attitude in space. Moreover, when the total acceleration varies greatly from 1g, it must be due to the user accelerating the device; by subtracting last known estimate of gravity, you can roughly estimate in what direction and by how much the user is accelerating the device, and so obtain data you can begin to match against a list of known gestures.
With a single three-axis accelerometer you can detect the current pitch and roll, and also acceleration of the device in a straight line. Integrating acceleration minus gravity will give you an estimate of current velocity, but the estimate will rapidly drift away from reality due to noise; you will have to make assumptions about the user's behaviour before / between / during gestures, and guide them through your UI, to provide points where the device is not being accelerated and you can reset your estimates and reliably estimate the direction of gravity. Integrating again to find position is unlikely to provide usable results over any useful length of time at all.
If you have two three-axis accelerometers some distance apart, or one and some gyros, you can also detect rotation of the device (by comparing the acceleration vectors, or from the gyros directly); integrating angular momentum over a couple of seconds will give you an estimate of current yaw relative to that when you started integrating, but again this will drift out of true rapidly.