If the camera is fixed and there isn't too much motion in the scene, then I would suggest a method based on background subtraction.
Step 1: Compute a background image for each frame of the video. There are complicated algorithms for doing this, but a very simple and effective one is to take the median value of every pixel across a time window of about 3 seconds, or longer if the object in question is moving slowly. The median works as long as each pixel shows the background for more than half of the window. Incidentally, with a fixed camera this kind of filtering alone will remove most moving objects from the video, hence my earlier question about all objects vs. one object.
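As a rough sketch of Step 1, here is one way the per-pixel temporal median could be written with NumPy. The function name `median_background` and the `half_window` parameter are my own, not from any library, and I am assuming the frames are already loaded as equally sized arrays (for example, read with OpenCV). At 30 fps a 3 second window would be about 90 frames, so `half_window=45`.

```python
import numpy as np

def median_background(frames, index, half_window):
    """Estimate the background for frames[index] (hypothetical helper).

    Takes the per-pixel median over the frames from index - half_window
    to index + half_window, clipped to the start and end of the video.
    """
    lo = max(0, index - half_window)
    hi = min(len(frames), index + half_window + 1)
    window = np.asarray(frames[lo:hi])          # shape: (n, H, W) or (n, H, W, C)
    background = np.median(window, axis=0)      # median along the time axis
    return background.astype(window.dtype)      # back to the frame dtype (e.g. uint8)

# Toy example: a bright 1-pixel "object" crossing a uniform background.
frames = []
for i in range(5):
    f = np.full((1, 5), 10, dtype=np.uint8)
    f[0, i] = 255                               # the object sits at column i in frame i
    frames.append(f)

bg = median_background(frames, index=2, half_window=2)
```

Each pixel is covered by the object in only one of the five frames, so the median ignores it. Holding the whole window in memory is fine for a short clip; for long videos you would want to keep only the frames of the current window.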
Step 2: Mark the regions you want to remove in each frame with a brush tool, and replace them with the background pixels. Don't bother with a fine brush or lasso tool, since any non-object pixels you mark will just be replaced with their filtered version. You could probably reuse the same brush marks for several frames, since the exact boundary is not so important. If the object is the only thing moving in the scene, you could just mark the entire frame and have it replaced with the background.
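Step 2 then comes down to copying background pixels into the marked region. A minimal sketch, assuming the brush marks are available as a boolean mask with the same height and width as the frame (`replace_with_background` is a made-up name, and how you produce the mask is up to your tool):

```python
import numpy as np

def replace_with_background(frame, background, mask):
    """Return a copy of frame with masked pixels taken from background.

    mask is a boolean array of shape (H, W); True marks pixels to replace.
    Works for grayscale (H, W) and color (H, W, C) frames.
    """
    out = frame.copy()
    out[mask] = background[mask]
    return out

# Toy example: a 2x2 color frame with one marked pixel.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
background = np.full((2, 2, 3), 50, dtype=np.uint8)
mask = np.zeros((2, 2), dtype=bool)
mask[0, 0] = True                               # "brush" over the top-left pixel

result = replace_with_background(frame, background, mask)
```

Marking the entire frame corresponds to a mask that is True everywhere, in which case the result is simply the background image.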
Anyway, to answer your more general question, the topic you want to research is called inpainting for images and video. There is quite a bit of literature out there on the subject; what I described is just a super simple method you could implement in an hour or so with OpenCV.