If you have a good estimate of the image background, subtracting it from the image with the person is a good first step. But it is only the first step. After that, you have to segment the image, i.e. you have to partition the image into "background" and "foreground" pixels, with constraints like these:
- in the foreground areas, the average difference from the background image should be high
- in the background areas, the average difference from the background image should be low
- the areas should be smooth. Outline length and curvature should be minimal.
- the borders of the areas should have a high contrast in the source image
If you are mathematically inclined, these constraints can be modeled perfectly with the Mumford-Shah functional. See here for more information.
But you can probably adapt other segmentation algorithms to the problem.
If you want a fast and simple (but not perfect) version, you could try this:
- subtract the two images
- find the largest consecutive "blob" of pixels with a background-foreground difference greater than some threshold. This is the first rough estimate for the "person area" in the foreground image, but the segmentation does not meet the criteria 3 and 4 above.
- Find the outline of the largest blob (EDIT: Note that you don't have to start at the outline. You can also start with a larger polygon, as the steps will automatically shrink it to the optimal position.)
- now go through each point in the outline and smooth the outline. i.e. for each point find the point that minimizes the formula: c1*L - c2*G, where L is the length of the outline polygon if the point were moved here and G is the gradient at the location the point would be moved to, c1/c2 are constants to control the process. Move the point to that position. This has the effect of smoothing the contour polygon in areas of low gradient in the source image, while keeping it tied to high gradients in the source image (i.e. the visible borders of the person). You can try different expressions for L and G, for example, L could take the length and curvature into account, and G could also take the gradient in the background and subtracted images into account.
- you probably will have to re-normalize the outline polygon, i.e. make sure that the points on the outline are spaced regularly. Either that, or make sure that the distances between the points stay regular in the step before. ("Geodesic Snakes")
- repeat the last two steps until convergence
You now have an outline polygon that touches the visible person-background border and continues smoothly where the border is not visible or has low contrast.
Look up "Snakes" (e.g. here) for more information.