Don't focus too much on the individual pixels. Focus on the corners of the pixels - the points where four pixels meet. The co-ordinates of these corners will work a lot like half-open co-ordinates of the pixels. Half-open bounds are inclusive at the lower bound, but exclusive at the upper bound, so the half-open range from one to three is {1, 2}.
Define a set of edges - single-pixel-long lines (vertical or horizontal) between two pixels. Then form an adjacency graph - two edges are adjacent if they share a point.
Next, identify connected sets of edges - the subgraphs where every point is connected, directly or indirectly, to every other. Logically, most connected subgraphs should form closed loops, and you should ensure that ALL loops are considered simple, closed loops.
One issue is the edge of your bitmap. It may simplify things if you imagine that your bitmap is a small part of an infinite bitmap, where every out-of-bounds pixel has value 0. Include pixel-edges on the edge of the bitmap based on that rule. That should ensure all loops are closed.
Also, consider those pixel-corners where you have four boundary edges - ie where the pixel pattern is one of...
1 0 0 1
0 1 1 0
The these cases, the '1' pixels should be considered part of separate polygons (otherwise you'll end up with winding-rule complications). Tweak the rules for adjacency in these cases so that you basically get two connected (right-angle) edges that happen to touch at a point, but aren't considered adjacent. They could still be connected, but only indirectly, via other edges.
Also, use additional arrays of flags to identify pixel-edges that have already been used in loops - perhaps one for horizontal edges, one for vertical. This should make it easier to find all loops without repeatedly evaluating any - you scan your main bitmap and when you spot a relevant edge, you check these arrays before scanning around to find the whole loop. As you scan around a loop, you set the relevant flags (or loop ID numbers) in these arrays.
BTW - don't think of these steps as all being literal build-this-data-structure steps, but more as layers of abstraction. The key point is to realise that you are doing graph operations. You might even use an adaptor that references your bitmap, but provides an interface suitable for some graph algorithm to use directly, as if it were using a specialised graph data structure.
To test whether a particular loop is a hole or not, pick (for example) a leftmost vertical pixel edge (on the loop). If the pixel to the right is filled, the loop is a polygon boundary. If the pixel to the left is filled, the loop is a hole. Note - a good test pixel is probably the one you found when you found the first pixel-edge, prior to tracing around the loop. This may not be the leftmost such edge, but it will be the leftmost/uppermost in that scan line.
The final issue is simplifying - spotting where pixel-edges run together into straight lines. That can probably be built into the scan that identifies a loop in the first place, but it is probably best to find a corner before starting the scan proper. That done, each line is identified using a while-I-can-continue-tracing-the-same-direction loop - but watch for those two-polygons-touching-at-a-corner issues.
Trying to combine all this stuff may sound like a complicated mess, but the trick is to separate issues into different classes/functions, e.g. using classes that provide abstracted views of underlying layers such as your bitmap. Don't worry too much about call overheads or optimisation - inline and other compiler optimisations should handle that.
What would be really interesting is the more intelligent tracing that some vector graphics programs have been doing for a while now, where you can get diagonals, curves etc.