First off, I think it's worth considering what your real requirement is. I suspect that it's not just "store the active points and none of the others in as space-efficient a manner as possible", but also a certain amount of "store adjacent points in nearby memory locations so that you get good caching behavior" and "store points in a manner for which lookups can be done efficiently".
With that said, here's what I would suggest. Divide the full 3D region into cubical blocks, all the same size. For each block, store all of the points in the block in dense arrays, including a boolean isTissue array for whether each point is in the tissue region or not. Allocate only the blocks that have points in them. Make a (dense) array of pointers to blocks, with NULL pointers for non-allocated blocks.
Thus, to find the point at (i,j), you first compute ii=i/blockside, jj=j/blocksize, and then look in the pointer-to-block table at (ii,jj) to find the block that contains your point. If that pointer is NULL, your point isn't in the tissue. If it's non-null, you look at (i mod blocksize, j mod blocksize) in that block, and there is your (i,j) point. You can check its isTissue flag to see if it's a "present" point or not.
You'll want to choose the block size as a balance between minimizing the number of times you do adjacent-point computations that cross block boundaries, and minimizing the number of points that are in blocks but not in the tissue region. I'd guess that at a minimum you want a row of the block to be a cache-line long. Probably the optimum is rather larger than that, though it will depend at least somewhat on your geometry.
To iterate over all the points in a 3D box, you would either just do lookups for each point, or (more efficiently) figure out which blocks the box touches, and iterate over the regions in those blocks that are within your box, skipping the ones where isTissue is false.
If you're doing a lot of deallocation and re-allocation of blocks, you probably want to "deallocate" blocks by dropping them into an "unused" pool, and then pulling blocks out of that pool rather than reallocating them. This also has the advantage that those blocks will already have all of their points set to "not present" (because that's why you deallocated the block), so you don't need to initialize them.
The experienced reader will probably recognize similarities between this and ways of laying out data for parallel computations; if you have a really big simulation, you can easily distribute the blocks across multiple nodes, and you only need to do cross-node communication for the cross-block computations. For that sort of application, you may find it useful to do nested levels of blocks, where you have meta-blocks (for the cross-node communication) containing smaller blocks (for the geometry).