You can use Microsoft's HxComp tool (part of the Visual Studio SDK) to decompile as well as compile HxS files. After that, I'd imagine it's simply a matter of reading the HTML files in the generated html folder and transforming them into your new format; the other generated folders all seem to be concerned with styling and formatting. The transformation may not be trivial, though: the output files are still HTML (unfortunately not something more straightforward like XML) and contain script blocks and other clutter, so your program will need some idea of the general structure of the files. I'm not sure how much processing you want to do on the output HTML, but I don't think you'll be able to get the content in any simpler format, if that's what you're after.
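As a rough illustration of the "read the HTML and transform it" step, here is a minimal Python sketch that pulls the title and visible text out of each decompiled topic while ignoring script and style blocks. It only uses the standard library; the names `TopicTextExtractor`, `extract_topic` and `extract_folder` are my own, and the `*.htm*` file pattern and UTF-8 encoding are assumptions about what the decompiled output looks like, so adjust them to whatever your files actually contain.

```python
from html.parser import HTMLParser
from pathlib import Path


class TopicTextExtractor(HTMLParser):
    """Collects the <title> and the visible text of one topic page."""

    SKIPPED = {"script", "style"}  # blocks whose content is not readable text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script> or <style>
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def extract_topic(html_text):
    """Return (title, body_text) for one topic's HTML source."""
    parser = TopicTextExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title, " ".join(parser.chunks)


def extract_folder(html_dir):
    """Map each file name in html_dir (assumed *.htm / *.html) to (title, text)."""
    results = {}
    for path in sorted(Path(html_dir).glob("*.htm*")):
        # Assumption: UTF-8; undecodable bytes are replaced rather than raising.
        source = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = extract_topic(source)
    return results
```

Calling `extract_folder("html")` on the decompiled output folder would give you a dictionary you can then write out in your target format. If you need to keep headings, tables or links rather than flat text, you'd extend the `handle_starttag` logic accordingly.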
Finally, your program may need to read the HxC, HxT and HxK files (collection definition, table of contents and index, respectively) to figure out the structure of the documentation. Luckily, these are nothing more than plain XML files!
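Since these files are XML, something like the following sketch could walk the table of contents. Note the assumptions: I'm taking the HxT file to be a tree of nested `HelpTOCNode` elements carrying `Url` and (optionally) `Title` attributes, so check those names against your own decompiled file before relying on them. `read_toc` and `read_toc_file` are just illustrative helper names.

```python
import xml.etree.ElementTree as ET


def read_toc(root):
    """Flatten a TOC tree into (depth, title, url) tuples in document order.

    Assumes nested <HelpTOCNode> elements with Url/Title attributes;
    a missing attribute comes back as None.
    """
    entries = []

    def walk(node, depth):
        for child in node:
            if child.tag == "HelpTOCNode":
                entries.append((depth, child.get("Title"), child.get("Url")))
                walk(child, depth + 1)

    walk(root, 0)
    return entries


def read_toc_file(hxt_path):
    """Parse an HxT file from disk and flatten it with read_toc."""
    return read_toc(ET.parse(hxt_path).getroot())
```

Each entry's depth tells you how deeply a topic is nested, which should be enough to rebuild the hierarchy in your new format and to match the `Url` values up with the HTML files.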
Hopefully this should be enough to get you started...
Edit: You can also see this question/answer, which is quite similar.