I was wondering if anyone had a file format reference for Help 2.0 HxS files?

I've seen a few 3rd-party viewers, so obviously someone has gone to the trouble of reverse-engineering the file format, but I have been unable to track anything down with Google.

I am interested in creating a better interface to the MSDN documentation and just want access to the HTML.

Any ideas?

+11  A: 

This link contains information about the internal format of the files.

Mitch Wheat
+3  A: 

You can use the Microsoft HxComp tool (part of the Visual Studio SDK) to decompile as well as compile HxS files. After that, I would imagine it is simply a matter of reading the HTML files in the generated html folder and transforming them into your new format. All the other generated folders seem to deal with styling/formatting. The transformation may not be trivial, though: the output files are still HTML (unfortunately not something more straightforward like XML) and contain script blocks and other things, so your program will need some idea of the general structure of the files. I'm not sure how much work you want to do with the output HTML, but I don't think you'll be able to get it in any simpler format (if that's what you want).
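As a rough illustration of the "read the HTML and skip the script blocks" step, here is a minimal Python sketch using only the standard library. It assumes you already have the decompiled HTML as a string; the names `TextExtractor` and `extract_text` are made up for this example, and real MSDN pages will likely need more handling than this.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of script/style blocks."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a script or style element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

You would call `extract_text` on the contents of each file in the generated html folder and then do whatever transformation you need on the result.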

Finally, your program may need to read the HxC/HxT/HxK files (collection definitions, table of contents, indices, respectively) to figure out the structure of the documentation. Luckily, these are nothing more than plain XML files!
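Since these are plain XML, walking the table of contents could look something like the sketch below. The element name `HelpTOCNode` and the attribute `Url` are assumptions about what the HxT file contains, so check them against a real decompiled file and pass different names if yours differ; `toc_entries` itself is just a name for this example.

```python
import xml.etree.ElementTree as ET


def toc_entries(xml_text, node_tag="HelpTOCNode", url_attr="Url"):
    """Return (depth, url) pairs for every TOC node, in document order.

    node_tag and url_attr are assumed names; adjust them to match
    whatever your decompiled table-of-contents file actually uses.
    """
    root = ET.fromstring(xml_text)
    entries = []

    def walk(element, depth):
        for child in element:
            if child.tag == node_tag:
                entries.append((depth, child.get(url_attr)))
                walk(child, depth + 1)
            else:
                # Not a TOC node: look inside it without increasing the depth.
                walk(child, depth)

    walk(root, 0)
    return entries
```

The depth value gives you the nesting level, which should be enough to rebuild the tree in your own interface.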

Hopefully this should be enough to get you started...

Edit: You can also see this question/answer, which is quite similar.

Noldorin
I could be wrong, but doesn't the HxComp tool compile/decompile only collection definition (.HxC) files (and not HTML)?
Mitch Wheat
It does of course produce HxC/HxT/HxK files, but it also generates several directories for the HTML files, styles, scripts, etc. I've just edited my answer to reflect this.
Noldorin
+1  A: 

Based on the answers so far, it looks like all the information you need to reverse those help files is here. I would offer more help or advice, but it looks like you've got it covered already... Getting to the HTML isn't the hard part. Finding what you want and making sense of all the mounds of information contained in those files is going to be the hard part.

If you still need help with it, let me know. I've done a lot of work with various Microsoft compression/archive formats...

LarryF
Thank you for the offer; it is much appreciated. As you said, it would seem that I have everything that I need here. I'm quite comfortable scraping the HTML for the actual content that I want, so that is not a big deal.
Frank Krueger
Ok, good deal. Have you got all the other pieces in place already? If not, I have some CHM handler code, and I've done my share of PE file parsing, so I could get you to the CHM data pretty easily; then it's just a matter of getting it out.
LarryF
A: 

Try the MSDN web services. http://msdn.microsoft.com/en-us/magazine/cc163541.aspx#S1

Gena