BTW, you should specify the language (English, Arabic...) in which you wish to build this dataset, as this affects both the selection of book sources and the conversion utilities.
Identifying data content sources:
Interestingly, for all the [interactive] online Hadeeth search tools, such as the one on the CRCC's Compendium of Muslim Texts site (originally from MSA West, but somehow no longer available/working on the MSA site), there doesn't seem to be any downloadable version of the underlying databases!
There are several online versions of books themselves, in particular the popular ones you mention, but you would then need to parse and index them properly in order to retain the references etc. Also, going "back" to the books, you would have to relate them yourself.
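To give an idea of what that parsing and indexing step could look like, here is a minimal sketch in Python. It assumes, purely for illustration, a plain-text version of a book in which each entry begins with a header line such as `Volume 1, Book 1, Number 1:`. The header pattern, the function names `parse_entries` and `build_index`, and the sample text are all my own assumptions; a real source will likely need a different pattern.

```python
import re

# Assumed header format; adjust the pattern to match the actual source text.
HEADER = re.compile(r"^Volume (\d+), Book (\d+), Number (\d+):\s*$")

def parse_entries(text):
    """Split the text into records, keeping the reference of each entry."""
    entries = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            # A header line starts a new record.
            current = {
                "volume": int(match.group(1)),
                "book": int(match.group(2)),
                "number": int(match.group(3)),
                "text": [],
            }
            entries.append(current)
        elif current is not None and stripped:
            # Non-empty lines belong to the most recent header.
            current["text"].append(stripped)
    for entry in entries:
        entry["text"] = " ".join(entry["text"])
    return entries

def build_index(entries):
    """Index records by their (volume, book, number) reference."""
    return {(e["volume"], e["book"], e["number"]): e for e in entries}

# Made-up placeholder text, only to exercise the parser.
sample = """Volume 1, Book 1, Number 1:
First line of entry one.
Second line of entry one.

Volume 1, Book 1, Number 2:
Only line of entry two.
"""

index = build_index(parse_entries(sample))
print(index[(1, 1, 2)]["text"])
```

Keying the index on the reference tuple is what lets you get from a search hit back to the exact place in the book.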
With regard to converting CHM files...
There's no open source or freeware program that I'm aware of, but the shareware ABC Amber CHM converter (c. $25.00) appears to be the gold standard for that purpose.
I only had passing exposure to this software a couple of years ago, for a one-time conversion job similar to the one you are contemplating. The Amber converter "did the trick". Luckily, the underlying structure of the help pages was regular enough to allow a relatively straightforward tabulation into CSV/database fields.
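For what it's worth, the tabulation step can be sketched with nothing but the Python standard library once the pages are available as HTML. This is not the script I used back then, just an illustration. It assumes a hypothetical page layout in which each record is an `<h2>` title followed by one or more `<p>` paragraphs; the class `PageExtractor`, the function `html_to_csv`, and the column names are made up for the example.

```python
import csv
import io
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Collect <h2> text as 'title' and following <p> text as 'body' (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._tag = None   # the h2/p element currently being read, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # ignore inline tags such as </b> inside a paragraph
        text = " ".join("".join(self._buf).split())  # normalize whitespace
        if tag == "h2":
            self.rows.append({"title": text, "body": ""})
        elif self.rows:
            # Append the paragraph to the most recent titled row.
            row = self.rows[-1]
            row["body"] = (row["body"] + " " + text).strip()
        self._tag = None

def html_to_csv(html):
    """Return CSV text with one row per <h2> section found in the HTML."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["title", "body"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()

# Made-up placeholder page, only to exercise the extractor.
sample_html = (
    "<h2>Topic A</h2><p>First <b>bold</b> text.</p><p>More.</p>"
    "<h2>Topic B</h2><p>Other.</p>"
)
print(html_to_csv(sample_html))
```

The more regular the converted pages are, the less special-casing a parser like this needs; irregular pages would call for extra rules per tag or per CSS class.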
ABC Amber converter supports many languages, including Arabic (but I used it for English only).