I went through this a few years back. You can:
Use Word to convert the file into some other format, such as ASCII, RTF, or XML.
Use some third-party app to convert to another format, such as ASCII.
Access the Word API through OLE and extract the information directly.
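For the first and last options, a minimal Python sketch of driving Word over OLE/COM might look like the following. It assumes Windows, an installed copy of Word, and the pywin32 package. The names `WORD_FORMATS`, `output_path_for`, and `convert_with_word` are made up for illustration and are not part of any library. Treat it as a starting point, not a tested solution for every Word version.

```python
from pathlib import Path

# WdSaveFormat values from the Word object model:
# 2 is wdFormatText (plain text), 6 is wdFormatRTF.
WORD_FORMATS = {"txt": 2, "rtf": 6}


def output_path_for(src, fmt):
    """Return the absolute path of the converted file, next to the source."""
    return Path(src).resolve().with_suffix("." + fmt)


def convert_with_word(src, fmt="txt"):
    """Open src in Word through COM and save a copy in another format.

    Hypothetical helper: requires Windows, Word, and pywin32.
    """
    # Imported here so the module still loads on machines without pywin32.
    import win32com.client

    target = output_path_for(src, fmt)
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word expects absolute paths when driven through COM.
        doc = word.Documents.Open(str(Path(src).resolve()))
        try:
            doc.SaveAs(str(target), FileFormat=WORD_FORMATS[fmt])
        finally:
            doc.Close(False)  # False: do not save changes on close
    finally:
        word.Quit()  # always shut Word down, even after an error
    return target
```

Calling `convert_with_word("report.doc", "rtf")` would be expected to leave a `report.rtf` next to the original. The nested `try`/`finally` blocks are there because a Word instance that is never told to quit tends to linger in the background, which is one of the ways this approach turns out fragile.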
I couldn't find any generic libraries to read Word files, and back then all of the applications that read Word files only worked for a subset. Word changed often enough that they had trouble keeping up.
There were some documents that listed the specifics of the older Word file formats, but the underlying file structure is outrageously complicated. Without a lot of resources it would be hard to keep code in sync with the file format.
Initially, I used Perl to drive Word and create new documents, but the solution was too fragile. Later I switched the whole application to work with PDFs instead, and gave up on Word.
Paul.