
Hi guys,

Is there any way to extract hyperlinks from a .doc file? I have a bunch of hyperlinks in a document that I need to import into my database.

I have tried converting the .doc to HTML, but the hyperlinks are not carried over.

Regards, Mladen

A:

We had a similar issue and ended up using a third-party component called Aspose.Words. You can find it here: http://www.aspose.com

It's available for .NET and Java.

Sonny Boy
Wow, I guess this must be new. A few years ago I searched and searched for a solution like this that didn't require Microsoft Office to be installed, but I couldn't find anything, so I had to use Office Automation. It's a little pricey, but I'd much rather use a component like this.
Steve Wortham
I can vouch for Aspose.Words. It has saved us hundreds of hours of development and lets us dynamically create Word documents well beyond what simple mail merges can do. We also use it to strip all the text out of Word docs for indexing. I highly recommend it if you have to work with a lot of MS Word docs. It also handles RTF, which is a bonus.
Sonny Boy
A: 

You could try importing the file into OpenOffice and see whether the hyperlinks survive. An OpenDocument file is just a ZIP archive with XML inside, which is very easy to parse once you get the hang of it.
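A minimal sketch of the parsing step, assuming the .doc has already been re-saved from OpenOffice as an .odt and that the conversion kept the links. In ODF text documents a hyperlink is a `<text:a>` element carrying an `xlink:href` attribute inside `content.xml`; the function name `extract_odt_hyperlinks` is just illustrative. It only reads `content.xml`, so links in headers and footers (which ODF stores in `styles.xml`) would be missed.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def extract_odt_hyperlinks(path):
    """Return (link_text, url) tuples from an .odt file's content.xml.

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as odt:
        root = ET.fromstring(odt.read("content.xml"))

    links = []
    # Hyperlinks are <text:a xlink:href="..."> elements anywhere in the body
    for anchor in root.iter(f"{{{TEXT_NS}}}a"):
        href = anchor.get(f"{{{XLINK_NS}}}href")
        if href:
            # itertext() also picks up text inside nested spans
            text = "".join(anchor.itertext())
            links.append((text, href))
    return links
```

Usage would be something like `for text, url in extract_odt_hyperlinks("links.odt"): ...`, inserting each pair into the database; the file name here is a placeholder.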

Pekka
A: 

Here is what I did: I opened the .doc file in Office XP, published it as a blog post, and then saved that post as a filtered web page. That gives you clean HTML which you can parse easily.
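For the final parsing step, a sketch using only Python's standard `html.parser`, assuming the filtered web page represents each link as an ordinary `<a href="...">` element. The class and function names (`LinkCollector`, `extract_html_hyperlinks`) are hypothetical. Anchors without an `href`, such as the `<a name="...">` bookmarks Word emits, are skipped.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link_text, href) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and unescapes values
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and runs of spaces inside the link text
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def extract_html_hyperlinks(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

To use it, read the saved page into a string and pass it in; the file name and encoding are placeholders, since the right encoding depends on how the page was saved.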

Mladen
A: 

I realise this is some months after your initial question, but you can also extract the hyperlinks in a .doc file through Word Automation. The Word object model exposes a Hyperlinks collection on each Document, and every Hyperlink object has an Address property holding the URL, so the links are easy to enumerate.

Richard