How can I open a password-protected Microsoft Word (.doc, .docx) file in Java, assuming the password is known?
Use a suitable library. A good starting point is the OpenOffice API.
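You can try it with com4j, a Java-to-COM bridge. Note that this drives Word itself, so it only works on Windows with Word installed.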
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word.documents.open2000.aspx
Since the Open2000 method has a "PasswordDocument" parameter, I think it is possible to open a password-protected file.
Hope this is what you were searching for ;)
Edit: I recorded this macro in Word:
Documents.Open FileName:="test.doc", ConfirmConversions:= _
False, ReadOnly:=False, AddToRecentFiles:=False, PasswordDocument:= _
"hallo", PasswordTemplate:="", Revert:=False, WritePasswordDocument:= _
"hallo", WritePasswordTemplate:="", Format:=wdOpenFormatAuto
So the open call in com4j should look something like this (the password is "hallo"):
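// Open2000 arguments: FileName, ConfirmConversions, ReadOnly, AddToRecentFiles, PasswordDocument, PasswordTemplate, Revert, WritePasswordDocument, WritePasswordTemplate, Format, Encoding, Visible
_Document document = app.documents().open2000(doc, false, false, false, "hallo", "", false, "hallo", "", WdOpenFormat.wdOpenFormatAuto, false, true);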
A good starting point would be the Apache POI project, which supports both the Office 97-2003 binary formats and the OOXML (2007-2010) formats. If you are mainly interested in extracting text from those files, you should also look at the Apache Tika project, which has some good code such as OfficeParser.java.
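One detail matters when choosing a POI class: a password-protected .docx is not a ZIP file. The encrypted package is stored inside an OLE2 (Compound File) container, in streams named EncryptionInfo and EncryptedPackage, which is the same container format a .doc uses. Below is a minimal JDK-only sketch that tells the two containers apart by their first bytes. The class and method names are hypothetical and are not part of POI or Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class OfficeContainerSniffer {

    /** The outcomes this sketch distinguishes (hypothetical type). */
    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 / Compound File Binary signature: D0 CF 11 E0 A1 B1 1A E1
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature: "PK" 0x03 0x04
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Classifies a file from its leading bytes. */
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    /** Reads at most the first 8 bytes of the file and classifies them. */
    static Container sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    // True if data begins with every byte of prefix (short inputs never match).
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + ": " + sniff(Path.of(name)));
        }
    }
}
```

Roughly: ZIP means an unencrypted .docx you could hand to POI's XWPFDocument, while OLE2 means either a .doc or an encrypted .docx that has to go through POI's Decryptor first. Telling those two OLE2 cases apart requires looking at the stream names, which this sketch does not do.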
You will want to substitute your known password(s) around line 220, in the parse() method of OfficeParser:
if (!d.verifyPassword(Decryptor.DEFAULT_PASSWORD)) {
throw new TikaException("Unable to process: document is encrypted");
}
Note that Decryptor.DEFAULT_PASSWORD is the mostly useless password "VelvetSweatshop" (!).
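If you have several candidate passwords rather than one, the substitution can be a simple loop that falls back to the default. The sketch assumes a hypothetical PasswordCheck interface standing in for POI's Decryptor.verifyPassword(String), which returns a boolean and may throw GeneralSecurityException. In real code you would pass d::verifyPassword for your Decryptor d. The fake check in main is for demonstration only.

```java
import java.security.GeneralSecurityException;
import java.util.List;
import java.util.Optional;

public class PasswordCandidates {

    /**
     * Hypothetical stand-in for a password check such as POI's
     * Decryptor.verifyPassword(String).
     */
    @FunctionalInterface
    interface PasswordCheck {
        boolean verify(String password) throws GeneralSecurityException;
    }

    // Mirrors the default password mentioned above; it only unlocks files
    // encrypted with this built-in password, not ones with a real user password.
    static final String DEFAULT_PASSWORD = "VelvetSweatshop";

    /**
     * Tries each known password in order, then the default, and returns the
     * first one that verifies, or an empty Optional if none do.
     */
    static Optional<String> findPassword(PasswordCheck check, List<String> known)
            throws GeneralSecurityException {
        for (String candidate : known) {
            if (check.verify(candidate)) {
                return Optional.of(candidate);
            }
        }
        return check.verify(DEFAULT_PASSWORD) ? Optional.of(DEFAULT_PASSWORD) : Optional.empty();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Fake check for demonstration: accepts only "hallo".
        PasswordCheck fake = pw -> pw.equals("hallo");
        System.out.println(findPassword(fake, List.of("secret", "hallo")).orElse("none")); // prints "hallo"
    }
}
```

Once a password verifies, you would continue as in the Tika code, reading the decrypted stream from the Decryptor instead of throwing.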