How do components like, for example, the Total Commander search open every file format and search inside it? Is there a free library that offers such a feature? Basically, in the end I would like to extract text from files and be able to support all formats (PDF, Microsoft Word .doc, CHM …).
The programs that seem to do so actually don't. They delegate the task to extractors installed on your system. If you do not have an extractor for the .foo file format, no program will be able to search it.
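To illustrate the delegation idea, here is a minimal sketch of a search tool that keeps a registry mapping file extensions to extractor functions. All names (`EXTRACTORS`, `register`, `extract_text`) are hypothetical, and the `.docx` extractor is deliberately crude: it assumes the body text lives in `word/document.xml` inside the ZIP container and simply strips XML tags. When no extractor is registered for an extension, the tool returns `None`, which is exactly the ".foo" situation described above.

```python
import re
import zipfile
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical registry: lower-case file extension -> function returning the file's text.
EXTRACTORS: Dict[str, Callable[[Path], str]] = {}

def register(*extensions: str):
    """Decorator that registers an extractor for one or more extensions."""
    def wrap(func: Callable[[Path], str]) -> Callable[[Path], str]:
        for ext in extensions:
            EXTRACTORS[ext.lower()] = func
        return func
    return wrap

@register(".txt", ".csv", ".log")
def extract_plain(path: Path) -> str:
    # Plain text: just decode it, replacing undecodable bytes.
    return path.read_text(encoding="utf-8", errors="replace")

@register(".docx")
def extract_docx(path: Path) -> str:
    # A .docx file is a ZIP archive; the main body text is in word/document.xml.
    with zipfile.ZipFile(path) as zf:
        xml = zf.read("word/document.xml").decode("utf-8")
    # Crude extraction: drop all tags. Paragraph breaks and XML entities are not handled.
    return re.sub(r"<[^>]+>", "", xml)

def extract_text(path: Path) -> Optional[str]:
    """Return the file's text, or None when no extractor is registered for its type."""
    extractor = EXTRACTORS.get(path.suffix.lower())
    return extractor(path) if extractor is not None else None
```

A real tool would plug in external converters (for PDF, CHM and so on) the same way: one more `@register(...)` function per format, and anything without an entry stays unsearchable.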
This is, of course, no surprise when you realize that there's no way another program can know how I stored text in .MyOwnFormat files.
I believe Total Commander actually treats all these files as plain text (perhaps with some codepage guessing, or by simply trying every codepage). For example, if you look closely at a .doc file as a plain-text file, you'll find its text embedded among the binary data, which is sufficient for searching. Oh, and some kind of archive-detection routine is almost certainly used, because MS Office 2007 and OpenOffice use ZIP to compress their files, and it's useless to search for text in a compressed file without unpacking it first.
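The guessed approach can be sketched roughly like this. It is only an approximation of the idea, not Total Commander's actual code: the list of encodings tried (UTF-8, UTF-16LE, cp1252) is an assumption, and the function names are hypothetical. If the data starts with the ZIP local-file-header signature `PK\x03\x04`, each archive member is decompressed and searched; otherwise the query is encoded in each candidate codepage and looked up directly in the raw bytes.

```python
import io
import zipfile

# Assumed candidate encodings; the real program may try different or more codepages.
ENCODINGS = ("utf-8", "utf-16-le", "cp1252")
ZIP_MAGIC = b"PK\x03\x04"  # signature at the start of a typical ZIP file

def search_bytes(data: bytes, query: str) -> bool:
    """Treat the data as 'plain text': look for the query in each candidate encoding."""
    for enc in ENCODINGS:
        try:
            needle = query.encode(enc)
        except UnicodeEncodeError:
            continue  # the query cannot be represented in this codepage
        if needle in data:
            return True
    return False

def search_file_bytes(data: bytes, query: str) -> bool:
    """Search raw file contents, unpacking them first if they look like a ZIP archive."""
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                # Recurse so nested archives are unpacked as well.
                return any(search_file_bytes(zf.read(name), query) for name in zf.namelist())
        except zipfile.BadZipFile:
            pass  # looked like a ZIP but isn't one; fall back to a raw scan
    return search_bytes(data, query)

def search_file(path: str, query: str) -> bool:
    with open(path, "rb") as f:
        return search_file_bytes(f.read(), query)
```

This search is case-sensitive, and inside ZIP-based formats a word can be split across XML tags (for example `<w:t>Hel</w:t><w:t>lo</w:t>`), so a naive byte search like this can miss matches that a format-aware extractor would find.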