I have 100 GB of documents. I would like to characterize them and get a general sense of which topics are prevalent.
The documents are plain text.
I have considered using a search tool like Google Desktop, but the collection is too large for me to guess what to search for, and running enough searches to cover the entire set would take too long.
Are there any freely available tools that will cluster a large dataset of documents?
Are there any that can also visualize the resulting clusters?