I have a question about parsing HTML pages, specifically forums. I want to parse a forum or thread and pick out the posts that match certain criteria. I haven't defined the algorithm yet, since I have only parsed structured text formats before. One use case would be to copy and paste each thread into the program by hand; another would be to enter a URL like http://www.forums.com/forum/showthread.php?t=46875&page=3 and let the program parse the pages.
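For the URL use case, this is roughly the first step I have in mind, as a rough sketch using only the Java standard library: pull the thread id and page number out of the URL so the program could walk from page to page. I am assuming the forum keeps using the `t` and `page` query parameters as in the URL above; `ThreadUrl`, `queryParams` and `pageUrl` are names I made up for illustration, and the actual downloading and HTML parsing are not covered here.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: class and method names are illustrative only.
final class ThreadUrl {

    // Splits the query string of a URL into key/value pairs.
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new HashMap<>();
        String query = URI.create(url).getQuery();
        if (query == null) {
            return params; // URL has no query string
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Builds the URL of another page of the same thread.
    // Assumption: the forum identifies threads by "t" and pages by "page".
    static String pageUrl(String url, int page) {
        URI uri = URI.create(url);
        String threadId = queryParams(url).get("t");
        return uri.getScheme() + "://" + uri.getAuthority() + uri.getPath()
                + "?t=" + threadId + "&page=" + page;
    }

    public static void main(String[] args) {
        String url = "http://www.forums.com/forum/showthread.php?t=46875&page=3";
        System.out.println(queryParams(url)); // the "t" and "page" values
        System.out.println(pageUrl(url, 4));  // URL of the next page
    }
}
```

The idea would be to start at page 1 and keep calling `pageUrl` with an increasing page number until the thread runs out of pages, but how to detect the last page is something I have not worked out.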
Given all this, I would like to know:
- Is it possible to parse a forum thread on an HTML page?
- What would be the best/fastest/easiest language for doing this?
- If I prefer Java, what tools/libraries do I need for this?
- Is there anything else I should consider?