Hi, I'm familiar with the Java programming language. I'd like to extract data from a website and store it in a database running on my machine. Is that possible in Java? If so, which API should I use? For example, there are a number of schools listed on a website; how can I extract that data and store it in my database using Java?
What you're referring to is commonly called 'screen scraping'. There are a variety of ways to do this in Java; personally, I prefer HtmlUnit. While it was designed as a way to test web functionality, you can use it to request a remote web page and parse it.
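A minimal sketch of that approach, assuming a hypothetical page at `http://example.com/schools` whose school names sit in `<li>` elements inside a `<ul id="schools">` (substitute the real URL and an XPath that matches the actual markup):

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class SchoolScraper {
    public static void main(String[] args) throws Exception {
        // WebClient is HtmlUnit's headless browser.
        try (WebClient client = new WebClient()) {
            client.getOptions().setJavaScriptEnabled(false); // plain HTML is enough here

            // Hypothetical URL -- replace with the page you actually want to scrape.
            HtmlPage page = client.getPage("http://example.com/schools");

            // XPath for the list items holding the school names (assumed markup).
            for (Object o : page.getByXPath("//ul[@id='schools']/li")) {
                HtmlElement item = (HtmlElement) o;
                System.out.println(item.getTextContent().trim());
            }
        }
    }
}
```

From there you can insert each extracted name into your local database with plain JDBC.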
I would recommend using a good error-tolerant HTML parser like TagSoup to extract exactly what you're looking for from the HTML.
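As an illustration of what that looks like, here is a hedged sketch using TagSoup's SAX parser to collect the text of every `<li>` from deliberately messy HTML (the sample markup and school names are made up for the example):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class TagSoupExample {
    // SAX handler that collects the text content of each <li> element.
    static class LiHandler extends DefaultHandler {
        private final StringBuilder current = new StringBuilder();
        private boolean inLi = false;
        final List<String> items = new ArrayList<>();

        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("li".equalsIgnoreCase(local)) { inLi = true; current.setLength(0); }
        }
        @Override public void characters(char[] ch, int start, int len) {
            if (inLi) current.append(ch, start, len);
        }
        @Override public void endElement(String uri, String local, String qName) {
            if ("li".equalsIgnoreCase(local)) { inLi = false; items.add(current.toString().trim()); }
        }
    }

    public static void main(String[] args) throws Exception {
        // Note the unclosed <li> tags -- TagSoup repairs them while parsing.
        String html = "<ul><li>Springfield High<li>Shelbyville Elementary</ul>";

        XMLReader reader = new Parser(); // TagSoup's SAX-compatible parser
        LiHandler handler = new LiHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(html)));

        for (String school : handler.items) System.out.println(school);
    }
}
```

The point of TagSoup is exactly this: it turns broken real-world HTML into a well-formed SAX event stream, so the standard `org.xml.sax` machinery works on it.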
Depending on what you are really trying to do, you can use many different solutions.
If you just want to fetch the HTML code of a web page, then URL.getContent() may be your solution. Here is a little tutorial:
http://www.javacoffeebreak.com/books/extracts/javanotesv3/c10/s4.html
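To make that concrete, here is a small JDK-only sketch: a helper that reads the content behind any URL, plus a quick regex pass over a hardcoded HTML sample standing in for what the site might return (the URL and markup are assumptions for illustration):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FetchExample {
    // Reads the raw content behind a URL into a String.
    static String fetch(URL url) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the fetched page; in practice you would call
        // fetch(new URL("http://example.com/schools")) instead.
        String html = "<ul><li>Springfield High</li><li>Shelbyville Elementary</li></ul>";

        // A quick-and-dirty regex works for trivial, well-formed pages; for
        // anything non-trivial, use a real HTML parser as suggested above.
        Matcher m = Pattern.compile("<li>(.*?)</li>").matcher(html);
        while (m.find()) System.out.println(m.group(1));
        // prints:
        // Springfield High
        // Shelbyville Elementary
    }
}
```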
EDIT: I didn't realize he was looking for a way to parse the HTML code. Some tools have been suggested above. Sorry about that.
You definitely need a good parser like NekoHTML.
Here's an example of using NekoHTML, albeit using Groovy (a Java-based scripting language) rather than Java itself:
http://www.keplarllp.com/blog/2010/01/better-competitive-intelligence-through-scraping-with-groovy
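For those who want to stay in plain Java rather than Groovy, here is a hedged sketch of the same idea: NekoHTML's `DOMParser` repairs malformed markup and hands back a standard W3C DOM (the sample HTML is invented for the example):

```java
import java.io.StringReader;

import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NekoExample {
    public static void main(String[] args) throws Exception {
        // Malformed on purpose -- NekoHTML balances the tags for us.
        String html = "<ul><li>Springfield High<li>Shelbyville Elementary";

        DOMParser parser = new DOMParser();
        parser.parse(new InputSource(new StringReader(html)));
        Document doc = parser.getDocument();

        // By default NekoHTML upper-cases element names in the DOM it builds.
        NodeList items = doc.getElementsByTagName("LI");
        for (int i = 0; i < items.getLength(); i++) {
            System.out.println(items.item(i).getTextContent().trim());
        }
    }
}
```

Once you have a DOM, the usual `org.w3c.dom` traversal and JDBC inserts get the data into your database.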
You can use VietSpider XML from
http://sourceforge.net/projects/binhgiang/files/
Download VietSpider3_16_XML_Windows.zip or VietSpider3_16_XML_Linux.zip
VietSpider Web Data Extractor: the software crawls data from websites (a data scraper), formats it to standard XML (text, CDATA), and then stores it in a relational database. The product supports various RDBMSs such as Oracle, MySQL, SQL Server, H2, HSQL, Apache Derby, and Postgres. The VietSpider crawler supports sessions (login, query by form input), multi-part downloading, JavaScript handling, and proxies (including multi-proxy by auto-scanning proxy lists from websites).