Hi,
I have some Java source code that a friend gave me; unfortunately he is out of the country and cannot help me fix my issue. The code scrapes data from websites and fills a database with it.
Some layout changes on the website recently stopped it from working properly, so I am trying to recompile it in Eclipse. Eclipse reports "HTMLParser cannot be resolved to a type", and there is no HTMLParser class in the source code my friend gave me.
I googled HTML Parser and it seems to be an open source library, but I don't know how to integrate it so that the code compiles. Can someone help me with how to use HTML Parser to get my source code to compile?
I tried downloading their binary (bin) files, but I don't know what to do with them.
Here is the part of the code that uses HTMLParser and gives me the error:
parser = new HTMLParser("file:///"+myProp.getPropertyPageLink());
Thanks to anyone who takes an interest in helping me out. I have included only one function that uses the HTMLParser class.
public Property parsePropertyPage(Property myProp) {
    myProp.setAgentId(this.agentId);
    int count = 0;
    String description = "No Description Available";
    String content = "No Content Available";
    predicatesFilter = new NodeFilter[2];
    predicatesFilter[0] = new NodeClassFilter(org.htmlparser.tags.Div.class);
    predicatesFilter[1] = new NodeClassFilter(org.htmlparser.tags.Span.class);
    filtersHolder = new OrFilter(predicatesFilter);
    linkTag = new LinkTag();
    div = new Div();
    sp = new Span();
    filter = new NodeClassFilter(org.htmlparser.tags.Div.class);
    try {
        System.out.println("file:///" + myProp.getPropertyPageLink());
        parser = new HTMLParser("file:///" + myProp.getPropertyPageLink());
        NodeList myList = parser.extractAllNodesThatMatch(filtersHolder);
        System.out.println("Relevant Tags : " + myList.size());
        for (int i = 0; i < myList.size(); i++) {
            //System.out.println(myList.elementAt(i));
            if (myList.elementAt(i).getClass().equals(div.getClass())) {
                String temp = ((Div) myList.elementAt(i)).getText();
                if (temp.indexOf("div id=\"agentCollapsed\"") == 0) {
                    System.out.println("Process Agent");
                    this.processAgent(myList.elementAt(i), myProp);
                } else if ("div id=\"majorResultsNav\"".equalsIgnoreCase(temp)) {
                    System.out.println("Process Major Results");
                    Node n = myList.elementAt(i);
                    n = n.getFirstChild().getNextSibling();
                    n = n.getFirstChild().getNextSibling();
                    n = n.getFirstChild().getNextSibling();
                    n = n.getFirstChild().getNextSibling();
                    String s = n.toPlainTextString();
                    if (s.indexOf("for Rent") > 1)
                        myProp.setIsRental(true);
                    if (s.indexOf("for Sale") > 1)
                        myProp.setIsSales(true);
                    if (s.indexOf("Sold") > 1)
                        myProp.setIsSold(true);
                    s = s.substring(s.indexOf("-") + 1);
                    myProp.setState(s.trim());
                } else if ("div class=\"header\""
                        .equalsIgnoreCase(myList.elementAt(i).getText())) {
                    processHeader(myList.elementAt(i), myProp);
                    System.out.println("Process Header");
                } else if (myList.elementAt(i).getText().startsWith(
                        "div class=\"textual")) {
                    processTextual(myList.elementAt(i), myProp);
                    System.out.println("Process Textual");
                } else if ("div id=\"propertyLocation\""
                        .equalsIgnoreCase(myList.elementAt(i).getText())) {
                    myProp.setPropertyLocation(myList.elementAt(i).toHtml());
                    System.out.println("Process Property Location");
                } else if (myList.elementAt(i).getText().startsWith(
                        "div class=\"minorImage")) {
                    count++;
                    myProp.setNumberOfMinorImages(count);
                    System.out.println("Process Minor Image");
                } else if (myList.elementAt(i).getText().startsWith(
                        "div id=\"inspectionTimes")) {
                    processInspection(myList.elementAt(i), myProp);
                    System.out.println("Process Inspection Times");
                }
            } else if (myList.elementAt(i).getClass().equals(sp.getClass())) {
                if ("span class=\"lg-dppl-bold\""
                        .equalsIgnoreCase(((Span) myList.elementAt(i)).getText())) {
                    myProp.setPrice(((Span) myList.elementAt(i)).getStringText());
                } else if ("span class=\"lg-mag-bold\""
                        .equalsIgnoreCase(((Span) myList.elementAt(i)).getText())) {
                    myProp.setIsSold(true);
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return myProp;
}
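
For anyone who wants to try the string handling from the majorResultsNav branch without the library on the classpath, here is a minimal standalone sketch. The class name ListingStatusSketch, the method names, and the sample heading text are all hypothetical and are not taken from the site or from my friend's code; only the indexOf and substring logic mirrors the function above.

```java
public class ListingStatusSketch {

    // Mirrors the checks in the majorResultsNav branch: the marker must
    // start at index 2 or later, because the original tests indexOf(...) > 1.
    static boolean contains(String heading, String marker) {
        return heading.indexOf(marker) > 1;
    }

    // Everything after the first "-" is treated as the state, trimmed.
    // With no "-" present, indexOf returns -1 and the whole string is kept.
    static String stateOf(String heading) {
        return heading.substring(heading.indexOf("-") + 1).trim();
    }

    public static void main(String[] args) {
        // Hypothetical sample text; the real page heading may look different.
        String heading = "Houses for Rent - NSW";
        System.out.println(contains(heading, "for Rent"));
        System.out.println(contains(heading, "for Sale"));
        System.out.println(stateOf(heading));
    }
}
```

This part does not depend on HTMLParser at all, so it can be compiled and run on its own to check whether the text matching still fits the new page layout.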