
Hello everyone,

I have a program that downloads the contents of a web page and displays it in a text box. The problem is that it shows the HTML source. For example, if my HTML code was:

<html>
<body>

<p>Hello</p>

</body>
</html>

instead of just showing "Hello", it shows the code above.

How can I get my Objective-C program to read just the "Hello" and not the HTML source? I assumed the problem was the encoding used when reading the website, but I may be wrong.

I would greatly appreciate a reasonable answer.

Best Regards,

Kevin

A:

As far as I know, there is nothing built into Cocoa to do this. You would have to implement your own HTML parser that reads the markup and outputs the text. I would either search online for existing implementations and adapt them for Cocoa, which would give you a lot of experience with the language, or use trial and error and learn some regular expressions. This library is written in Java, but it should be reasonably easy to port to Cocoa/C: http://htmlparser.sourceforge.net/
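A hand-written stripper can start very small. Below is a minimal sketch in plain C (which an Objective-C project compiles as-is), assuming you already have the page in a NUL-terminated UTF-8 buffer, for example from `-[NSString UTF8String]`. The function name `strip_tags_simple` is hypothetical; it only skips everything between `<` and `>` and collapses whitespace, so it is nowhere near a real HTML parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies the text that lies outside <...> tags from
 * html into out (at most out_size bytes, including the terminating NUL).
 * Runs of whitespace collapse to a single space; leading and trailing
 * whitespace is dropped. Returns the length of the result.
 * Naive on purpose: it does not understand comments, <script>/<style>
 * bodies, a '>' inside attribute values, a bare '<' in text, or entities
 * such as &amp;. */
size_t strip_tags_simple(const char *html, char *out, size_t out_size)
{
    size_t len = 0;
    int in_tag = 0;        /* currently between '<' and '>' */
    int pending_space = 0; /* saw whitespace after some visible text */

    assert(html != NULL && out != NULL);
    if (out_size == 0)
        return 0;

    for (const char *p = html; *p != '\0'; ++p) {
        char c = *p;
        if (in_tag) {
            if (c == '>')
                in_tag = 0;   /* tag ends; resume copying */
            continue;
        }
        if (c == '<') {
            in_tag = 1;       /* tag starts; skip until '>' */
            continue;
        }
        if (isspace((unsigned char)c)) {
            pending_space = (len > 0); /* never emit leading space */
            continue;
        }
        if (pending_space && len + 1 < out_size)
            out[len++] = ' ';          /* one space per whitespace run */
        pending_space = 0;
        if (len + 1 < out_size)
            out[len++] = c;            /* truncate silently when full */
    }
    out[len] = '\0';
    return len;
}
```

Calling it on the page from the question, `strip_tags_simple(page, buf, sizeof buf)` leaves `buf` holding `Hello`. Text from adjacent tags with no whitespace between them (`<p>a</p><p>b</p>`) runs together; emitting a space at every tag would fix that but would also split words such as `<b>He</b>llo`.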

Apparently you can "tidy up" the HTML into well-formed markup with http://tidy.sourceforge.net/ and then run it through an XML parser, such as http://expatobjc.sourceforge.net/, to extract the information you want. This only works once the HTML is valid XML (XHTML); ordinary HTML usually is not.

Shadow
A: 

If it were me, I would write a script on a web server, in PHP for example, that parses the text out of a web page. PHP has a number of built-in functions, such as strip_tags(), that remove HTML tags from a string.

That way, all the heavy lifting is done in the PHP script. Your iPhone app (assuming it's for the iPhone, per your tags) just POSTs the URL you want to parse to the PHP script, which returns the text to you.

Banjer
That was actually what I was thinking, but let's say I have an HTML page like the one above. How would I save the result in a PHP script?
Kevin
You shouldn't need to save anything. Your iPhone app could send an NSMutableURLRequest to your PHP script at, say, http://yoursite.com/gettext.php, passing along the URL of the web page you want to parse. The PHP script reads that page, parses out the text, and "echo"es the result. The echoed text is the response sent back to the NSMutableURLRequest in your app, which you then put in the text box. I can post some sample code if you need it.
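The request in this comment carries the target page's address in the POST body. Here is a sketch of building that body in C, assuming a hypothetical form field named `url` (the PHP side would then read `$_POST['url']`). The value must be percent-encoded, because characters such as `&`, `=`, `+`, `%` and spaces would otherwise be misread when the form data is decoded; this sketch conservatively encodes everything outside the RFC 3986 "unreserved" set. The helper names are made up for illustration. On the app side, the resulting bytes would go into the request with `setHTTPBody:`, with the method set to POST and a `Content-Type` of `application/x-www-form-urlencoded`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RFC 3986 unreserved characters may be sent as-is. */
static int is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '-' || c == '.' ||
           c == '_' || c == '~';
}

/* Percent-encodes src into dst (dst_size bytes including the NUL).
 * Returns the encoded length, or -1 if dst is too small. */
static long percent_encode(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = 0;

    for (const unsigned char *p = (const unsigned char *)src; *p; ++p) {
        size_t need = is_unreserved(*p) ? 1 : 3;
        if (len + need >= dst_size)
            return -1;                 /* no room for this byte plus NUL */
        if (need == 1) {
            dst[len++] = (char)*p;
        } else {                       /* e.g. ':' becomes "%3A" */
            dst[len++] = '%';
            dst[len++] = hex[*p >> 4];
            dst[len++] = hex[*p & 0x0F];
        }
    }
    if (len >= dst_size)
        return -1;                     /* only when dst_size == 0 */
    dst[len] = '\0';
    return (long)len;
}

/* Hypothetical helper: writes "name=<encoded page_url>" into out.
 * name is assumed to need no encoding. Returns the body length or -1. */
long build_form_body(const char *name, const char *page_url,
                     char *out, size_t out_size)
{
    size_t name_len;
    long enc;

    assert(name != NULL && page_url != NULL && out != NULL);
    name_len = strlen(name);
    if (name_len + 1 >= out_size)      /* need "name=" plus the NUL */
        return -1;
    memcpy(out, name, name_len);
    out[name_len] = '=';
    enc = percent_encode(page_url, out + name_len + 1,
                         out_size - name_len - 1);
    return enc < 0 ? -1 : (long)(name_len + 1) + enc;
}
```

For example, `build_form_body("url", "http://example.com/a b", body, sizeof body)` produces `url=http%3A%2F%2Fexample.com%2Fa%20b` and returns 36.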
Banjer
A: 

Just use a regular expression to strip the tags; a Google search will turn up plenty of examples.
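For the regular-expression route, here is a minimal sketch using the POSIX `<regex.h>` API (`regcomp`, `regexec`, `regfree`), which is part of the C library on Apple platforms and can be called from Objective-C. The pattern `<[^>]*>` and the function name `strip_tags_regex` are illustrative choices, not a standard solution. Regular expressions cannot parse HTML in general, so treat this as a quick hack for simple pages.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: removes every substring matching <[^>]*> from
 * html and writes the rest to out (out_size bytes including the NUL;
 * the result is truncated if it does not fit). Returns 0 on success,
 * -1 if the pattern fails to compile. Breaks on '>' inside attribute
 * values, comments, and <script>/<style> bodies. */
int strip_tags_regex(const char *html, char *out, size_t out_size)
{
    regex_t tag_re;
    regmatch_t m;
    const char *p = html;
    size_t len = 0;
    int eflags = 0;

    assert(html != NULL && out != NULL && out_size > 0);
    if (regcomp(&tag_re, "<[^>]*>", REG_EXTENDED) != 0)
        return -1;

    for (;;) {
        int found = (regexec(&tag_re, p, 1, &m, eflags) == 0);
        /* text before the next tag, or the rest of the input */
        size_t keep = found ? (size_t)m.rm_so : strlen(p);
        if (keep > out_size - 1 - len)
            keep = out_size - 1 - len;   /* clamp to remaining space */
        memcpy(out + len, p, keep);
        len += keep;
        if (!found)
            break;
        p += m.rm_eo;          /* skip the tag; a match is at least "<>" */
        eflags = REG_NOTBOL;   /* p is no longer the start of the string */
    }
    out[len] = '\0';
    regfree(&tag_re);
    return 0;
}
```

This leaves whitespace untouched, so the page from the question comes back as `Hello` surrounded by the newlines that sat between the tags.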

owen
A:

If you want to display a web page, use WebKit. If you want to strip XML tags, use NSXMLParser. Some HTML is valid XML, but much of it is not, so whether that works depends on the page. HTML is just text unless you use something designed to parse it.

drawnonward