ansaurus

Question

Getting Wikipedia Article Summary using NSScanner Problem

Answer 1

+1 A:

You could use WebKit's DOM API to walk the actual structure, rather than trying to parse the text blindly.

Joshua Nozzi 2010-09-22 19:08:20

That's not a good idea because the wiki pages are waaay too inconsistent.

David Schiefer 2010-09-22 19:15:14

First, they're consistent enough that there are a half-dozen apps out there that parse them and present them beautifully on the iPhone and iPad. Second, if using the document's DOM is a bad idea because the pages are inconsistent, then using NSScanner is at least as bad. At any rate, they look pretty consistent to me: the summary is the first p element in the "bodyContent" div. I've spot-checked several articles and they all follow that form. That's easy with the DOM.

Joshua Nozzi 2010-09-22 19:22:52
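
To illustrate the "first p element in the bodyContent div" rule from the comment above, here is a minimal sketch. It does not use WebKit's DOM API; as a stand-in it uses Python's standard-library `html.parser`, which is event-based rather than a real DOM. The class name `FirstParagraphParser`, the helper `first_paragraph`, and the `SAMPLE` markup are all made up for illustration, and real article markup may need extra handling (for example, skipping empty paragraphs).

```python
from html.parser import HTMLParser


class FirstParagraphParser(HTMLParser):
    """Collects the text of the first <p> inside <div id="bodyContent">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0         # > 0 while inside the bodyContent div
        self.in_paragraph = False  # True while inside the target <p>
        self.done = False          # True once the first <p> has closed
        self.chunks = []           # text fragments of the target <p>

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1          # nested div inside bodyContent
            elif dict(attrs).get("id") == "bodyContent":
                self.div_depth = 1           # entered the target div
        elif tag == "p" and self.div_depth > 0 and not self.in_paragraph:
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            self.done = True                 # stop after the first paragraph
        elif tag == "div" and self.div_depth > 0:
            self.div_depth -= 1

    def handle_data(self, data):
        # Inline tags such as <b> and <a> are ignored; only their text is kept.
        if self.in_paragraph and not self.done:
            self.chunks.append(data)


def first_paragraph(html):
    """Hypothetical helper: return the summary paragraph as plain text."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Invented sample markup, loosely shaped like an article page.
SAMPLE = """
<html><body>
<div id="header"><p>Navigation</p></div>
<div id="bodyContent">
  <div class="hatnote">See also other things.</div>
  <p><b>Cocoa</b> is an <a href="/wiki/API">API</a> for macOS.</p>
  <p>Second paragraph.</p>
</div>
</body></html>
"""

print(first_paragraph(SAMPLE))
```

The point is that the lookup follows the document's structure: the paragraph in the header div and the nested hatnote div are skipped without any string searching.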

David Schiefer: The DOM is a much more reliable way to examine these “inconsistent” pages. Consider that with the DOM, you can get the #toc element *wherever and however* it exists. You simply cannot do that with NSScanner.

Peter Hosey 2010-09-22 21:09:35
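
A rough sketch of the "wherever and however it exists" point: looking an element up by its id gives the same answer regardless of the tag it uses or how deeply it is nested. Again this uses Python's `html.parser` as a stand-in rather than WebKit's DOM, and `IdLocator`, `locate`, and both sample layouts are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class IdLocator(HTMLParser):
    """Records the chain of tags leading to the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.open_tags = []  # currently open elements, outermost first
        self.path = None     # tag path to the target once it is found

    def handle_starttag(self, tag, attrs):
        if self.path is None and dict(attrs).get("id") == self.target_id:
            self.path = self.open_tags + [tag]
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def locate(html, target_id):
    """Hypothetical helper: return the tag path to the element, or None."""
    parser = IdLocator(target_id)
    parser.feed(html)
    parser.close()
    return parser.path


# Two invented layouts that place the table of contents differently.
LAYOUT_A = ('<html><body><table id="toc"><tr><td>Contents</td></tr></table>'
            '<p>Intro</p></body></html>')
LAYOUT_B = ('<html><body><div id="content"><p>Intro</p><div class="wrap">'
            '<div id="toc">Contents</div></div></div></body></html>')

print(locate(LAYOUT_A, "toc"))
print(locate(LAYOUT_B, "toc"))
```

The same call finds the element in both layouts, which is the kind of lookup a text scanner keyed to fixed surrounding strings cannot do reliably.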
