I'm using this library: http://benreeves.co.uk/objective-c-hmtl-parser/ to parse HTML for a little iPhone app I'm making. The code works so far, but it fails when the page contains an accented character (so far I have only seen this with é). This is the code I'm using:
NSError *error = nil;
HTMLParser *parser = [[HTMLParser alloc] initWithContentsOfURL:[NSURL URLWithString:@"http://intranet.westminster.org.uk/almanack/food.asp?nextweek=TRUE"] error:&error];
if (error) {
    NSLog(@"Error: %@", error);
    return nil;
}
HTMLNode *bodyNode = [parser body]; // Find the body tag
NSArray *individualMeals = [bodyNode findChildTags:@"font"];
for (HTMLNode *node in individualMeals) {
    if ([[node getAttributeNamed:@"color"] isEqual:@"green"]) {
        NSLog(@"%@", [node rawContents]);
    }
}
But it doesn't parse all of the text. It seems to give up after it finds an accented character in the page at that URL. This is the output it produces when run:
2010-10-07 18:40:59.296 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.298 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.305 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.307 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.308 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Hash Brown <br/>Baked Beans <br/>Breakfast special <br/>Three cheese omelets <br/><br/><br/>Plain Porridge <br/><br/><br/><br/>Croissants <br/><br/> Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.309 Westminster[1011:207] <font color="green">Mulligatawny <br/>Black Olive <br/>RICE <br/>Roasted med veg in paella rice <br/>Hot and sticky wings on yellow rice <br/>Hoi Sin Pork Belly Steaks <br/>Vegetable Biriyani with a Mild Curry Sauce <br/>Babycorn Bamboo Shoots and Water Chestnuts <br/>Stir fried noodles with seaweed <br/>Lemon Sponge with Orange Sauce <br/>Vanilla Granola</font>
2010-10-07 18:40:59.310 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.312 Westminster[1011:207] <font color="green">Pea & Ham <br/><br/>Black Olive <br/>Roast Chicken with Bread Sauce and Roast Jus <br/>Warm Salad of Salmon and Crispy Bacon <br/><br/><br/>Vegetarian Chilli <br/>With Sour Cream and Braised Rice <br/>Green Beans <br/><br/>Bubble & Squeak <br/><br/>Tiramisu <br/>3 Cheeses & Biscuits</font>
2010-10-07 18:40:59.313 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Grilled Tomato <br/>Grilled mushrooms <br/>Fried Egg <br/><br/><br/><br/>Plain Porridge <br/><br/><br/><br/>Bread <br/><br/>Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.317 Westminster[1011:207] <font color="green">Root Vegetable <br/>Red Pesto <br/>WRAP <br/>Chimichanga <br/>Mexican fish tortillas <br/>Roast Leg of Lamb <br/>Gnocchi with Roasted Vegetables and Flaked Parmesan <br/>Broccoli <br/><br/><br/>Thyme Roasted Potatoes <br/> Sticky Toffee Pudding and Toffee Sauce <br/>Banana Bread</font>
2010-10-07 18:40:59.318 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.318 Westminster[1011:207] <font color="green">Tomato with Basil Oil <br/>Red Pesto <br/>Beef Olives <br/><br/>Lamb with Ginger, Spring onion and Noodles <br/><br/><br/>Field Mushroom Pies <br/>Ratatouille <br/><br/>Creamed Potatoes <br/><br/>Lemon Tart <br/>3 Cheeses & Biscuits</font>
2010-10-07 18:40:59.319 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Baked Beans <br/>Grilled Tomato <br/>Breakfast special <br/>Avocado on toast <br/><br/>Plain Porridge <br/><br/><br/>Bread and banana bread <br/><br/>Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.333 Westminster[1011:207] <font color="green">(GREEK) <br/><br/>FLAT BREADS <br/>SPINACH, ROCKET AND FETA AND TOASTED SOUR DOUGHS <br/>SEAFOOD STUFFED PEPPERS <br/>STIFADO (beef) <br/><br/>LAMB FRICASSEE <br/>zucchini pie from Macedonia <br/>RICE <br/><br/>GIGANTIS PLAKI <br/><br/>ORANGE AND LEMON CAKE TOPPED WITH GREEK YOGURT AND HONEY</font>
2010-10-07 18:40:59.333 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.334 Westminster[1011:207] <font color="green">Roasted Vegetable <br/>FLAT BREADS <br/>Pork Steak Served with a Tomato, Tarragon and Mushroom sauce <br/>Roast beef and homemade horseradish sauce <br/><br/><br/>Lancashire Cheese Sausages with Onion Gravy <br/>Courgettes <br/><br/>Roast Potatoes <br/><br/>Mississippi Mud Pie <br/>3 Cheeses & Biscuits</font>
2010-10-07 18:40:59.343 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Hash Brown <br/>Grilled mushrooms <br/>Fried Egg <br/><br/><br/><br/>Plain Porridge <br/><br/><br/><br/>Bread <br/><br/> Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.344 Westminster[1011:207] <font color="green">Leek, Blue Cheese and Potato <br/>Sunflower Seed <br/>COUS COUS <br/>Couscous with apricots, lemon and coriander <br/><br/>Couscous fried chicken with couscous and spiced tomato sauce <br/>Butchers Sausages <br/>Balsamic Roasted Vegetable Frittata <br/>Red Cabbage <br/><br/><br/>Mashed Potatoes <br/><br/>Jam Roly Poly <br/>Bakewell Slice</font>
2010-10-07 18:40:59.344 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.345 Westminster[1011:207] <font color="green">Curried Parsnip and Apple <br/>Sunflower Seed <br/>Spiced Sticky chicken pieces <br/>Mexican Beef Chilli Wraps with Natural Yogurt and Guacamole <br/><br/><br/>Roasted Teriyaki Tofu Steaks with Glazed Green Vegetables <br/>Spiced Aubergine <br/><br/>Rice and Peas <br/><br/>Mango Mousse <br/>3 Cheeses & Biscuits</font>
2010-10-07 18:40:59.351 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Baked Beans <br/>Grilled Tomato <br/>Breakfast special <br/>Muffin bar <br/><br/>Plain Porridge <br/><br/><br/><br/>Croissants <br/><br/>Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.352 Westminster[1011:207] <font color="green">Carrot and Chilli <br/>Rosemary <br/>NOODLES <br/><br/>Crispy tofu <br/>Lemon chicken <br/>Fish with Traditional Crispy Batter <br/>Japanese Vegetable Curry with Rice Noodles and Tofu <br/>Garden peas <br/><br/><br/>Chips <br/>Viennese Jam Tart and Custard <br/>Fresh Fruit Salad</font>
2010-10-07 18:40:59.361 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.361 Westminster[1011:207] <font color="green">Three onion, spring, red and white <br/>Rosemary <br/>Pepperoni Pizza Topped with Boccaccio <br/>Bolognaise pasta bake <br/><br/>Vegetarian Plait <br/>Green Cabbage <br/><br/>Oven Baked Cajun Wedges <br/><br/>Ice <br/>Cream Sundae <br/><br/>3 Cheeses & Biscuits</font>
2010-10-07 18:40:59.362 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Hash Brown <br/>Grilled Mushrooms <br/>Poached Eggs <br/><br/><br/><br/>Plain Porridge <br/><br/><br/><br/><br/><br/>Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.362 Westminster[1011:207] (null)
2010-10-07 18:40:59.363 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.363 Westminster[1011:207] <font color="green"/>
It gives up at the section with sautéed potatoes, and doesn't return any results from that or any of the later sections.
I think this might be because the website does not encode the é characters as HTML entities. When I view the source I see a literal é rather than & eacute; (written here with a space, because otherwise SO renders it), which is the form suggested by this website: http://www.w3.org/MarkUp/html3/latin1.html
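One way this guess could be tested is to download the raw bytes, decode them explicitly, and hand the resulting string to the parser. The following is only a sketch under assumptions: it assumes the page is served as ISO-8859-1 (Latin-1) rather than UTF-8, which I have not confirmed, and that the HTMLParser version in use provides initWithString:error:. The helper name ParserForURLWithLatin1Fallback is made up for illustration.

```objc
#import <Foundation/Foundation.h>
#import "HTMLParser.h" // the HTML parser library used above

// Hypothetical helper: fetch the page, decode it ourselves, then parse the string.
// Under manual reference counting, release or autorelease `html` and the
// returned parser as appropriate.
static HTMLParser *ParserForURLWithLatin1Fallback(NSURL *url, NSError **error) {
    NSData *data = [NSData dataWithContentsOfURL:url options:0 error:error];
    if (data == nil) {
        return nil; // download failed; *error describes why
    }

    // Try UTF-8 first. A lone 0xE9 byte (Latin-1 "é") followed by ASCII is
    // not valid UTF-8, so this returns nil for such a page.
    NSString *html = [[NSString alloc] initWithData:data
                                           encoding:NSUTF8StringEncoding];
    if (html == nil) {
        // Assumption: the page is really ISO-8859-1. Latin-1 decoding
        // accepts every byte value, so this fallback does not fail.
        html = [[NSString alloc] initWithData:data
                                     encoding:NSISOLatin1StringEncoding];
    }

    // Assumption: initWithString:error: exists in this HTMLParser version.
    return [[HTMLParser alloc] initWithString:html error:error];
}
```

If the UTF-8 attempt returns nil for this page and the Latin-1 fallback then yields all of the sections, that would support the idea that the problem is the character encoding rather than the missing entity.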
Thanks for your time. If you know a better way to obtain what's for lunch from that website, I would love to hear it too.