I am writing an application that gets the title of an HTML page, some text from under the body tag, and an image. It is something like Facebook's share feature. Can I get a regular expression that does that? Thanks for your assistance.

+1  A: 

A regexp like <title>(.*?)</title> will get you the content of the title. The .*? part matches any characters in a non-greedy way, so the match stops at the first closing title tag (in case there is another title end tag later in the page).
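As a minimal sketch of how that pattern could be applied, here it is with Python's standard `re` module. The thread does not name a language, so Python, the sample page, and the helper name `extract_title` are all assumptions made for illustration. The sketch adds two flags that the pattern above does not mention: `DOTALL`, so that `.` also matches line breaks inside the title, and `IGNORECASE`, so that `<TITLE>` is accepted too.

```python
import re

# Hypothetical sample page; in practice this would be the fetched HTML.
html = """<html><head><title>
  My Example Page</title></head>
<body><p>Some text.</p><img src="pic.png"></body></html>"""

# DOTALL lets "." match newlines; IGNORECASE accepts <TITLE> as well.
# [^>]* tolerates attributes on the opening tag.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def extract_title(page):
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(page)
    return match.group(1).strip() if match else None

print(extract_title(html))
```

Because the group is non-greedy, the match ends at the first `</title>` it finds, which is the behaviour described above.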

Scharron
Please, how do I go about this? I'm new to regular expressions.
Thanks, I got it.
+2  A: 

You should probably use an HTML parser instead of a regular expression. See Simple HTML DOM, for example.

A regular expression for your task will be very hard to maintain and will break easily whenever the pages in question change, not to mention that you cannot account for HTML comments.
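To illustrate the parser-based approach, here is a rough sketch. Simple HTML DOM is a PHP library; this example does not use it and instead relies on Python's standard-library `html.parser` module as a stand-in. The class name `SharePreviewParser`, the sample page, and the choice of "first paragraph" as the body text are assumptions for illustration only.

```python
from html.parser import HTMLParser

class SharePreviewParser(HTMLParser):
    """Collects the title, the first <p> text, and the first <img> src."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text = ""
        self.image = None
        self._in_title = False
        self._in_p = False
        self._p_done = False  # True once the first paragraph has closed

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p" and not self._p_done:
            self._in_p = True
        elif tag == "img" and self.image is None:
            self.image = dict(attrs).get("src")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            self._p_done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.text += data

# Hypothetical page with a commented-out title that should be ignored.
sample = """<html><head>
<!-- <title>Old title</title> -->
<title>Real title</title></head>
<body><p>First <b>paragraph</b> of text.</p>
<img src="a.png"><img src="b.png"></body></html>"""

parser = SharePreviewParser()
parser.feed(sample)
parser.close()
print(parser.title, "|", parser.text, "|", parser.image)
```

The parser hands comments to a separate callback, so the commented-out title never reaches `handle_starttag`. A plain `<title>(.*?)</title>` search on the same page would return the old title instead.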

Jens
+1, but I'm sure you can remove HTML comments with a regex.
Chris Diver
I have never used Simple HTML DOM before. How do I go about it? Am I supposed to install anything ...