I have a long string of HTML that contains <p>, <img>, <span> and a bunch of other tags.
Is there any way to extract ONLY the text within the tags from this string?
If you want to extract all the text inside any tags, the simplest way is to strip the tags with strip_tags().
If you want to remove only specific tags, this SO question may help.
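To make the strip_tags() suggestion concrete, here is a minimal sketch. The sample string is made up for illustration. The second argument is the function's optional list of tags to keep.

```php
<?php
// Hypothetical sample input; any HTML string works the same way.
$html = '<p>Hello <span>world</span><img src="a.png"></p>';

// Remove every tag and keep only the text between them.
$text = strip_tags($html);

// Optionally keep some tags: here <span> survives, everything else is removed.
$spansKept = strip_tags($html, '<span>');

echo $text, PHP_EOL;      // Hello world
echo $spansKept, PHP_EOL; // Hello <span>world</span>
```

Note that strip_tags() does not decode entities such as &amp;amp;, so you may want to run html_entity_decode() on the result if you need plain characters.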
I know I'll get a lot of bashing for this, but for a simple task like this I'd use regular expressions.
preg_match_all('~<span>(.*?)</span>~s', $html, $matches);
$matches[0]
will contain all the span tags and their contents, $matches[1]
contains only the contents.
The s modifier lets . match newlines. Note that this pattern only matches <span> tags without attributes.
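If the spans may carry attributes, the pattern can be loosened a little. This is only a sketch with a made-up sample string, and it still breaks on nested spans, which is one reason a parser is safer for anything non-trivial.

```php
<?php
// Hypothetical sample input with one attributed and one plain span.
$html = '<p>Intro <span class="a">first</span> and <span>second</span></p>';

// <span\b[^>]*> matches the opening tag with or without attributes.
// i = case-insensitive tag name, s = let . match newlines inside the span.
preg_match_all('~<span\b[^>]*>(.*?)</span>~is', $html, $matches);

// $matches[0] holds the full matches, $matches[1] only the contents.
print_r($matches[1]);
```

For the sample above, $matches[1] holds 'first' and 'second'.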
For more complicated stuff you might want to take a look at PHP Simple HTML DOM Parser or similar:
// Create a DOM from the HTML string (use file_get_html() for a URL or file)
$dom = str_get_html($html);
// Find all images and print their src attributes
foreach ($dom->find('img') as $element) {
    echo $element->src . '<br>';
}
Etc.
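As one "similar" option that needs no third-party library, PHP's bundled DOM extension can do the same kind of lookup. This is a rough sketch rather than a drop-in replacement, assuming the dom extension is enabled. The sample string and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample input.
$html = '<p>Hello <span>world</span><img src="a.png"></p>';

// Collect libxml warnings internally instead of printing them for sloppy HTML.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Text inside every <span>.
$spanTexts = [];
foreach ($dom->getElementsByTagName('span') as $span) {
    $spanTexts[] = $span->textContent;
}

// src attribute of every <img>.
$imageSources = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    $imageSources[] = $img->getAttribute('src');
}

print_r($spanTexts);
print_r($imageSources);
```

Unlike the regex, this copes with attributes and nested tags, because the markup is actually parsed.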
Regular expressions are the only solution I can think of.
You will find your answer here: http://www.catswhocode.com/blog/15-php-regular-expressions-for-web-developers