+1  A: 

Step one: Remove regular expressions from your toolbox when dealing with HTML. You need a parser.

Step two: Download simple_html_dom for PHP.

Step three: Parse

$html = str_get_html('<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>');
// find() indexes are zero-based, so 0 is the first span; the property is lowercase "innertext"
$spanText = $html->find('span', 0)->innertext;

Step four: Profit!

Edit

$html->find('span.placeholder', 0)->innertext will return what you want. It selects the first span with class=placeholder, regardless of its other attributes.
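
If installing simple_html_dom is not an option, roughly the same idea can be sketched with PHP's bundled DOM extension. This is an illustration under assumptions, not part of the original answer: `placeholder_texts()` is a hypothetical helper name, and the input is wrapped in a `<div>` only to give the parser a single root. Because the query looks only at the class, the unknown title and jQuery attributes do not matter, and unquoted `class=placeholder` is handled by the parser.

```php
<?php
// Sketch only: uses the bundled DOM extension instead of simple_html_dom.
// placeholder_texts() is a hypothetical helper name, not from the thread.
function placeholder_texts(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Any span whose class list contains "placeholder",
    // whatever other attributes it carries.
    $query = '//span[contains(concat(" ", normalize-space(@class), " "), " placeholder ")]';

    $texts = [];
    foreach ($xpath->query($query) as $span) {
        $texts[] = $span->textContent;
    }
    return $texts;
}

$texts = placeholder_texts(
    '<span style="" class="placeholder" title="">[[[SOMETEXT]]]</span>'
    . ' Some Other text &amp; <b>Html</b> '
    . '<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>'
);
print_r($texts); // two entries, one per placeholder span
```

The HTML parser lowercases tag names, which is why `//span` also finds the uppercase `<SPAN>`.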

Byron Whitlock
Byron - I don't know ahead of time the title or the jQuery###="#" piece - any way to use wildcards on those?
OneNerd
You said you want to strip the span, not keep the attributes?
LiraNuna
just want the piece [[[SOMETEXT]]] to remain, everything else can go.
OneNerd
I'm also guessing there will be other, non-placeholder spans in the source. So you'll need to select only the spans with the placeholder class and get their inner text.
pygorex1
yes, although sometimes the class is set like this: class=placeholder (no quotes), and sometimes with quotes.
OneNerd
+1  A: 

I think this should solve your problem:

function strip_placeholder_spans( $in_text ) {
    // Capture the text between the first ">" and the next "</"
    preg_match("/>(.*?)<\//", $in_text, $result);
    return $result[1];
}
marvin
hmm - not an expert, but wouldn't that strip out all tags?
OneNerd
oh yes sorry, I misunderstood the question. If you only want to strip the span, then you can use:
function strip_placeholder_spans( $in_text ) {
    preg_match("/<span(.*?)>(.*?)<\/span>/", $in_text, $result);
    return $result[2];
}
I'm not sure I understood it right this time either; I'm kind of confused about what you wanted.
marvin
+1  A: 

Use an HTML parser. This is the most robust solution. If you would rather stay with a regular expression, the following code will work for the two code examples you posted:

$s= <<<STR
<span style="" class="placeholder" title="">[[[SOMETEXT]]]</span>
Some Other text &amp; <b>Html</b>
<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>
STR;

preg_match_all('/\<span[^>]+?class="*placeholder"*[^>]+?>([^<]+)?<\/span>/isU', $s, $m);
var_dump($m);

Using regular expressions results in very narrowly focused code. This example only handles this specific, well-formed HTML. For instance, it won't match <span class="placeholder">some text < more text</span>, because the capture group stops at the first "<". If you have control over the source HTML, this may be good enough.
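
To make that limitation concrete, here is a small check that runs the same pattern against one well-formed span and against the stray "<" case. The variable names are made up for illustration.

```php
<?php
// Same pattern as in the answer; variable names are illustrative only.
$pattern = '/\<span[^>]+?class="*placeholder"*[^>]+?>([^<]+)?<\/span>/isU';

$wellFormed   = '<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>';
$strayBracket = '<span class="placeholder">some text < more text</span>';

$okCount  = preg_match($pattern, $wellFormed, $ok);
$badCount = preg_match($pattern, $strayBracket, $bad);

var_dump($okCount, $ok[1] ?? null); // one match, inner text in group 1
var_dump($badCount);                // no match: [^<]+ cannot cross the stray "<"
```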

pygorex1
I converted your preg_match_all to a preg_replace, and it appears to do what I need. Thanks -
OneNerd
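
The preg_replace conversion mentioned in the last comment is not shown in the thread. A minimal sketch of what it might look like, assuming the pattern is reused unchanged and each matched span is replaced by its first capture group (the function name is borrowed from the earlier answer, the body is a guess):

```php
<?php
// Guess at the preg_replace variant; not the commenter's actual code.
function strip_placeholder_spans(string $in_text): string
{
    $pattern = '/\<span[^>]+?class="*placeholder"*[^>]+?>([^<]+)?<\/span>/isU';
    // Replace each whole placeholder span with its captured inner text.
    // preg_replace returns null on error, so fall back to the input.
    return preg_replace($pattern, '$1', $in_text) ?? $in_text;
}

$s = '<span style="" class="placeholder" title="">[[[SOMETEXT]]]</span>' . "\n"
   . 'Some Other text &amp; <b>Html</b>' . "\n"
   . '<SPAN class=placeholder title="" jQuery1262031390171="46">[[[SOMETEXT]]]</SPAN>';

$stripped = strip_placeholder_spans($s);
echo $stripped, "\n";
```

Other markup, such as the `<b>` element, is left alone because the pattern only matches spans carrying the placeholder class.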