I have a string such as the following:

Are you looking for a quality real estate company? 

<s>Josh's real estate firm specializes in helping people find homes from          
[city][State].</s>

<s>Josh's real estate company is a boutique real estate firm serving clients 
locally.</s> 

In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly 
what you need

I would like to split this paragraph into an array on the <s> </s> tags, so that I get the following array as the result:

[0] Are you looking for a quality real estate company?
[1] Josh's real estate firm 
    specializes in helping people find homes from [city][State].
[2] Josh's real estate company is a boutique real estate firm serving clients 
    locally.
[3] In [city][state] I am sure you know how difficult it is
    to find a great home, but we work closely with you to give you exactly 
    what you need

This is the regex I'm currently using:

$matches = array();
preg_match_all(":<s>(.*?)</s>:is", $string, $matches);
$result = $matches[1];
print_r($result);

But this only returns an array containing the text found between the <s> </s> tags; it ignores the text found before and after these tags. (In the example above it would only return array elements 1 and 2.)

Any ideas?

+2  A: 

The closest I could get was using preg_split() instead:

$string = <<< STR
Are you looking for a quality real estate company? <s>Josh's real estate firm 
specializes in helping people find homes from [city][State].</s>
<s>Josh's real estate company is a boutique real estate firm serving clients 
locally.</s> In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly 
what you need
STR;

print_r(preg_split(':</?s>:is', $string));

And got this output:

Array
(
    [0] => Are you looking for a quality real estate company? 
    [1] => Josh's real estate firm 
specializes in helping people find homes from [city][State].
    [2] => 

    [3] => Josh's real estate company is a boutique real estate firm serving clients 
locally.
    [4] =>  In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly 
what you need
)

However, that produces an extra array element (index 2), because there's a newline between the fragments [city][State].</s> and <s>Josh's real estate company.

It'd be trivial to add some code to remove the whitespace-only matches, but I'm not sure if you want that.
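
One way to avoid both the empty element and splitting on an unopened </s> might be to split on whole <s>...</s> pairs instead of on individual tags, and to keep the captured text with PREG_SPLIT_DELIM_CAPTURE. This is only a sketch: the function name splitOnSTags is made up for illustration, and it assumes the tags are never nested.

```php
<?php
// Hypothetical helper (name is illustrative, not from the question).
// Assumes <s> tags are not nested inside each other.
function splitOnSTags($text)
{
    // Use a complete <s>...</s> pair as the delimiter. The capture group
    // plus PREG_SPLIT_DELIM_CAPTURE keeps the tagged text in the result,
    // so a lone </s> without an opening <s> never causes a split.
    $parts = preg_split(':<s>(.*?)</s>:is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return array(); // preg_split() failed
    }

    // Trim surrounding whitespace, then drop pieces that are now empty
    // (for example the newline between two adjacent tag pairs).
    $parts = array_map('trim', $parts);
    $parts = array_filter($parts, function ($piece) {
        return $piece !== '';
    });

    // Re-index so the keys run 0, 1, 2, ...
    return array_values($parts);
}
```

Calling print_r(splitOnSTags($string)) on the string above should give the four fragments without the empty element. With this approach, `my name is bob. im 17 </s>.` stays as one element, while `my name is bob. <s>im 17</s>` is split into two.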

BoltClock
The extra array element is fine, but it seems to be looking for just `</s>`, which means that something like `my name is bob. im 17 </s>.` and `my name is bob. <s>im 17</s>` would both be split into 2 elements. Can it be changed so that the first example is kept in a single array element? (I'd like an unopened `</s>` not to be matched.)
Click Upvote
Also, if the empty array elements could be removed, I'd prefer that.
Click Upvote
I'll fiddle with my code for a bit, then update my answer if I'm able to match only properly opened-and-closed tags, and to remove empty elements.
BoltClock
+1  A: 

I suggest you look into the DOM extension: http://php.net/manual/en/book.dom.php
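
A rough sketch of what that could look like, assuming the text can be treated as an HTML fragment (where <s> happens to be a real element). The function name splitWithDom and the wrapping <div> are illustrative choices, not something from the question. How a stray </s> is handled depends on the parser's error recovery, so this may not match the regex approach in that case.

```php
<?php
// Hypothetical helper (name is illustrative). Requires PHP's DOM extension.
function splitWithDom($text)
{
    // Collect parser warnings internally instead of printing them,
    // since the input is loose markup rather than a full HTML document.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the input in a <div> so all fragments share one parent node.
    $doc->loadHTML('<div>' . $text . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fragments = array();
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // The wrapper's children are text nodes (untagged text) and
    // <s> elements, in document order.
    foreach ($wrapper->childNodes as $node) {
        $piece = trim($node->textContent);
        if ($piece !== '') {
            $fragments[] = $piece; // skip whitespace-only nodes
        }
    }

    return $fragments;
}
```

Note that loadHTML() does not assume UTF-8 unless the markup declares a charset, so input containing non-ASCII characters would likely need an encoding hint.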

Codler