Hello,

Is it possible to split the contents of a file into parts delimited by a specific pattern?

This is what I want to achieve:

  • Read the file using file_get_contents
  • Read only the contents between matching comment markers.

I am not sure how complicated this is, but basically I am parsing a large HTML file and want to display only specific widgets in the browser. The widgets are delimited by comment boundaries, like this:

Sample:

<html>
<head>
   <title>test</title>
</head>
<body>
 this content should not be parsed.. ignored
 <!-- widget -->
 this is the widget. i want to parse this content only from the file
 <!-- widget -->
</body>
</html>

Would it be possible, using PHP and regex or anything else, to parse only the contents between the boundaries?

I apologize if this is unclear; I tried to explain what I want to achieve as best I can. I hope someone can help.

+5  A: 

It's certainly possible, but it doesn't really need to be done with regex. I'd probably just do something like this:

$file = file_get_contents('http://example.com/');
$widgets = explode('<!-- widget -->', $file);

Now the odd-indexed elements of $widgets ([1], [3], [5], etc.) contain what was between those boundaries.
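As a rough sketch of how the odd-indexed pieces could be collected after the explode() call: the helper name extract_widgets() and the sample HTML string are made up for illustration and are not part of the original answer. The sketch assumes markers always come in pairs and skips the final piece, which is always text after the last marker.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text
// found between each pair of marker comments.
function extract_widgets(string $html, string $marker = '<!-- widget -->'): array
{
    $parts = explode($marker, $html);
    $widgets = [];
    // Odd indexes sit between an opening and a closing marker.
    // The last piece is always text after the final marker, so stop before it.
    for ($i = 1; $i < count($parts) - 1; $i += 2) {
        $widgets[] = trim($parts[$i]);
    }
    return $widgets;
}

// Sample input invented for this example.
$html = "<body>ignored <!-- widget --> first <!-- widget --> ignored "
      . "<!-- widget --> second <!-- widget --></body>";

print_r(extract_widgets($html));
```

Stopping one short of the end means a stray, unpaired marker at the end of the file does not produce a bogus widget.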

Chad Birch
I didn't think it would be that easy! Many thanks.
Ahmad Fouad
+1  A: 

You can achieve what you want with a regular expression (or, if you are only ever splitting on <!-- widget -->, you can probably just use that as a plain delimiter). Check the documentation. The other answer using explode() will probably also work.

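$text = file_get_contents('/path/to/your/file');
// Note: split() was deprecated in PHP 5.3 and removed in PHP 7.0;
// preg_split() is the regex-based replacement on current versions.
$array = preg_split('/<!-- widget -->/', $text);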

The first element will be everything before the first occurrence of <!-- widget -->, and the last element will be everything after the last <!-- widget -->. Every odd-indexed element ([1], [3], ...) will be what you're looking for.
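A regex-based split mainly pays off when the markers are not byte-for-byte identical. The sketch below uses preg_split(), the regex splitter available on current PHP versions (split() itself was removed in PHP 7.0). The sample text and the whitespace/case variations in its markers are assumptions made up for the example.

```php
<?php
// Sample input invented for illustration; markers vary in spacing and case.
$text = "skip <!--widget--> one <!-- widget --> skip "
      . "<!--  WIDGET  --> two <!-- widget --> skip";

// \s* tolerates any whitespace inside the comment; the i flag ignores case.
$parts = preg_split('/<!--\s*widget\s*-->/i', $text);

// Keep the odd-indexed pieces, excluding the trailing piece after the last marker.
$widgets = [];
for ($i = 1; $i < count($parts) - 1; $i += 2) {
    $widgets[] = trim($parts[$i]);
}

print_r($widgets);
```

With a fixed marker string, explode() gives the same pieces without involving the regex engine.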

PHP split() function documentation

Brett Bender
Very nice. Quick question: is there any practical difference between explode and split?
Ahmad Fouad
Not really, other than that split() supports regular expressions while explode() does not. Honestly, if you're not going to use a regular expression, you should probably use explode(), as it is likely faster (split() pulls in PHP's regular-expression machinery, which explode() does not need).
Brett Bender
OK, thanks for the explanation. Yes, I will try to avoid regex; since it's doable with explode(), there's no need to make things complicated.
Ahmad Fouad
+1  A: 
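// Lazy quantifier (+?) so that each pair of markers is matched separately
// rather than everything from the first marker to the last.
$pattern = "/<!-- widget -->([\s\S]+?)<!-- widget -->/";
// $string holds the file contents; $match_array[1] receives the captured widget bodies.
$match_count = preg_match_all($pattern, $string, $match_array);

var_dump($match_array);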
Lance Kidwell
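One thing to watch with a pattern of this shape is quantifier greediness when the file contains more than one widget. The following is a small comparison sketch; the sample string and the variable names are invented for illustration.

```php
<?php
// Sample input invented for this example: two widgets in one string.
$string = "a <!-- widget --> one <!-- widget --> b "
        . "<!-- widget --> two <!-- widget --> c";

$greedy = '/<!-- widget -->([\s\S]+)<!-- widget -->/';
$lazy   = '/<!-- widget -->([\s\S]+?)<!-- widget -->/';

preg_match_all($greedy, $string, $greedy_matches);
preg_match_all($lazy, $string, $lazy_matches);

// Greedy: a single capture running from the first marker to the last one.
var_dump(count($greedy_matches[1]));

// Lazy: one capture per pair of markers.
var_dump(array_map('trim', $lazy_matches[1]));
```

The greedy form swallows the text between widgets as well, so the lazy form is usually the one wanted here.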