I am attempting to extract all instances of a particular format from a string:

I am wondering if my new Sony [PT# 123456ABC; Sony] has this feature but my friend says the new Toshiba [PT# AD-3232hjk; Toshiba] has this feature.

I would like to extract:

[PT# 123456ABC; Sony]

[PT# AD-3232hjk; Toshiba]

As you can see here, the only items in the consistent positions are:

  • [PT#
  • ;
  • ]

I tried various forms of strpos(), but because the part numbers and manufacturer names vary in length and format, I could not reliably pull those instances out of a much larger string. I have also been trying regular expressions, but my knowledge of them is fairly limited. Once I have these expressions extracted into variables, I will need to separate the part number and manufacturer name within each one. That may also be easier to accomplish with regular expressions.

Any help is appreciated. Thanks

A: 

I think this would do it

preg_match_all( "/(\[PT#\s+.*?;\s+.*?\])/", $input, $matches );

print_r( $matches );

Alternatively, if you just want to capture the unique information:

preg_match_all( "/\[PT#\s+(.*?);\s+(.*?)\]/", $input, $matches );
Peter Bailey
That works great, thanks a lot! Just to clarify that this is how preg_match_all operates: the output seems to duplicate data. Array( [0] => Array ( [0] => [PT# 123456ABC; Sony] [1] => [PT# AD-3232hjk; Toshiba] ) [1] => Array ( [0] => [PT# 123456ABC; Sony] [1] => [PT# AD-3232hjk; Toshiba] ))
Index 0 holds the full matches, I believe, and index 1 holds only what the first pair of parentheses matched, IIRC. Since the first pattern wraps the whole expression in parentheses, the two come out identical.
alex
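To make the duplication concrete, here is a minimal sketch that runs both of the patterns from the answer above against the sample sentence from the question. The variable names $whole and $parts are only illustrative and do not appear in the original answer.

```php
<?php
$input = 'I am wondering if my new Sony [PT# 123456ABC; Sony] has this feature '
       . 'but my friend says the new Toshiba [PT# AD-3232hjk; Toshiba] has this feature.';

// Whole pattern wrapped in one group: group 1 captures the same text as the full match.
preg_match_all('/(\[PT#\s+.*?;\s+.*?\])/', $input, $whole);

// Groups only around the variable parts: index 1 and 2 hold the pieces.
preg_match_all('/\[PT#\s+(.*?);\s+(.*?)\]/', $input, $parts);

var_dump($whole[0] === $whole[1]); // the "duplicate" arrays are the same
print_r($parts[1]);                // part numbers
print_r($parts[2]);                // manufacturer names
```

With the second pattern, index 0 still holds the bracketed expressions, so one call covers both the extraction and the splitting the question asks about.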
A: 

I take it you'll be reading from a text file containing a lot of those entries. What you can do is:

preg_match_all("/\[PT#\s*(.*?);\s*(.*?)\]/i", $text, $result);

It will put all matches into the array $result, and you can access them like so:

echo $result[1][0]; // echoes the first occurrence's serial

$result is grouped by capture group first (PREG_PATTERN_ORDER, the default): $result[0] holds the complete match strings, $result[1] the first captured group, and $result[2] the second.

echo $result[0][0]; // would print [PT# 123456ABC; Sony]
echo $result[1][0]; // would print 123456ABC
echo $result[2][0]; // would print Sony

Hope that helps
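As a rough illustration of the two layouts preg_match_all can return, the sketch below reads the same value from both. The shortened sample text, the variable names, and the \s* used to skip whitespace inside the brackets are assumptions made for this example.

```php
<?php
$text = 'Sony [PT# 123456ABC; Sony] and Toshiba [PT# AD-3232hjk; Toshiba]';
$pattern = '/\[PT#\s*(.*?);\s*(.*?)\]/i';

// Default layout (PREG_PATTERN_ORDER): $byGroup[group][match]
preg_match_all($pattern, $text, $byGroup);
echo $byGroup[1][1], "\n"; // part number of the second match

// PREG_SET_ORDER layout: $byMatch[match][group]
preg_match_all($pattern, $text, $byMatch, PREG_SET_ORDER);
echo $byMatch[1][1], "\n"; // same value, indexes swapped
```

The default layout is convenient when you want one list of all part numbers; PREG_SET_ORDER tends to read better when you loop over each match and use both pieces together.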

EDIT: fixed the regex, should work now (still untested)

M_D_K
Unfortunately this outputs a blank array when attempting to print_r $result.
That's what I get for deviating from my normal regex practices. The code should work now.
M_D_K
A: 
$matches = array();
preg_match_all( "/\[PT#([^\];]+);([^\]]+)\]/", $input, $matches,  PREG_SET_ORDER);

foreach ($matches as $match) {
  echo "id=", trim($match[1]), " brand=", trim($match[2]), "\n";
}
Ayman Hourieh
This also works, thanks
\[PT#\s([^;]++);\s([^\]]++)\] : This should be slightly faster, since the possessive quantifiers (++) don't keep backtracking positions, which aren't needed here. It also takes the white space into account, so you can drop the trim() calls.
The Pixel Developer
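A minimal sketch of how the possessive pattern from the comment above might be dropped into the loop from the answer, assuming the sample sentence from the question; the $pairs array is only added here to collect the results.

```php
<?php
$input = 'I am wondering if my new Sony [PT# 123456ABC; Sony] has this feature '
       . 'but my friend says the new Toshiba [PT# AD-3232hjk; Toshiba] has this feature.';

// \s consumes the single space after "PT#" and after ";", so the groups need no trim().
preg_match_all('/\[PT#\s([^;]++);\s([^\]]++)\]/', $input, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    $pairs[] = [$match[1], $match[2]];
    echo "id=", $match[1], " brand=", $match[2], "\n";
}
```

Note that \s matches exactly one whitespace character, so this version expects precisely one space in each position; if the spacing in the real data varies, \s* or keeping trim() is the safer choice.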