I'm trying to find a way to drop the attributes in this data that have no value. Thanks for helping.

My current regex, thanks to Tomalak, looks like this:

Regex find

([^=|]+)=([^|]+)(?:\||$)

Regex replace

<dt>$1</dt><dd>$2</dd>

Data looks like this

Bristle Material=|Wire Material=Steel|Dia.=4 in|Grit=|Bristle Diam=|Wire Size=0.0095 in|Arbor Diam=|Arbor Thread - TPI or Pitch=1/2 - 3/8 in|No. of Knots=|Face Width=1/2 in|Face Plate Thickness=7/16 in|Trim Length=7/8 in|Stem Diam=|Speed=6000 rpm [Max]|No. of Rows=|Color=|Hub Material=|Structure=|Tool Shape=|Applications=Cleaning rust, scale and dirt, Light Deburring, Edge Blending, Roughening for adhesion, Finish preparation prior to plating or painting|Applicable Materials=|Type=|Used With=Straight Grinders, Bench/Pedestal Grinders, Right Angle Grinders|Packing Type=|Quantity=1 per pack|Wt.=

End result should look like this

 <dt>Wire Material</dt><dd>Steel</dd><dt>Dia.</dt><dd>4 in</dd><dt>Wire Size</dt><dd>0.0095 in</dd>

Not this

 Bristle Material=|<dt>Wire Material</dt><dd>Steel</dd><dt>Dia.</dt><dd>4 in</dd>Grit=|Bristle Diam=|<dt>Wire Size</dt><dd>0.0095 in
A: 
([^=|]*)=([^|]*)(?:\||$)

To skip the ones without a value, try this:

(?:[^=|]*=|([^=|]*)=([^|]+))(?:\||$)
unfortunately that didn't do it.
jeff
are you sure? I just tested it and it worked fine
ah, wait, you want to skip the ones that are missing a value?
correct, that's right
jeff
You don't want to skip ones without a value, you need to replace them with nothing so they no longer are contained in the string.
Doug Neiner
then the second regex should take care of that
My regex editor is saying that this is not valid: (?:[^=|]*=|([^=|]*)=([^|]+))(?:\||$)
jeff
it is, I tested it out
Not sure what the deal is then.
jeff
It is valid, but it is also broken. :) It outputs empty `<dt>` and `<dd>` elements instead of not outputting anything for a key without a value.
Doug Neiner
you are right, it would generate empty tags, my bad
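
The behaviour discussed in these comments can be reproduced with a short test. This is only a sketch, run on a three-attribute sample cut from the question's data: with a plain preg_replace the alternation pattern leaves empty tags behind, because valueless keys match the first alternative and $1 and $2 are empty. Swapping in preg_replace_callback (not something the answer proposed, just one possible workaround) lets the same pattern return an empty string for those matches.

```php
<?php
// Short sample cut from the question's data.
$input = 'Grit=|Wire Size=0.0095 in|Arbor Diam=';

// The alternation pattern from the answer above.
$pattern = '/(?:[^=|]*=|([^=|]*)=([^|]+))(?:\||$)/';

// Plain replacement: keys without a value match the first alternative,
// so $1 and $2 are empty and empty <dt>/<dd> tags are emitted.
$plain = preg_replace($pattern, '<dt>$1</dt><dd>$2</dd>', $input);

// Callback variant (an assumption, not part of the original answer):
// $m[2] is only set when the second alternative matched, i.e. when
// the key actually has a value.
$filtered = preg_replace_callback($pattern, function (array $m): string {
    return isset($m[2]) ? "<dt>{$m[1]}</dt><dd>{$m[2]}</dd>" : '';
}, $input);

echo $plain, "\n", $filtered, "\n";
```

The callback version makes a single pass over the string, at the cost of being tied to PHP rather than a generic find-and-replace dialog.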
+2  A: 

Here is how you can do it in PHP without using regular expressions:

$parts_list = explode('|', "Bristle Material=|Wire M....");
$parts      = "";

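foreach ($parts_list as $part) {
    // A limit of 2 keeps any "=" inside the value intact.
    $p = explode('=', $part, 2);
    // Skip keys with no value; comparing against '' (instead of using
    // empty()) keeps a legitimate value of "0".
    if (isset($p[1]) && $p[1] !== '') {
        $parts .= "<dt>$p[0]</dt>\n<dd>$p[1]</dd>\n";
    }
}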

echo $parts;

And here is how you can do it with regular expressions:

$parts = preg_replace( 
    array('/([^=|]*)=(?:\||$)/','/([^=|]*)=([^|]+)(?:\||$)/'),
    array('', '<dt>$1</dt><dd>$2</dd>'),
    $inputString 
);

echo $parts;

Update

This uses a feature of PHP's preg_replace: it accepts an array of patterns and an array of replacement strings, and applies each pattern in order to the result of the previous one. The array() form of the call basically equates to this:

If I can match this: /([^=|]*)=(?:\||$)/ then replace it with an empty string.
If I can match this: /([^=|]*)=([^|]+)(?:\||$)/ then replace it with <dt>$1</dt><dd>$2</dd>

To test it in a Regex editor, you would run the first expression first (/([^=|]*)=(?:\||$)/) then run the second expression on the result of the first expression.
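
The two-pass description above can be sketched as separate calls. This is a minimal illustration on a shortened copy of the question's data; the variable names $withoutEmpty, $stepwise and $combined are only for this example. It runs each expression by hand, then compares the result with the array form.

```php
<?php
// Shortened copy of the question's data.
$inputString = 'Bristle Material=|Wire Material=Steel|Dia.=4 in|Grit=|Wt.=';

// Step 1: remove every key that has no value.
$withoutEmpty = preg_replace('/([^=|]*)=(?:\||$)/', '', $inputString);

// Step 2: wrap the remaining key=value pairs in <dt>/<dd>.
$stepwise = preg_replace(
    '/([^=|]*)=([^|]+)(?:\||$)/',
    '<dt>$1</dt><dd>$2</dd>',
    $withoutEmpty
);

// The array form applies the same two patterns in the same order.
$combined = preg_replace(
    array('/([^=|]*)=(?:\||$)/', '/([^=|]*)=([^|]+)(?:\||$)/'),
    array('', '<dt>$1</dt><dd>$2</dd>'),
    $inputString
);

echo $withoutEmpty, "\n", $stepwise, "\n";
var_dump($stepwise === $combined);
```

In a regex editor, the intermediate string ($withoutEmpty here) is what you would feed into the second find-and-replace.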

Doug Neiner
Tried pasting this into my regex editor. For some reason it's not working. /([^=|]*)=(?:\||$)/','/([^=|]*)=([^|]+)(?:\||$)/
jeff
Jeff, since you are using PHP (as you mentioned in the other post), just use it in PHP. A simple regex replace will not do what you want, because you need to *remove* an element without a value, but *replace* elements that do have values. The code I have given you I tested in PHP with your input string and it works. There are two separate regexes in there (using the PHP `array` to use them both): First: `/([^=|]*)=(?:\||$)/` Second: `/([^=|]*)=([^|]+)(?:\||$)/`
Doug Neiner
Jeff, I updated my answer so you could better see what was going on.
Doug Neiner
Thank you for helping. I am using PHP for the application, but made the assumption that I could just do a simple replacement in a text editor. The source data contains thousands of records in one column of a flat file.
jeff
Jeff, here is a trick that might help you. Set up a PHP file with a line that has `$inputString = "...pastelargefilehere...";` and my second solution above. Then from the command line (on a Mac or Linux box this will work, not sure how it works on a PC) run `php -f /path/to/file.php > /path/to/processes.txt` and it will parse the text for you and create a text file with the output.
Doug Neiner
Sounds like it may work, but what about the other data in the file? It's a CSV file. I am only trying to change data in one column. Won't it cause issues with the rest of the data?
jeff
Then just copy the one line out for processing, run it through the php file, and paste the results back in place of the original line.
Doug Neiner
I'll give that a whirl in the morning. Thanks again for your help
jeff
Thanks again for all your help. I took a different approach. I used your PHP code to change the value on the front end of the site as the product is being displayed instead of transforming the data in the import file. I don't know why I didn't think of this before.
jeff
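
For the CSV concern raised in the comments above, one option is to let PHP's fgetcsv and fputcsv handle the row parsing and rewrite only the attribute column. The following is a rough sketch, not something tested against the real import file: the helper names attributesToHtml and convertCsvColumn, the column index, and the sample row (SKU and price) are all made up for illustration, and the demo runs on in-memory streams rather than actual files.

```php
<?php
// Hypothetical helper: turns one "key=value|key=value" cell into
// <dt>/<dd> markup, dropping keys that have no value.
function attributesToHtml(string $cell): string
{
    $html = '';
    foreach (explode('|', $cell) as $part) {
        $p = explode('=', $part, 2);
        if (isset($p[1]) && $p[1] !== '') {
            $html .= "<dt>{$p[0]}</dt><dd>{$p[1]}</dd>";
        }
    }
    return $html;
}

// Hypothetical helper: rewrites only $column (zero-based) of every CSV
// row read from $in and writes the row to $out; other columns pass through.
function convertCsvColumn($in, $out, int $column): void
{
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if (isset($row[$column])) {
            $row[$column] = attributesToHtml($row[$column]);
        }
        fputcsv($out, $row, ',', '"', '\\');
    }
}

// Demo on in-memory streams; the row contents are invented sample data.
$in = fopen('php://memory', 'w+');
fwrite($in, "SKU-1,\"Grit=|Dia.=4 in|Wt.=\",9.99\n");
rewind($in);

$out = fopen('php://memory', 'w+');
convertCsvColumn($in, $out, 1);
rewind($out);

$result = stream_get_contents($out);
echo $result;
```

For real files, the two php://memory handles would be replaced by fopen calls on the source and destination paths. Letting fgetcsv and fputcsv deal with quoting avoids breaking columns that contain commas.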
A: 

Looks like you want preg_match_all here rather than preg_replace:

 preg_match_all('~([^|]+)=([^|\s][^|]*)~', $str, $matches, PREG_SET_ORDER);
 foreach($matches as $match)
      echo "<dt>{$match[1]}</dt><dd>{$match[2]}</dd>\n";
stereofrog