Hi all!

I'm makin' a scripting language interpreter using PHP. I have this code in that scripting language:

write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly

(Yes, it's hard to believe but that's the syntax)

Which regex should I use to split this by spaces, but only where the space is not inside curly brackets? I want to turn the above code into this array:

  1. write
  2. Hello, World!
  3. in
  4. either
  5. the
  6. color
  7. blue
  8. or
  9. red
  10. or
  11. #00AA00
  12. and
  13. in
  14. either
  15. the
  16. font
  17. Arial Black
  18. or
  19. Monaco
  20. where
  21. both
  22. the
  23. color
  24. and
  25. the
  26. font
  27. are
  28. determined
  29. randomly

(The strings inside the curly brackets are shown above in bold.) The strings inside the curly brackets must be one element each, so {Hello, World!} cannot become: 1. Hello, 2. World!

How can I do this?

Thanks in advance.

A: 

Does the order matter? If not, you could extract all the {...} groups, remove them, then operate on the leftover string.

meder
Yes it does, since {Hello, World!} could also be {This is an English word which is used very often: in} and I can't do a continue; when it sees that there's the word 'in'.
Time Machine
+2  A: 

Hi,

what about using something like this:

$str = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';

$matches = array();
preg_match_all('#\{.*?\}|[^ ]+#', $str, $matches);

var_dump($matches[0]);

Which will get you:

array
  0 => string 'write' (length=5)
  1 => string '{Hello, World!}' (length=15)
  2 => string 'in' (length=2)
  3 => string 'either' (length=6)
  4 => string 'the' (length=3)
  5 => string 'color' (length=5)
  6 => string '{blue}' (length=6)
  7 => string 'or' (length=2)
  8 => string '{red}' (length=5)
  9 => string 'or' (length=2)
  10 => string '{#00AA00}' (length=9)
  11 => string 'and' (length=3)
  12 => string 'in' (length=2)
  13 => string 'either' (length=6)
  14 => string 'the' (length=3)
  15 => string 'font' (length=4)
  16 => string '{Arial Black}' (length=13)
  17 => string 'or' (length=2)
  18 => string '{Monaco}' (length=8)
  19 => string 'where' (length=5)
  20 => string 'both' (length=4)
  21 => string 'the' (length=3)
  22 => string 'color' (length=5)
  23 => string 'and' (length=3)
  24 => string 'the' (length=3)
  25 => string 'font' (length=4)
  26 => string 'are' (length=3)
  27 => string 'determined' (length=10)
  28 => string 'randomly' (length=8)

Then you just have to iterate over those results: the ones starting with { and ending with } will be your "important" words, and the others will be the rest.


Edit after the comment: one way to identify the important words would be something like this:

foreach ($matches[0] as $word) {
    $m = array();
    if (preg_match('#^\{(.*)\}$#', $word, $m)) {
        echo '<strong>' . htmlspecialchars($m[1]) . '</strong>';
    } else {
        echo htmlspecialchars($word);
    }
    echo '<br />';
}

Or, like you said, working with strpos and strlen would work too ;-)
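
For the position-based variant mentioned above, a minimal sketch could look like the following. The helper name strip_braces is made up for this example, and it assumes each token carries at most one pair of enclosing curly brackets:

```php
<?php
// Hypothetical helper (not part of the original answer): removes one pair of
// enclosing curly brackets from a token and leaves every other token as-is.
function strip_braces(string $token): string
{
    $last = strlen($token) - 1;
    // Only strip when the first character is "{" and the last one is "}".
    if ($last >= 1 && $token[0] === '{' && $token[$last] === '}') {
        return substr($token, 1, $last - 1);
    }
    return $token;
}

// A few tokens as produced by the preg_match_all call above.
$tokens = array('write', '{Hello, World!}', 'in', '{#00AA00}');
$clean = array_map('strip_braces', $tokens);

print_r($clean);
```

Note that this throws away the information about which tokens were bracketed, so if the interpreter needs to tell {in} apart from the keyword in, that flag has to be recorded before stripping.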

Pascal MARTIN
Thanks a thousand it worked! Now I just remove the { and } if they are at position 0 and strlen(arr[a]) - 1
Time Machine
You're welcome :-) Have fun !
Pascal MARTIN
A: 

I would replace them using preg_replace_callback. With the callback you can keep track of the order and replace them with something like %var1%, %var2%, etc.

I don't think that there is a way to explode by spaces, but not in the curly brackets without modifying the string beforehand.
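
A rough sketch of that placeholder idea, for illustration only. The %varN% naming follows the suggestion above, while the variable names ($saved, $masked) and the shortened input string are assumptions of this example:

```php
<?php
$str = 'write {Hello, World!} in either the color {blue} or {red}';

// Step 1: replace each {...} group with a space-free placeholder and
// remember the original text under that placeholder.
$saved = array();
$masked = preg_replace_callback(
    '#\{(.*?)\}#',
    function ($m) use (&$saved) {
        $key = '%var' . count($saved) . '%';
        $saved[$key] = $m[1];
        return $key;
    },
    $str
);

// Step 2: the masked string no longer has spaces inside groups,
// so it can be split on spaces.
$words = preg_split('/ +/', $masked, -1, PREG_SPLIT_NO_EMPTY);

// Step 3: swap each placeholder back for the text it stands for.
foreach ($words as $i => $word) {
    if (isset($saved[$word])) {
        $words[$i] = $saved[$word];
    }
}

print_r($words);
```

The order of the words is preserved because the placeholders stay where the groups were. One caveat: a script that literally contains a word such as %var0% outside brackets would be confused with a placeholder, so a real implementation would probably want a marker that cannot occur in the language.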

André Hoffmann
A: 

$words = preg_split('/}?\s{?/', 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly');

MaxiWheat
Won't this split up the "Hello," and the "World!"?
Simon Nickerson
Oh yes sorry, I messed up on this one
MaxiWheat
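
For reference, a preg_split call can be made to ignore spaces inside the brackets. The sketch below is one possible way, not the answer author's code: it relies on the PCRE backtracking verbs (*SKIP) and (*FAIL), assumes the groups are not nested, and uses a shortened input string:

```php
<?php
$str = 'write {Hello, World!} in either the font {Arial Black} or {Monaco}';

// First alternative: match a whole {...} group, then reject it with
// (*SKIP)(*FAIL) so the regex engine jumps past the group and never tries
// the spaces inside it as split points.
// Second alternative: one or more spaces, which become the split points.
$words = preg_split('/\{[^}]*\}(*SKIP)(*FAIL)| +/', $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

The bracketed groups come back as single elements with their curly brackets still attached, so they can still be told apart from ordinary words afterwards.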
A: 

This could be done iteratively without a regexp. You iterate over the entire string and append every character to a temporary variable until you find a space. When you find a space, you put the content of the temporary variable in the array, empty it, and then continue.

If you find a bracket, you set a boolean, and then put everything in the temp var, until you find a closing bracket. And so on.

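<?php
$string = "write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly";
$bracket = false; // true while we are inside a {...} group
$words = array();
$temp = "";

for ($i = 0; $i < strlen($string); $i++) {
    $char = $string[$i];
    if ($bracket) {
        // Inside brackets: keep everything, including spaces.
        $temp .= $char;
        if ($char == "}") {
            // The closing bracket ends the group; store it and start over.
            $bracket = false;
            $words[] = $temp;
            $temp = "";
        }
    } else {
        if ($char == " ") {
            // A space outside brackets ends the current word.
            if ($temp != "") {
                $words[] = $temp;
                $temp = "";
            }
        } elseif ($char == "{") {
            $temp .= $char;
            $bracket = true;
        } else {
            $temp .= $char;
        }
    }
}

// Store the last word, which is not followed by a space.
if ($temp != "") {
    $words[] = $temp;
}
?>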

Code is untested.

Ikke