I'm writing a trimming function that takes a string and finds the first newline \n character after the 500th character and returns a string up to the newline. Basically, if there are \n at indices of 200, 400, and 600, I want the function to return the first 600 characters of the string (not including the \n).

I tried:

$output = preg_replace('/([^%]{500}[^\n]+?)[^%]*/','$1',$output);

I used the percent sign because I couldn't find a character class that simply encompassed "everything". The dot didn't work because it excludes newlines. Unfortunately, my function fails miserably. Any help or guidance would be appreciated.

+1  A: 

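You can add the s (DOTALL) modifier to make . match newlines, then use a negated character class to run up to the next newline. I've also made it match everything if the string is under 500 characters and anchored it to the start. Note the * rather than + after the character class, so the match still succeeds when a newline sits exactly at offset 500:

preg_match('/^.{500}[^\n]*|^.{0,500}$/s', $output, $matches);
$output = $matches[0];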
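
As a rough sketch, the same idea can be wrapped in a small helper. The function name `trim_at_newline` and its `$min` parameter are made up here for illustration, and the pattern is a simplified variant that relies on a greedy `.{0,500}` instead of the alternation. It works on bytes, not multibyte characters.

```php
<?php
// Hypothetical helper (not from the answer above): keep the first $min bytes,
// then continue up to, but not including, the next newline.
function trim_at_newline(string $text, int $min = 500): string
{
    // s (DOTALL) lets . match newlines. .{0,$min} is greedy, so it takes
    // $min bytes when available and the whole string when it is shorter.
    $pattern = '/^.{0,' . $min . '}[^\n]*/s';

    if (preg_match($pattern, $text, $matches) === 1) {
        return $matches[0];
    }

    return $text; // fall back to the input if matching fails
}
```

With newlines at offsets 200, 400 and 600, this should return the first 600 bytes; a string shorter than 500 bytes comes back unchanged.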
Greg
Hope this makes sense... brain is fried...
Greg
Fantastic. I was also leaning towards preg_match and came up with: `$matches = false;` `preg_match('/[^\n]+/',substr($output,500),$matches);` `$output = substr($output, 0, 500).(array_key_exists(0,$matches) ? $matches[0] : '');` But your solution is much more satisfying. I find it interesting to think about the usage of the pipe in your regex. Thanks!
Steven Xu
+3  A: 

Personally I would avoid regex and use simple string functions:

// $str is the original string
$nl = strpos( $str, "\n", 500 ); // finds first \n starting from char 500
$sub = substr( $str, 0, $nl );
$final = str_replace( "\n", ' ', $sub );

You might need to check for \r\n as well - i.e. normalize first using str_replace( "\r\n", "\n", $str ).
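
A minimal sketch of how these steps might fit together, assuming the goal is simply to cut at the newline. The name `cut_at_newline` and the `$min` parameter are hypothetical, and the sketch leaves out the final str_replace step that turns the remaining newlines into spaces. It also guards against two cases the snippet above does not: a string shorter than the offset, and no newline being found.

```php
<?php
// Hypothetical helper: cut $str at the first "\n" found at or after byte $min.
function cut_at_newline(string $str, int $min = 500): string
{
    // Normalize Windows line endings so only "\n" has to be searched for.
    $str = str_replace("\r\n", "\n", $str);

    // strpos() rejects an offset beyond the end of the string, so short
    // inputs are returned unchanged.
    if (strlen($str) <= $min) {
        return $str;
    }

    $nl = strpos($str, "\n", $min);

    // strpos() returns false when no newline follows; keep everything then.
    return $nl === false ? $str : substr($str, 0, $nl);
}
```

Comparing against `false` with `===` avoids relying on the position being truthy.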

DisgruntledGoat
A solution I never would have thought of after coming down with regex fever :). I'd want to change the second line to `$sub = $nl ? substr($str, 0, $nl) : $str` to account for the possibility that the `strpos` call in the first line returns `false`.
Steven Xu
This solution works with my adjustment above. I'd be curious to see the performance difference between this approach and the one using `preg_match` and `preg_replace` elsewhere on this page.
Steven Xu
+1  A: 

Use

'/(.{500,}?)(?=\n)/s'

as the pattern.

The /s at the end makes the dot match newlines, and {500,} means "match 500 or more", with the question mark making it match as few as possible. The (?=\n) is a positive lookahead, which means the whole matched string has to be followed by a \n, but the lookahead doesn't consume anything. So it checks that the 500+ character string is followed by a newline, but doesn't include the newline in the match (or the replace, for that matter).

The lookahead thingy is a little fancy in this case, though. I guess

'/(.{500,}?)\n/s'

would do just as well, as long as you read capture group 1 rather than the whole match (which would include the \n). I just like lookaheads :)
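
For illustration, here is one way the lookahead pattern might be used with preg_match. The sample string and variable names are assumptions, and the `^` anchor is an addition to the pattern above: it does not change what matches, it only stops the engine from retrying at every later offset when no newline follows.

```php
<?php
// Sample input with newlines at offsets 200, 400 and 600, as in the question.
$text = str_repeat('a', 200) . "\n" . str_repeat('b', 199) . "\n"
      . str_repeat('c', 199) . "\n" . 'tail';

// Lazy .{500,}? grows one byte at a time until the lookahead sees "\n".
if (preg_match('/^(.{500,}?)(?=\n)/s', $text, $m) === 1) {
    $result = $m[1];
} else {
    $result = $text; // no newline at or after offset 500
}

echo strlen($result), "\n"; // prints 600
```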

Zenon