I am looking for something like trim(), but one that works within the bounds of a string. Users sometimes enter 2, 3, 4, or more line returns in a row as they type, and I need to sanitize this input.

Sample input

i like cats


my cat is happy
i love my cat



hope you have a nice day

Desired output

i like cats

my cat is happy
i love my cat

hope you have a nice day

I am not seeing anything built in, and a plain string replace would need many passes to do the work. Before I whip up a small recursive string replace, I wanted to see what other suggestions you all had.

I have an odd feeling there is a regex for this one as well.

+3  A: 

How much text do you need to process? If it is less than about 100k, you could probably just use a simple search-and-replace regex (searching for something like /\n+/ and replacing it with \n).

On the other hand, if you need to go through megabytes of data, you could parse the text character by character, copying the input to the output, except when multiple newlines are encountered, in which case you would copy just one newline and ignore the rest.
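The character-by-character approach described above could be sketched roughly as follows. This is only an illustration, assuming PHP 7 or later and line endings already normalized to \n; the function name squeeze_newlines_scan and the $maxRun parameter are hypothetical. With $maxRun = 1 it copies a single newline as described; the default of 2 keeps one blank line, as in the desired output of the question.

```php
<?php
// Hypothetical helper: single-pass scan that caps runs of "\n" at $maxRun.
function squeeze_newlines_scan(string $text, int $maxRun = 2): string
{
    $out = '';
    $run = 0;              // length of the current run of consecutive newlines
    $len = strlen($text);

    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];
        if ($ch === "\n") {
            $run++;
            if ($run > $maxRun) {
                continue;  // skip newlines beyond the allowed run
            }
        } else {
            $run = 0;      // any other character ends the run
        }
        $out .= $ch;
    }

    return $out;
}
```

For an email-sized body the regex answers in this thread are simpler; a scan like this mainly shows what the regex is doing under the hood.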

I would not recommend a recursive string replace, though; it sounds like that would be very slow.

Nathan Reed
Not much: an email's worth of text, from a user who sends in an email. It is part of a web system.
+7  A: 
function str_squeeze($body) {
    return preg_replace("/\n\n+/", "\n\n", $body);
}
tharkun
This returns all lines separated by one \n, after I change ' to " in the args.
A (slightly) more streamlined regex would look like this: preg_replace("/\n{2,}/", "\n\n", $body);
KOGI
Thanks, KOGI. Streamlined? Well, it's slightly more code. Is it faster?
tharkun
It might be a good idea to also check for \s (perhaps \s*) to account for whitespace interspersed between the newlines. People who add random newlines might also add random spaces and tabs.
artlung
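One way to act on the whitespace suggestion in the comment above is sketched below. It is a variant of str_squeeze(), not the accepted answer itself; the name str_squeeze_ws is hypothetical, and it assumes PHP 7 or later with line endings already normalized to \n. It treats lines that contain only spaces or tabs as blank lines.

```php
<?php
// Hypothetical variant of str_squeeze() that treats whitespace-only lines as blank.
function str_squeeze_ws(string $body): string
{
    // A newline followed by one or more lines holding only spaces/tabs
    // is collapsed to exactly one blank line ("\n\n").
    // preg_replace() returns null on a regex error, so fall back to the input.
    return preg_replace("/\n(?:[ \t]*\n)+/", "\n\n", $body) ?? $body;
}
```

Leading indentation on a line with real content is left alone, because the pattern only matches when the spaces and tabs are followed directly by another newline.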
+1  A: 

The following regular expression should remove multiple linebreaks while ignoring single line breaks, which are okay by your definition:

ereg_replace("\n\n+", "\n\n", $string);

You can test it with this PHP Regular Expression test tool, which is very handy (though it does not seem to be in perfect parity with PHP).

[EDIT] Changed the ' to ", as the single quotes didn't seem to work. I have to admit I only tested the regex in the web tool. ;)

Michael Barth
I got no results until I changed the ' to a " in both the expression and the replacement. Then it works, but it kills the \n\n.
That regex tool is not in perfect parity with PHP. I will keep searching for a solution.
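A likely reason the quote style mattered: PHP only expands escape sequences such as \n inside double-quoted strings. The short sketch below illustrates the difference; the variable names are made up for the example.

```php
<?php
// Single quotes: the backslash and the "n" stay as two separate characters.
$single = '\n';
// Double quotes: PHP turns the escape into one newline character.
$double = "\n";

var_dump(strlen($single)); // int(2)
var_dump(strlen($double)); // int(1)

// PCRE interprets \n on its own, so a single-quoted *pattern* still matches
// newlines with preg_replace(); the double-quoted *replacement* is what
// inserts real newline characters.
$text   = "a\n\n\nb";
$result = preg_replace('/\n\n+/', "\n\n", $text);
var_dump($result === "a\n\nb"); // bool(true)
```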
+2  A: 

Finally managed to get it. It needs preg_replace, so that you are using the PCRE functions in PHP, and it also needs a \n\n replacement string, so that runs of line endings are not all reduced to a single one:

  $body = preg_replace("/\n\n+/", "\n\n", $body);

Thanks for getting me on the right track.

Perhaps you should select the accepted answer (tharkun's) so that this question is marked as answered (and taken out of the unanswered queue).
Calvin
+1  A: 

To consider all three line break sequences (\r\n, \r, and \n):

preg_replace('/(?:\r\n|[\r\n]){2,}/', "\n\n", $str);
Gumbo
Thanks, I do run a line ending unifier before I run preg_replace("/\n\n+/", "\n\n", $body);
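The two-step approach mentioned in this comment might look something like the sketch below. The actual unifier used by the commenter is not shown in the thread, so unify_line_endings and sanitize_body are hypothetical names and the implementation is an assumption (PHP 7 or later).

```php
<?php
// Hypothetical "line ending unifier": convert \r\n and lone \r to \n.
function unify_line_endings(string $text): string
{
    // Order matters: replace "\r\n" first so it does not become two newlines.
    return str_replace(["\r\n", "\r"], "\n", $text);
}

// Hypothetical wrapper: unify line endings, then squeeze blank-line runs.
function sanitize_body(string $body): string
{
    $body = unify_line_endings($body);
    // Two or more newlines in a row become exactly one blank line.
    // preg_replace() returns null on a regex error, so fall back to the input.
    return preg_replace("/\n\n+/", "\n\n", $body) ?? $body;
}
```

Unifying first keeps the squeeze pattern simple, at the cost of a second pass over the text.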