I am sorting through some lines; some contain email addresses and some don't.

I need to remove all lines shorter than 6 characters.

I did a little searching and found no solid answers, so I tried to write my first regular expression.

Please tell me whether this will work. Did I get it right?

$six_or_more = preg_replace("!\b\w{1,5}\b!", "", $line_in);

This is followed by the code below, which I "stole" and which may in fact be superfluous.

$no_empty_lines = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $six_or_more);
$lines = preg_split("/[\s]*[\n][\s]*/", $no_empty_lines);

You can see what I am trying to do, but I think it is a bit much.

Thanks for the tutorial.

+6  A: 

You can use strlen(), or mb_strlen() for multibyte strings, to check the line length.
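As a minimal sketch of the difference (not part of the original answer): strlen() counts bytes, while mb_strlen() counts characters. The sample string is hypothetical, the "\u{...}" escape assumes PHP 7.0 or later, and mb_strlen() assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical sample line: "h", "é" (U+00E9, two bytes in UTF-8), "llo".
$line = "h\u{00E9}llo";

$byte_length = strlen($line);             // number of bytes
$char_length = mb_strlen($line, 'UTF-8'); // number of characters

// The line has 5 characters but 6 bytes, so only the
// character count marks it as shorter than 6.
$too_short = $char_length < 6;

echo $byte_length, ' bytes, ', $char_length, " characters\n";
```

Passing the encoding explicitly to mb_strlen() avoids depending on the value set through mb_internal_encoding().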

Svisstack
Post comments in the comments section under his question. There is an `add comment` button you can click to add your comments.
BrunoLM
Only thing to note is that if the string is multibyte, you have to use mb_strlen() in conjunction with mb_internal_encoding().
bisko
@bisko: updated, thanks
Svisstack
@BrunoLM: post text changed to non comment style, thanks.
Svisstack
@Svisstack: I've removed the downvote. It would be nice if you provided an example of how he should iterate through the lines and remove the ones that match his conditions. He was asking for a regex, but an alternative way to do the same thing would be interesting to know. Maybe performance could be improved with this method.
BrunoLM
A: 

Why not explode the data at each newline and check whether each line's length is less than 6? If it is, expunge the line; if not, proceed.
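The idea above could be sketched roughly as follows. The helper name keep_long_lines() and the sample input are hypothetical, and trimming each line before measuring it is an assumption about what the asker wants, not something stated in the question.

```php
<?php
// Hypothetical helper: split $text into lines and keep only those
// that are at least $min characters long after trimming.
function keep_long_lines($text, $min = 6)
{
    // explode() splits on "\n"; trim() also strips a trailing "\r"
    // left over from Windows line endings.
    $lines = array_map('trim', explode("\n", $text));

    $kept = array_filter($lines, function ($line) use ($min) {
        return strlen($line) >= $min;
    });

    // array_filter() preserves the original keys, so reindex from 0.
    return array_values($kept);
}

$result = keep_long_lines("less\nname\nsome long name\r\n\nabcdef");
print_r($result);
```

For multibyte input, strlen() would presumably need to be swapped for mb_strlen().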

Russell Dias
+2  A: 

Let's say that your lines are in an array:

$lines = array('less', 'name', 'some long name', '[email protected]');

and you want to print every line that is 6 characters or longer...

<?php
$lines = array('less', 'name', 'some long name', '[email protected]');

foreach ($lines as $line) {
    if (strlen($line) < 6) { // the line is shorter than 6 characters
        continue; // so skip it
    }
    echo $line . '<br />'; // print the line, or do whatever you want with it :)
}

?>

The above example will output:

some long name
[email protected]
Wolfy
Thank you, this is useful to me.
Stephayne
+2  A: 

\b matches a "word boundary" -- that is, the start or end of a word. It'll trigger on spaces and punctuation between words as well, so you'll effectively remove every word between 1 and 5 chars, rather than every line as intended. (BTW, if you have backslashes in strings, you should either be escaping them or using single-quotes instead to avoid future gotchas.)
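To illustrate the word-by-word behaviour, here is a small sketch that runs the pattern from the question against a made-up sample string (the input is an assumption, chosen only for demonstration):

```php
<?php
// Hypothetical one-line input containing short and long words.
$sample = 'hello world, example';

// The pattern from the question, written in single quotes so the
// backslashes reach the regex engine unchanged.
$result = preg_replace('!\b\w{1,5}\b!', '', $sample);

// "hello" and "world" (5 characters each) are removed;
// "example" (7 characters) is left alone.
var_dump($result); // string(10) " , example"
```

The line itself survives; only its short words disappear.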

You could try

$six_or_more = preg_replace('/^.{0,5}$[\r\n]*/m', '', $line_in);

With the /m modifier, ^ and $ match the start and end of each line, respectively, rather than the start and end of the whole string. It matches right before the newline, though, so the line would more than likely become blank rather than getting removed unless you match the newline "after the end" as well.
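A quick way to check this is to run the pattern over a small multi-line string. The sample input below is hypothetical and assumes one entry per line with "\n" line endings:

```php
<?php
// Hypothetical input: five lines, three of them shorter than 6 characters.
$line_in = "less\nname\nsome long name\nab\nabcdef";

// Remove each line of 0 to 5 characters together with the
// newline characters that follow it.
$six_or_more = preg_replace('/^.{0,5}$[\r\n]*/m', '', $line_in);

echo $six_or_more;
```

With this input, only "some long name" and "abcdef" should remain, separated by a single newline.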

cHao
Thank you for pointing out that I would be removing every word in each line between 1 and 5. This file is a single column of these words and I did not clarify that but, your example taught me how to do it both ways so THANK YOU! This was VERY USEFUL and I have learned today!
Stephayne
@Stephayne: If the answer is what you were looking for, mark it as accepted (under the vote count on the left).
BrunoLM