I need help building a regex that removes the EVEN lines from a plain text file.

Given this input:

line1
line2
line3
line4
line5
line6

It would output this:

line1
line3
line5

Thanks !

A: 

Well, this will remove the EVEN lines from the text file, assuming each line ends in its line number as in your sample:

grep '[13579]$' textfile > textfilewithoddlines

And output this:

line1
line3
line5

emil
that is not scalable.
ghostdog74
Oh yes it is. No matter how large the number, whether it's odd is decided by just the last digit (`$`) which must be one of these 5 digits.
bart
What I mean by not scalable is that the data may not be a literal "line1", "line2". It may be anything. So grepping for patterns that end with a number is not scalable.
ghostdog74
It is plenty scalable. What ghostdog means is that it's not _general_. Unfortunately the spec (question) doesn't state what kind of generality is required, so we are left to guess.
Jay Bazuzi
@ghostdog74: As the OP said he wanted to use regex, I assumed he wanted lines ending in odd/even numbers. Otherwise, as many say, one wouldn't use regex. `sed -n '2,$n;p' textfile` might be better suited.
emil
+5  A: 

Actually, you don't use regex for that. With your favourite language, iterate over the file, keep a line counter and test it modulo 2. E.g. with awk (*nix):

$ awk 'NR%2==1' file
line1
line3
line5

And for the even lines:

$ awk 'NR%2==0' file
line2
line4
line6
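
The same counter-and-modulus idea in Python, as a minimal sketch. The function name `odd_lines` and the in-memory sample are made up for illustration; a real script would pass an open file instead.

```python
import io

def odd_lines(lines):
    """Yield lines 1, 3, 5, ... of an iterable of lines (1-based, like awk's NR)."""
    for number, line in enumerate(lines, start=1):
        if number % 2 == 1:  # keep odd-numbered lines; use == 0 for the even ones
            yield line

# Stand-in for an open text file; any iterable of lines works.
sample = io.StringIO("line1\nline2\nline3\nline4\nline5\nline6\n")
print("".join(odd_lines(sample)), end="")
```

Because it only counts lines, this does not care what the lines contain.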
ghostdog74
Tried this and it works ! Thanks :)
sthg
A: 

Perhaps you are on the command line. In PowerShell, this keeps the odd lines:

$x = 0; gc .\foo.txt | ? { $x++; $x % 2 -eq 1 }

Jay Bazuzi
+1  A: 

Well, if you do a search-and-replace-all-matches on

^(.*)\r?\n.*

in "^ matches start-of-line mode" and ". doesn't match linebreaks mode"; replacing with

\1

then you lose every even line.

E.g. in C#:

resultString = Regex.Replace(subjectString, @"^(.*)\r?\n.*", "$1", RegexOptions.Multiline);

or in Python:

result = re.sub(r"(?m)^(.*)\r?\n.*", r"\1", subject)
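
To see how that behaves when the input has an odd number of lines, here is a small runnable sketch around the same pattern. The wrapper name `drop_even_lines` and the sample strings are only for illustration.

```python
import re

# Same pattern as above: capture an odd line, then consume the line break
# and the even line that follows it.
PAIR = re.compile(r"(?m)^(.*)\r?\n.*")

def drop_even_lines(text):
    # Each matched pair is replaced by its first (odd) line.
    return PAIR.sub(r"\1", text)

print(drop_even_lines("line1\nline2\nline3\nline4\nline5\nline6"))
print(drop_even_lines("line1\nline2\nline3"))
```

A trailing odd line has no line break after it for the pattern to match, so it is simply left in place.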
Tim Pietzcker
You should also cover the case that there is an odd number of lines.
Gumbo
Thought so at first, too - but we want to keep the odd lines, don't we? :) By the way, congratulations on being elected moderator - I just noticed the diamond (and I did vote for you ;)
Tim Pietzcker
+1  A: 

First, I fully agree with the consensus that this is not something regex should be doing.

Here's a Java demo:

public class Test {

    public static String voodoo(String lines) {
        return lines.replaceAll("\\G(.*\r?\n).*(?:\r?\n|$)", "$1");
    }

    public static void main(String[] args) {
        System.out.println("a)\n"+voodoo("1\n2\n3\n4\n5\n6"));
        System.out.println("b)\n"+voodoo("1\r\n2\n3\r\n4\n5\n6\n7"));
        System.out.println("c)\n"+voodoo("1"));
    }
}

output:

a)
1
3
5

b)
1
3
5
7

c)
1

A short explanation of the regex:

\G       # match the end of the previous match
(        # start capture group 1
  .*     #   match any character except line breaks and repeat it zero or more times
  \r?    #   match the character '\r' and match it once or none at all
  \n     #   match the character '\n'
)        # end capture group 1
.*       # match any character except line breaks and repeat it zero or more times
(?:      # start non-capture group 1 
  \r?    #   match the character '\r' and match it once or none at all
  \n     #   match the character '\n'
  |      #   OR
  $      #   match the end of the input
)        # end non-capture group 1

On the first match attempt, \G matches at the start of the string; after that, it matches where the previous match ended. Every pair of lines (where the second line is optional, to cover a final unpaired line) gets replaced by the first line in the pair.
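
For comparison, a rough Python port of the pair-wise idea. Python's standard `re` module has no `\G`, so this sketch leaves it out and relies on the fact that each match consumes a whole pair, line breaks included, which puts the next search at the start of the next odd line anyway. The name `voodoo` just mirrors the Java demo; treat this as an illustration, not a drop-in equivalent.

```python
import re

# A line with its line break (kept), then the next line followed by either
# its line break or the end of the input (dropped).
PAIR = re.compile(r"(.*\r?\n).*(?:\r?\n|$)")

def voodoo(lines):
    return PAIR.sub(r"\1", lines)

print(repr(voodoo("1\n2\n3\n4\n5\n6")))
print(repr(voodoo("1\r\n2\n3\r\n4\n5\n6\n7")))
print(repr(voodoo("1")))
```

Since the line break after the even line is consumed as well, the pairing stays aligned even when an even line is empty.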

But again: using a normal programming language (if one can call awk "normal" :)) is the way to go.

EDIT

And as Tim suggested, this also works:

replaceAll("(?m)^(.*)\r?\n.*", "$1")
Bart Kiers
Shouldn't `String result = subject.replaceAll("(?m)^(.*)\r?\n.*", "$1");` work just the same? After a match, the regex engine will automatically have arrived at the start of the next odd line.
Tim Pietzcker
Yes, of course it does! As is often the case with me: I try to solve things in a much too difficult way!
Bart Kiers