ansaurus

Question

In Perl, how can I read parts of lines that match a criterion?

Answer 1

A:

Good question! Looks very similar to this one (linking so original answer can get more votes):

Reading sections from a file in Perl

DreadPirateShawn 2009-07-11 20:12:33

They look similar because they were asked by the same person, who presumably hasn't bothered to learn anything

friedo 2009-07-11 22:47:37

Answer 2

A:

OK, based on your later comment, this is a little different than the previous question. Also, I now realize that node #54 is a valid entry in the first column.

Update: I now also realize you do not need the first column.

Update: In general, you neither want to nor need to deal with character arrays in Perl.

Update: Now that you have clarified what should and should not be skipped, here is a version that deals with that. Add patterns to taste in the if condition.

#!/usr/bin/perl

use strict;
use warnings;

my @data;

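while ( <DATA> ) {
    chomp;    # remove the trailing newline

    # Keep only lines whose first column is "<digits>-ENS<any 5 chars>"
    # or "node #<digits>"; $1 captures everything after the run of spaces.
    if ( /^[0-9]+-ENS.{5} +(.+)$/
            or /^node #[0-9]+ +(.+)$/
    ) {
        # split // breaks the captured text into single characters
        push @data, [ split //, $1 ];
    }
}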

use Data::Dumper;
print Dumper \@data;

__DATA__
603       Some garbage data not related to me, 55, 113 ->

1-ENST0000        This is sample data blh blah blah blahhhh
2-ENSBTAP0        This is also some other sample data
21-ENADT)$        DO NOT WANT TO READ THIS LINE. 
3-ENSGALP0        This is third sample data
node #4           This is 4th sample data
node #5           This is 5th sample data

This is also part of the input file but i dont wish to read this. 
Branch -> 05 13, 
      44, 1,1,4,1

17, 1150

637                   YYYYYY: 2 : %

As for learning how to fish, I recommend you read everything related in perldoc perltoc.
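For readers who want the two patterns spelled out, here is a minimal sketch that combines them into one regex written with the /x modifier so each piece can carry a comment. The helper name content_of and the shortened sample lines are hypothetical, introduced only for illustration; the matching rules themselves are the ones from the code above.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# The same two patterns as in the answer, merged and annotated.
# Under /x, whitespace in the pattern is ignored, so a literal
# space is written as [ ] and a literal '#' as \#.
my $wanted = qr{
    ^                       # start of line
    (?:
        [0-9]+ - ENS .{5}   # integer, hyphen, literal "ENS", any five characters
      |
        node [ ] \# [0-9]+  # literal "node #", then an integer
    )
    [ ]+                    # one or more spaces after the heading
    (.+)                    # capture the rest of the line
    $                       # end of line
}x;

# Hypothetical helper: returns the content of a wanted line, or undef.
sub content_of {
    my ($line) = @_;
    return $line =~ $wanted ? $1 : undef;
}

my @sample = (
    '603       Some garbage data not related to me, 55, 113 ->',
    '1-ENST0000        This is sample data blh blah blah blahhhh',
    '21-ENADT)$        DO NOT WANT TO READ THIS LINE.',
    'node #4           This is 4th sample data',
);

for my $line (@sample) {
    my $content = content_of($line);
    print "$content\n" if defined $content;
}
```

To skip or accept other kinds of lines, add another alternative inside the (?: ... ) group.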

Sinan Ünür 2009-07-11 20:13:16

Also, if I again want each character to be stored in a different element of the array, should I change @row = split ' ', $_, 2; to @row = split \\, $_, 2; ?

2009-07-11 20:25:00

No, no! The data does begin at a fixed column, but there are other sections in the file with the same column width which I do not wish to read. So I'll take the regex from your previous edited version.

2009-07-11 20:29:20

Here is your comment from above: "yeah. fourth and fifth lines do have the heading of node #4 and node #5. After the heading there are spaces, Yes. So contents for all heading start at the same location and are aligned.... – Aaron 15 mins ago"

Sinan Ünür 2009-07-11 20:30:56

:( I'm sorry....

2009-07-11 20:32:12

I've updated the question to bring more clarity

2009-07-11 20:34:07

Nope, you did not bring clarity, you added one more twist. Maybe you could put some more work into formulating your question next time. So, what really is the criterion for skipping? The sample case you give above does not a *specification* make, I am afraid.

Sinan Ünür 2009-07-11 20:44:11

OK, thanks! But could you explain the regex with a simple one-line comment? There is so much other crap in the file which I don't want to read, so maybe I can modify your regex to handle that. I think all I want to read is integer-ENS[anyfivecharacters] followed by 9 spaces OR node #integer followed by 9 spaces.

2009-07-11 20:51:08

please please explain your code in while loop. I'm not a perlmonk :(

2009-07-11 20:53:26

@Aaron All you need is an intro book to understand what is going on in this code. Of course, reading `perldoc perlretut` would also help. For quick reference, see `perldoc perlreref`.

Sinan Ünür 2009-07-11 21:05:09

Answer 3

+1 A:

Is this really a fixed-column file? If so, then don't bother with regexps. Just split at the column width, perhaps trimming trailing white space from column 1.
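A minimal sketch of that idea using unpack, assuming the first column is 18 characters wide (a guess measured from the sample data in the other answer, not something the question states). The names $WIDTH and split_fixed are hypothetical. Because the asker says other sections share the same column width, the sketch still checks the heading before keeping a line.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my $WIDTH = 18;    # assumed width of the first column

# Hypothetical helper: split a line at the fixed column boundary.
# "A18" reads 18 characters and strips trailing spaces;
# "A*" takes whatever is left, also without trailing spaces.
sub split_fixed {
    my ($line) = @_;
    my ( $heading, $content ) = unpack "A$WIDTH A*", $line;
    return ( $heading, $content );
}

# Sample lines padded to the assumed width so the layout is unambiguous.
my @sample = map { sprintf "%-${WIDTH}s%s", @$_ } (
    [ '603',        'Some garbage data' ],
    [ '1-ENST0000', 'This is sample data' ],
    [ '21-ENADT)$', 'DO NOT WANT TO READ THIS LINE.' ],
    [ 'node #4',    'This is 4th sample data' ],
);

for my $line (@sample) {
    my ( $heading, $content ) = split_fixed($line);
    next unless defined $heading;

    # The heading is now isolated, so the check covers the whole field.
    next unless $heading =~ /^(?:[0-9]+-ENS.{5}|node #[0-9]+)$/;

    print "$heading => $content\n";
}
```

If the real column width differs, only $WIDTH needs to change.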

djna 2009-07-11 20:21:25

+1 for pointing that out ... although it is hard to be sure that is the case based on the wording of the question.

Sinan Ünür 2009-07-11 20:24:30

Edited the question to reflect this.

2009-07-11 20:35:13
