I have to parse a file and store it in a table. I was asked to use a hash to implement this. Please suggest a simple way to do that, using only Perl.

-----------------------------------------------------------------------
L1234| Archana20 | 2010-02-12 17:41:01 -0700 (Mon, 19 Apr 2010) | 1 line
PD:21534 / lserve<->Progress good
------------------------------------------------------------------------
L1235 | Archana20 | 2010-04-12 12:54:41 -0700 (Fri, 16 Apr 2010) | 1 line
PD:21534 / Module<->Dir,requires completion
------------------------------------------------------------------------
L1236 | Archana20  | 2010-02-12 17:39:43 -0700 (Wed, 14 Apr 2010) | 1 line
PD:21534 / General Page problem fixed
------------------------------------------------------------------------
L1237 | Archana20  | 2010-03-13 07:29:53 -0700 (Tue, 13 Apr 2010) | 1 line
gTr:SLC-163 / immediate fix required
------------------------------------------------------------------------
L1238 | Archana20 | 2010-02-12 13:00:44 -0700 (Mon, 12 Apr 2010) | 1 line
PD:21534 / Loc Information Page
------------------------------------------------------------------------

I want to read this file and split it (or use some other approach) to extract the following fields into a table:

  • the id that starts with L should be the first field in a table
  • Archana20 must be in the second field
  • timestamp must be in the third field
  • PD must be in the fourth field
  • Type (content preceding / must be in the last field)

My questions are:

  1. How to ignore the --------… (separator line) in this file?
  2. How to extract the above?
  3. How to split since the file has two delimiters (|, /)?
  4. How to implement it using a hash and what is the need for this?

Please provide some simple means so that I can understand since I am a beginner to Perl.

+1  A: 

When you say "This is not a homework...to mean this will be a start to assess me in perl", I assume you mean that this is perhaps the first assignment you have at a new job or something. In that case, it seems that if we just give you the answer it will actually harm you later, since they will assume you know more about Perl than you do.

However, I will point you in the right direction.

A. Don't use split, use regular expressions. You can learn about them by googling "perl regex".

B. Google "perl hash" to learn about Perl hashes. The first result is very good.

Now to your questions:

  1. Regular expressions will help you ignore lines you don't want.
  2. Regular expressions will extract items. Look up "capture variables".
  3. Don't split, use a regex.
  4. See point B above.
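
As a rough illustration of the "capture variables" idea in point 2, here is a minimal sketch. The helper name `parse_header` and the exact pattern are assumptions made up for this example, based only on the sample lines in the question; real data may need a looser pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured fields of a header line,
# or an empty list if the line does not look like one.
sub parse_header {
    my ($line) = @_;
    # id | author | timestamp (date and time only, without the offset)
    if ($line =~ /^(L\d+)\s*\|\s*(\S+)\s*\|\s*(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)/) {
        return ($1, $2, $3);    # $1, $2, $3 are the capture variables
    }
    return;
}

my @fields = parse_header('L1235 | Archana20 | 2010-04-12 12:54:41 -0700 (Fri, 16 Apr 2010) | 1 line');
print join(', ', @fields), "\n";    # L1235, Archana20, 2010-04-12 12:54:41

# A separator line does not match, so it yields no fields.
my @none = parse_header('------------------------------------------------------------------------');
print scalar(@none), "\n";          # 0
```

Because a line that fails to match returns nothing, the same pattern both extracts the fields and filters out the separator lines.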
tster
With all due respect, I think regular expressions will be far harder and more error-prone here than splitting first on ' | ' and then splitting the final field again later on ' / '.
Telemachus
+2  A: 

  1. You will probably be working through the file line by line in a loop. Take a look at perldoc -f next. You can use regular expressions or a simpler match in this case, to make sure that you only skip appropriate lines.
  2. You need to split first and then handle each field as needed after, I would guess.
  3. Split on the primary delimiter (which appears to be ' | ' - more on that in a minute), then split the final field on its secondary delimiter afterwards.
  4. I'm not sure if you are asking whether you need a hash or not. If so, you need to pick which item will provide the best set of (unique) keys. We can't do that for you since we don't know your data, but the first field (at a glance) looks about right. As for how to get something like this into a more complex data structure, you will want to look at perldoc perldsc eventually, though it might only confuse you right now.
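
The four points above can be sketched roughly as follows. This is only one possible reading: the helper name `parse_log`, the field names, and the choice of a hash of hashes keyed by the first field are assumptions for illustration, and the sample input is kept in a string so the sketch runs without a file.

```perl
use strict;
use warnings;

# Hypothetical helper: reads "header line, detail line" pairs from a
# filehandle and returns a reference to a hash keyed by the first field.
sub parse_log {
    my ($fh) = @_;
    my (%records, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^-+$/;    # point 1: skip separator lines
        next if $line !~ /\S/;      # also skip blank lines
        if ($line =~ /\|/) {
            # point 3, primary delimiter: take the first three fields
            my ($name, $timestamp);
            ($id, $name, $timestamp) = split /\s*\|\s*/, $line;
            $records{$id} = { name => $name, timestamp => $timestamp };
        }
        elsif (defined $id) {
            # point 3, secondary delimiter: split the detail line once on '/'
            my ($pd, $type) = split m{\s*/\s*}, $line, 2;
            @{ $records{$id} }{qw(pd type)} = ($pd, $type);
        }
    }
    return \%records;
}

my $sample = <<'END';
------------------------------------------------------------------------
L1235 | Archana20 | 2010-04-12 12:54:41 -0700 (Fri, 16 Apr 2010) | 1 line
PD:21534 / Module<->Dir,requires completion
------------------------------------------------------------------------
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $records = parse_log($fh);
close $fh;

for my $key (sort keys %$records) {
    my $r = $records->{$key};
    print join(' | ', $key, @{$r}{qw(name timestamp pd type)}), "\n";
}
```

The limit of 2 on the second split keeps any later '/' characters inside the type field instead of cutting it short.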

One other thing, your data above looks like it has a semi-important typo in the first line. In that line only, there is no space between the first field and its delimiter. Everywhere else it's ' | '. I mention this only because it can matter for split. I nearly edited this, but maybe the data itself is irregular, though I doubt it.
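
To show why the missing space can matter, here is a small sketch that splits the irregular first line from the question in two ways. The variable names are invented for the example.

```perl
use strict;
use warnings;

my $irregular = 'L1234| Archana20 | 2010-02-12 17:41:01 -0700 (Mon, 19 Apr 2010) | 1 line';

# Splitting on the exact three-character delimiter misses the first '|',
# so the id and the author stay glued together in one field.
my @strict = split / \| /, $irregular;

# Allowing optional whitespace around '|' copes with both spellings.
my @tolerant = split /\s*\|\s*/, $irregular;

print scalar(@strict),   " fields, first: '$strict[0]'\n";    # 3 fields, first: 'L1234| Archana20'
print scalar(@tolerant), " fields, first: '$tolerant[0]'\n";  # 4 fields, first: 'L1234'
```

If the data really is irregular, the tolerant pattern is probably the safer choice; if it was only a typo in the post, either form works.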

I don't know how much of a beginner you are to Perl, but if you are completely new to it, you should think about a book (online tutorials vary widely and many are terribly out of date). A reasonably good introductory book is freely available online: Beginning Perl. Another good option is Learning Perl and Intermediate Perl (they really go together).

Telemachus
+1  A: 

If this file is line-based, you can read it line by line in a while loop. Then skip the lines that aren't formatted the way you expect.

After that, you can use a regex, as indicated in the other answer. I'd use that to split each line up, get an array, and build a hash of lists for the records. Either after that (or before), clean up each record by trimming whitespace, etc. If you use a regex, use capture groups to add to your list in that fashion. It's up to you.

The hash key is the first column, the list contains everything else. If you are just doing a direct insert, you can get away with a list of lists and just put everything in that instead.

The key for the hash would allow you to look at particular records for fast lookup. But if you don't need that, then an array would be fine.
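
A minimal sketch of the two structures, assuming the rows have already been split and trimmed. The values are taken from the sample data in the question, and the variable names are invented for the example.

```perl
use strict;
use warnings;

# Rows as they might look after splitting and trimming.
my @rows = (
    [ 'L1237', 'Archana20', '2010-03-13 07:29:53', 'gTr:SLC-163', 'immediate fix required' ],
    [ 'L1238', 'Archana20', '2010-02-12 13:00:44', 'PD:21534',    'Loc Information Page' ],
);

# List of lists: keeps file order, fine for inserting every row in turn.
my @list_of_lists = @rows;

# Hash of lists: the first column is the key, the rest is the value.
my %hash_of_lists;
for my $row (@rows) {
    my ($id, @rest) = @$row;
    $hash_of_lists{$id} = \@rest;
}

# Direct lookup by id, without scanning the whole list.
print "L1238 type: $hash_of_lists{L1238}[3]\n";    # L1238 type: Loc Information Page
print "second row id: $list_of_lists[1][0]\n";     # second row id: L1238
```

Note that a hash does not remember the order of the file, so sort the keys when the order of the output matters.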

Simon
A: 

You can try this one.

Points to note:

  1. Read the file line by line.
  2. Use a regular expression to skip the '----' lines.
  3. After that, use the split function to populate a hash of arrays.

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $test_file = 'test.txt';
        open(my $in, '<', $test_file) or die "Cannot open $test_file: $!";

        my (%seen, $id, $name, $timestamp);
        while (my $line = <$in>) {
            chomp $line;
            next if $line =~ /^-/;     # skip the '----' separator lines
            next if $line !~ /\S/;     # skip blank lines
            if ($line =~ /\|/) {
                # header line: id | name | timestamp | line count
                ($id, $name, $timestamp) = split /\s*\|\s*/, $line;
            } elsif (defined $id) {
                # detail line: PD / type; it completes the current record
                my ($PD, $type) = split /\s*\/\s*/, $line, 2;
                $seen{$id} = [$name, $timestamp, $PD, $type];   # hash of arrays
            }
        }
        close($in);

        # print one line per record, sorted by id
        for my $key (sort keys %seen) {
            print "$key:@{ $seen{$key} }\n";
        }
    
Nikhil Jain
Please use lexical filehandles and the three-argument form of open. That makes the ponies and rainbows happy.
Telemachus
I have been trying to do this for a long time... Thanks a ton for the immediate resolution.
Sandhya
-0.4 for various bad style (which would round back to 0 on its own), but another -0.4 for giving a man a fish when he needs to read some books on fishing.
Ether
@Ether: I agree, I should not give a man a fish, but for a beginner it is not an easy piece of cake.
Nikhil Jain
@nikhil: Truly said...
Sandhya
@Sandhya: Have you tried my logic? Can you tell me exactly what problem you are facing, or has the problem been solved by some other logic?
Nikhil Jain
@nikhil: I tried the same, but that ':' comes in between and I want to remove PID, so I changed it a bit and used split without specifying limits. One more doubt: now I want to add it to a table. I implemented the hash, but I don't know how to add it to the table. Googling helped, but not to a great extent.
Sandhya
@Sandhya: OK, that means my logic works to some extent; before I posted the code, I ran it on my machine and it was working fine. I don't know why I am getting downvotes. Regarding your question about how to put a hash into an HTML table, you can use the CPAN module Template Toolkit; it can help you. Read about it, and meanwhile send the latest code so that I can help you with it.
Nikhil Jain
@nikhil: I have posted it as a new question. Please check it.
Sandhya
@Sandhya: Can you tell me why you are using a hash of hashes instead of a hash of arrays? And a second question: are you talking about an HTML table?
Nikhil Jain