Disclaimer: I'm new to scripting in Perl, and this is partly a learning exercise (though still a project for work). I have a much stronger grasp of shell scripting, so my examples will likely be written in that mindset, but I would like to implement them in Perl. Sorry in advance for my verbosity; I want to make sure I get my point across clearly.

I have a text file (a reference guide) that started as a Word document, was converted to text, and was then switched from Windows to UNIX line endings in Notepad++. The file is uniform: each section has the same fields, formatting, and tables.

My basic plan is to grab each section, keyed by its unique batch job name, and load all of the values into a database (or maybe just an Excel file), so that the fields for each job can be searched and edited much more easily than in the Word file. I might build a web interface on top of it later.

So what I want to do is grab each section by doing something like:
sed -n '/job_name_1_regex/,/job_name_2_regex/p' file.txt --how would this be written within a Perl script?
(grab the whole section, then break it down further from there)

To read the file in the script I have open FORMAT_FILE, 'test_format.txt'; and then use foreach $line (<FORMAT_FILE>) to parse the file line by line. --is there a better way?

My next problem comes from the conversion. The Word document has tables that look like:

 Table Heading 1      Table Heading 2
Heading 1/Value 1    Heading 2/Value 1
Heading 1/Value 2    Heading 2/Value 2

but in the text file the same table looks like:

Table Heading 1 
Table Heading 2
Heading 1/Value 1
Heading 1/Value 2
Heading 2/Value 1
Heading 2/Value 2

So I want to use "Heading 1" and "Heading 2" as column names and put the respective values under each. I'm just not sure how to relate the values to their heading in the text file. The values for Heading 1 always start at the line number of Heading 1 plus 2 (Heading 1, Heading 2, then the values for Heading 1). I know this can be done pretty easily in awk/sed; I'm just not sure how to approach it inside a Perl script.

---EDIT---
For this I was thinking of doing an array something like:
my @heading1 = ($value1, $value2, etc.)
my @heading2 = ($value1, $value2, etc.)

I just need to be able to associate the correct values with the correct headings, so that the values for heading1 start on the line after heading2. In shell, that would be something like:
x=$(cat file.txt | grep -n "Heading 1" | cut -d":" -f1) --gets the line that "Heading 1" is on in the file
(( x = x+2 )) --adds 2 to the line (where the values will start)
sed -n "$x,$last_line_of_values p" file.txt --prints values from file.txt from the line where they start to the last one (I'll figure that out at some point before this)

This is super-hacked together for the moment, to try to elaborate what I want to do...let me know if it clears it up a little...
---/EDIT---

Once I have all the right values, linking them up to a database may be an issue as well; I haven't started looking at how Perl interacts with databases yet.

Sorry if this is a bit scatterbrained...it's still not fully formed in my head.

+2  A: 

Several things in this post... First, the basic "best practices" :

  1. Use modern Perl. Start your scripts with

    use strict; use warnings;

  2. Don't use global filehandles; use lexical filehandles (declare them in a variable).

  3. Always check the return value of "open".

    open my $file, '<', "/some/file" or die "can't open file: $!";
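Putting those three points together, here is a minimal sketch of reading a file line by line. The helper name `read_lines` is made up for illustration, and returning all lines as a list is just one possible design; for a very large file you would process each line inside the loop instead.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $path with newlines removed.
sub read_lines {
    my ($path) = @_;

    # Lexical filehandle, three-argument open, checked return value.
    open my $fh, '<', $path or die "can't open $path: $!";

    my @lines;
    while ( my $line = <$fh> ) {    # reads one line per iteration
        chomp $line;
        push @lines, $line;
    }

    close $fh or die "can't close $path: $!";
    return @lines;
}
```

Calling `my @lines = read_lines('test_format.txt');` would then give you the file contents, using the file name from the question.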

Then, about pattern matching: I don't fully understand your example, but I suppose you want something like:

while ( my $line = <$file> ) {    # reads one line at a time instead of loading the whole file
    if ( $line =~ /regexp1/ ) {
        # do something...
    }
}
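If the goal is the Perl counterpart of `sed -n '/start/,/end/p'`, one possible shape is below. This is only a sketch: the sub name `extract_section` and the job names in the demo data are hypothetical, and it uses an explicit flag rather than Perl's `..` range operator so that no state carries over between calls. Unlike sed, it stops after the first section it finds.

```perl
use strict;
use warnings;

# Return the lines from the first line matching $start_re through the
# next later line matching $end_re, inclusive.
sub extract_section {
    my ($lines, $start_re, $end_re) = @_;
    my $in_section = 0;
    my @section;

    for my $line (@$lines) {
        if ( !$in_section ) {
            next unless $line =~ $start_re;
            $in_section = 1;          # found the start line
            push @section, $line;
            next;
        }
        push @section, $line;
        last if $line =~ $end_re;     # stop after the end line
    }
    return @section;
}

# Made-up data standing in for the converted reference guide.
my @demo = ("preamble\n", "JOB_ALPHA\n", "Owner: ops\n", "JOB_BETA\n", "Owner: dev\n");
print extract_section(\@demo, qr/^JOB_ALPHA/, qr/^JOB_BETA/);
```

Each section could then be passed on to whatever code breaks it into fields.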

Edit: about the table, I suppose the best thing is to build two arrays, one for each column. If I understand correctly, when reading the file you need to split each line and put the first part in the @col1 array and the second part in the @col2 array. The clear and easy way is to use two temporary variables:

my ( $val1, $val2 ) = split /\s+/, $line;
push @col1, $val1;
push @col2, $val2;
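If the text file really has the flattened layout from the question (all headings first, then every value for heading 1, then every value for heading 2), the values are never on the same line, so splitting a line will not pair them up. A rough sketch for that layout follows. It assumes you already know how many columns the table has and that every column holds the same number of values; the sub name `rebuild_columns` is invented for this example.

```perl
use strict;
use warnings;

# Rebuild columns from lines laid out as: heading 1 .. heading N,
# then all values of column 1, then all values of column 2, and so on.
# Returns a hash reference: heading => array reference of values.
sub rebuild_columns {
    my ($num_cols, @lines) = @_;
    s/\s+\z// for @lines;    # strip newlines and trailing spaces

    my @headings = splice @lines, 0, $num_cols;
    die "values do not divide evenly into $num_cols columns\n"
        if scalar(@lines) % $num_cols;

    my $rows = scalar(@lines) / $num_cols;
    my %table;
    for my $heading (@headings) {
        $table{$heading} = [ splice @lines, 0, $rows ];
    }
    return \%table;
}

my $table = rebuild_columns(
    2,
    "Table Heading 1 \n", "Table Heading 2\n",
    "Heading 1/Value 1\n", "Heading 1/Value 2\n",
    "Heading 2/Value 1\n", "Heading 2/Value 2\n",
);
for my $heading ( sort keys %$table ) {
    print "$heading: ", join( ', ', @{ $table->{$heading} } ), "\n";
}
```

Row N of the original table is then element N of each array, which is the shape you would want for one database insert per row.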
wazoox
Thanks waz, I updated the piece about the tables trying to better explain it.
Sean
+3  A: 

http://perlmeme.org/tutorials/connect_to_db.html

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $driver = "mysql";   # Database driver type
my $database = "test";  # Database name
my $user = "";          # Database user name
my $password = "";      # Database user password

my $dbh = DBI->connect(
    "DBI:$driver:$database",
    $user, $password,
    {
        RaiseError => 1,
        PrintError => 1,
    }
) or die $DBI::errstr;

my $sth = $dbh->prepare("
        INSERT INTO test 
                    (col1, col2)
             VALUES (?, ?)
    ") or die $dbh->errstr;

my $intable = 0;
open my $file, '<', "file.txt" or die "can't open file: $!";
while (<$file>)  {
  if (/job_name_1_regex/../job_name_2_regex/) { # job 1 section
    $intable = 1 if /Table Heading 1/; # table start
    if ($intable) {
      my $next_line = <$file>; # heading 2 line
      chomp; chomp $next_line;
      $sth->execute($_, $next_line) or die $dbh->errstr;
    }
  }
}
close $file or die "can't close file $!";
$dbh->disconnect;
J.F. Sebastian
Awesome, that makes the DB connection process much clearer. Can you explain exactly what the line 'chomp; chomp $next_line;' does? I'm just trying to get a good handle on everything and why certain things are done.
Sean
@Sean: `chomp` removes the trailing `$/` (a newline by default) from the string; if no argument is given, it works on the `$_` variable.
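A tiny illustration of both forms, assuming `$/` still has its default value of a newline (the helper name `strip_newline` is made up for the example):

```perl
use strict;
use warnings;

# Return a copy of $text with one trailing newline removed, if present.
sub strip_newline {
    my ($text) = @_;
    chomp $text;    # explicit argument: works on $text
    return $text;
}

$_ = "Table Heading 1\n";
chomp;              # no argument: works on $_
print "[$_]\n";
print "[", strip_newline("Table Heading 2\n"), "]\n";
```

So `chomp; chomp $next_line;` in the loop strips the newline from the current line in `$_` and from the line that was just read into `$next_line`.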
J.F. Sebastian