views:

240

answers:

4

I'm trying to write a Perl script that will parse the output of the stcmd.exe (the StarTeam command line client) hist command. I'm getting the history for every file in a view and the output looks something like this:

Folder: The View Name  (working dir: C:\Projects\dir)
History for: main.h
Description: Some files
Locked by:
Status: Current
----------------------------
Revision: 1 View: The View Name Branch Revision: 1.0
Author: John Smith Date: 3/22/08 11:16:16 AM CST
Main header
=============================================================================

History for: main.c
Description: Some files
Locked by:
Status: Current
----------------------------
Revision: 2 View: The View Name Branch Revision: 1.1
Author: Jane Doe Date: 3/22/08 1:55:55 PM CST
Made an update.

----------------------------
Revision: 1 View: The View Name Branch Revision: 1.0
Author: John Smith Date: 3/22/08 11:16:16 AM CST
Initial revision
=============================================================================

Note that the revision summary can contain newlines and can be blank (in which case there's no line for it at all).

I want to get the filename and, for each revision, the author name (first and last), date, and change summary. I'd like to place this information in a data structure where I can sort revisions by date and combine revisions if the date, author, and summary match up. (I think I can figure this part out if someone can help me with the parsing.) I'm not great with regular expressions or Perl, but here's what I'm trying to work with right now:

# $hist contains the stcmd output in the format above
while($hist =~ /History for: (?<filename>.)/s)
{
    # Record filename somewhere with $+{filename}

    while($hist =~ /^Revision: (?<file_rev>\S+) View: (?<view_name>.+) Branch Revision: (?<branch_rev>\S+).\nAuthor: (?<author>.*) Date: (?<date>.*) \w+\r\n(?<summary>.*)/)
    {
        # Extract things with $+{author}, $+{date}, $+{summary}
    }
}

This doesn't work, however. For all I know I may be approaching it completely wrong. Can someone point me in the right direction?

+1  A: 

Here is one way to start. I prefer to split up your string into lines (\n) and loop through those:

use strict;
use warnings;

my $hist = <<'EOF';
Folder: The View Name  (working dir: C:\Projects\dir)
History for: main.h
Description: Some files
Locked by:
Status: Current
----------------------------
Revision: 1 View: The View Name Branch Revision: 1.0
Author: John Smith Date: 3/22/08 11:16:16 AM CST
Main header
=============================================================================

History for: main.c
Description: Some files
Locked by:
Status: Current
----------------------------
Revision: 2 View: The View Name Branch Revision: 1.1
Author: Jane Doe Date: 3/22/08 1:55:55 PM CST
Made an update.

----------------------------
Revision: 1 View: The View Name Branch Revision: 1.0
Author: John Smith Date: 3/22/08 11:16:16 AM CST
Initial revision
=============================================================================
EOF

my %data;
my $filename;
for (split /\n/, $hist) {
    if (/History for: (.*)/) {
        $filename = $1;
    }
    if (/^Revision: (.+?) View: (.+?) Branch Revision: (.*)/) {
        $data{$filename}{rev}    = $1;
        $data{$filename}{view}   = $2;
        $data{$filename}{branch} = $3;
    }
}

use Data::Dumper; print Dumper(\%data);

__END__

$VAR1 = {
          'main.h' => {
                        'view' => 'The View Name',
                        'rev' => '1',
                        'branch' => '1.0'
                      },
          'main.c' => {
                        'view' => 'The View Name',
                        'rev' => '1',
                        'branch' => '1.0'
                      }
        };
toolic
+4  A: 

The key is to parse one chunk at a time and match all the relevant stuff at once. See qr in perldoc perlop and $/ in perldoc perlvar.

Keeping in mind the fact that you also wanted to put the information in a data structure that would allow you to query and manipulate the information, here is one final revision. The code below uses the ability of SQLite to create in-memory databases. You might actually want to split the functionality into two scripts: One to parse and store the data and another one to do whatever manipulation you need. In fact, it might be possible to do all necessary manipulation in SQL.

#!/usr/bin/perl
use v5.010;
use strict; use warnings;
use DBI;

my $dbh = get_dbh();

my $header_pattern = qr{
    History[ ]for:     [ ](?<filename>[^\n]+)         \n
    Description:       [ ](?<description>[^\n]+)      \n
    Locked[ ]by:       [ ]?(?<lockedby>[^\n]*)        \n
    Status:            [ ](?<status>.[^\n]+)          \n
}x;

my $revision_pattern = qr{-+\n
    Revision:          [ ](?<revision>\d+)           [ ]
    View:              [ ](?<view>.+)                [ ]
    Branch[ ]Revision: [ ](?<branch_revision>[^\n]+) \n
    Author:            [ ](?<author>.+)              [ ]
    Date:              [ ](?<revdate>[^\n]+)         \n
    (?<summary>.*)                                   \n
}x;

local $/ = '=' x 77 . "\n";

while ( my $entry = <>) {
    if ( $entry =~ $header_pattern ) {
        my %file = %+;
        $dbh->do(sprintf(
                q{INSERT INTO files (%s) VALUES (%s)},
                join(',', keys %file), 
                join(',', ('?') x keys %file),
            ), {}, values %file );

        while ( $entry =~ /$revision_pattern/g ) {
            my %rev = %+;
            $dbh->do(sprintf(
                    q{INSERT INTO revisions (%s) VALUES (%s)},
                    join(',', filename => keys %rev),
                    join(',', ('?') x (1 + keys %rev)),
                ), {}, $file{filename}, values %rev );
        }
    }
}

my $revs = $dbh->selectall_arrayref(
    q{SELECT * FROM revisions JOIN files
    ON files.filename = revisions.filename},
    { Slice => {} }
);

use Data::Dumper;
print Dumper $revs;

sub get_dbh {
    my $dbh = DBI->connect(
        'dbi:SQLite:dbname=:memory:', undef, undef,
        { RaiseError => 1, AutoCommit => 1 }
    );

    $dbh->do(q{PRAGMA foreign_keys = ON});
    $dbh->do(q{CREATE TABLE files (
            filename    VARCHAR PRIMARY KEY,
            description VARCHAR,
            lockedby    VARCHAR,
            status      VARCHAR
    )});
    $dbh->do(q{CREATE TABLE revisions (
            filename        VARCHAR,
            revision        VARCHAR,
            view            VARCHAR,
            branch_revision VARCHAR,
            author          VARCHAR,
            revdate         VARCHAR,
            summary         VARCHAR,
            CONSTRAINT pk_revisions PRIMARY KEY (filename, revision),
            CONSTRAINT fk_revisions_files FOREIGN KEY (filename)
            REFERENCES files(filename)
    )});

    return $dbh;
}

Output:

C:\Temp> y.pl test.txt
$VAR1 = [
          {
            'status' => 'Current',
            'revdate' => '3/22/08 11:16:16 AM CST',
            'author' => 'John Smith',
            'description' => 'Some files',
            'revision' => '1',
            'filename' => 'main.h',
            'summary' => 'Main header',
            'view' => 'The View Name',
            'branch_revision' => '1.0',
            'lockedby' => ''
          },
          {
            'status' => 'Current',
            'revdate' => '3/22/08 1:55:55 PM CST',
            'author' => 'Jane Doe',
            'description' => 'Some files',
            'revision' => '2',
            'filename' => 'main.c',
            'summary' => 'Made an update.',
            'view' => 'The View Name',
            'branch_revision' => '1.1',
            'lockedby' => ''
          },
          {
            'status' => 'Current',
            'revdate' => '3/22/08 11:16:16 AM CST',
            'author' => 'John Smith',
            'description' => 'Some files',
            'revision' => '1',
            'filename' => 'main.c',
            'summary' => 'Initial revision',
            'view' => 'The View Name',
            'branch_revision' => '1.0',
            'lockedby' => ''
          }
        ];
Sinan Ünür
I like this answer -- very slick. But a file needs to handle 1 or more revisions (see main.c in the OP's sample data). Perhaps there is a way to retain your general approach, but use one regex to grab the file attributes (description, locked by, status) and then use another regex in a scalar `m//g` context to build up the list of revisions for that file... Just a thought.
FM
@FM fixed code to deal with multiple revisions.
Sinan Ünür
Very clean solution indeed, üstad Sinan – I was rolling out mine, when yours was submitted. Shouldn't the script begin with a: use v5.010, as the named captures are not available in older versions of Perl.
i-blis
Thanks, this looks like a great approach. However, I'm having some trouble getting it to work. If I run perl y.pl sample.txt, where sample.txt is the sample data I gave, only the first file (main.h) is output and the revisions aren't output. I'm using perl 5.10.1 for Cygwin.
Evan Shaw
@Chickencha Cygwin `perl` expects line endings to be just `LF` whereas `stcmd` presumably generates `CR+LF`. Either use `dos2unix` on the input file before processing or replace line endings with `\r?\n` in the patterns.
Sinan Ünür
And, of course, I could not resist the temptation to edit one more time. There ya'go! CW all the way.
Sinan Ünür
Yep, that did it. I didn't think Perl would care, but I guess I was wrong. Thanks a lot!
Evan Shaw
That's a beautiful regex if I've ever seen one. I normally do something like `text [ ] name` when doing `/x` regexes and I need spaces, but I think I'm changing to that method from now on. :)
Robert P
Wow -- beautifully done! Sneaky (er, clever ;) to use the line of ='s as a record separator.
Glenn
A: 

You need a state-based parser. With the __DATA__ section as before:

use v5.010;
use constant 
    { READING_FOR_FILENAME => 0
    , READING_FOR_AUTHOR   => 1
    , READING_FOR_DIVIDER  => 2
    };

use strict;
use warnings;
use English qw<%LAST_PAREN_MATCH>;
use Data::Dumper;

my $state = READING_FOR_FILENAME;
my %history_for;
my $file_name;
while ( <DATA> ) { 
    my $line = $_;
    given ( $state ) { 
        when ( READING_FOR_FILENAME ) { 
            if ( $line =~ m/^History for: (?<file_name>\S+)/ ) { 
                $file_name = $LAST_PAREN_MATCH{file_name};
                $state     = READING_FOR_DIVIDER;
            }
        }
        when ( READING_FOR_DIVIDER ) { 
            if ( $line =~ m/^-+\s*$/ ) { 
                $state = READING_FOR_AUTHOR;
            }
            elsif ( $line =~ m/^=+\s*$/ ) { 
                $state = READING_FOR_FILENAME;
            }
        }
        when ( READING_FOR_AUTHOR ) { 
            if ( $line =~ m/^Author: (?<author>[^:]+?) Date: (?<time>.*)/ ) { 
                push @{ $history_for{$file_name} }
                   , { name => $LAST_PAREN_MATCH{author}
                     , time => $LAST_PAREN_MATCH{time}
                     };
                $state = READING_FOR_DIVIDER;
            }
        }
    }
}
print Dumper( \%history_for );
Axeman
+1  A: 

You have some good answers already. Here's a different way to divide up the job:

use strict;
use warnings;
use Data::Dumper qw(Dumper);

# Read file a section at a time.
$/ = '=' x 77 . "\n";

my @data;
while (my $section = <>){
    # Split each section into sub-sections, the
    # first containing the file info and the rest
    # containing info about each revision.
    my @revs = split /-{20,}\n/, $section;

    # Do whatever you want with @file_info and, below, @ref_info.
    # The example here splits them apart into lines.
    # Alternatively, you could run the sub-sections through
    # regex parsing, as in Sinan's answer.
    my @file_info = parse_lines(shift @revs);
    push @data, { file_info => \@file_info };

    for my $r (@revs){
        my @rev_info = parse_lines($r);
        push @{$data[-1]{revs}}, \@rev_info;
    }
}

sub parse_lines {
    # Parse each sub-section into lines.
    my @lines = split /\n/, shift;
    # Optionally, filtering out unwanted material.
    @lines = grep { /\S/ and $_ !~ /={70,}/ } @lines;
    # And perhaps splitting lines into their key-value components.
    @lines = map { [split /:\s*/, $_, 2] } @lines;
    return @lines;
}

print Dumper(\@data);
FM
You get stuff like `['Revision', '1 View: The View Name Branch Revision: 1.0']`. There are actually three fields in that line.
Sinan Ünür
Yeah, that's left as an exercise for the reader. To clarify, the idea behind my example is not to parse everything the OP needs, but simply to emphasize the strategy of leveraging sub-sections in a document. Sometimes this sort of approach is useful if the parsing job isn't amenable to the ass-kicking regex strategy. :)
FM