ansaurus

Question

Pull specific information from a long list with Perl

Answer 1

+3 A:

Something like this should do the trick -- it reads from stdin and writes to stdout, so you can use ordinary Unix pipes and redirection to work with files:

#!/usr/bin/perl

use strict;
use warnings;
use String::Util 'trim';

# set the input record separator to "\n\n", so each read returns one blank-line-separated record:
local $/ = "\n\n";

while (my $line = <>)
{
    chomp $line;

    my ($displayName) = ($line =~ /^displayName: (.+)$/m);
    my ($name) = ($line =~ /^name: ##(.+)$/m);
    # trim() returns the trimmed copy rather than modifying its argument
    $displayName = trim($displayName);
    $name = trim($name);

    my ($firstName, $lastName) = ($displayName =~ /^([^ ]+) (.+)$/);

    print "$firstName\t$lastName\t$name\n";
}

I tested this using the sample input you gave below, as test.pl < input.txt and got the output:

John    Doe     userName
Jane    Doe     userName
Ted     Doe     userName

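You can read about reading input by paragraphs in perldoc perlvar under $/, or at this SO question (link needed). Note that perlvar reserves the term "paragraph mode" for $/ = "", which treats any run of blank lines as one separator, whereas "\n\n" splits on exactly two newlines. The m flag on the match operator lets ^ and $ match at the start and end of each line inside the record, not only at the ends of the whole string -- see perldoc perlre.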

Ether 2010-05-13 22:01:22
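The two ideas in the answer above (reading one record per blank-line-separated block, and using the m flag to anchor on lines inside a record) can be tried out in isolation. The following is only a sketch: the sample data and the read_records helper are made up for illustration, and the field layout is assumed from the patterns in the answer. It also compares $/ = "\n\n" with $/ = "" when the input contains extra blank lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: two records separated by three blank lines.
my $data = "displayName: John Doe\nname: ##jdoe\n\n\n\n"
         . "displayName: Jane Doe\nname: ##jane\n";

# Split $text into records using the given input record separator.
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @records;
    while (my $record = <$fh>) {
        chomp $record;        # removes the separator from the end of the record
        push @records, $record;
    }
    close $fh;
    return @records;
}

my @paragraph = read_records($data, "");      # any run of blank lines separates
my @literal   = read_records($data, "\n\n");  # exactly two newlines separate

printf "paragraph mode: %d records\n", scalar @paragraph;
printf "literal \\n\\n:   %d records\n", scalar @literal;

# With /m, ^ and $ match at line boundaries inside the multi-line record.
my ($name) = $paragraph[0] =~ /^name: ##(.+)$/m;
print "first user name: $name\n";
```

With the extra blank lines in this sample, the "\n\n" separator yields an empty record in the middle, which is one way to end up with "uninitialized value" warnings further down a script like the one above.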

Footnote: I have been unable to find the SO references to paragraph mode, although I know that `$/` has been discussed several times in the past. If anyone finds this link, please add a comment or edit it into the question - thanks!

Ether 2010-05-13 22:08:55

Here's one question dealing with paragraph-parsing: http://stackoverflow.com/questions/1809469/how-do-i-read-paragraphs-at-a-time-with-perl

FM 2010-05-13 23:51:14

@FM: that one discusses setting `local $/ = undef;` -- but I'm sure there was a question a while back that discussed setting it to `"\n\n"` in order to read paragraphs at a time...

Ether 2010-05-14 00:09:55

Ahhhh, paragraph parsing, that's what I should have been looking for. Thanks for the answers!

Melignus 2010-05-14 00:17:41

Use of uninitialized value $firstName in concatenation
Use of uninitialized value $lastName in concatenation
I just keep getting these errors when I try this script.

Melignus 2010-05-14 00:52:33

@melignus: oops, I introduced an error when I added the splitting into first and last names! fixed now, so sorry.

Ether 2010-05-14 00:55:07

@Ether: great, that works, but I noticed that some of the userNames are numbers, and when the script comes to those it doesn't pull the value; it just ends up blank. The leading ##'s still need to be cut off. Hehe, thanks again for helping out! Oh jeesh, and then some of the accounts have a middle name, which I think is breaking something. I don't need the middle name, just the first and last names.

Melignus 2010-05-14 01:09:41

@melignus: the pattern to extract the userName is `/^name: ##(.+)$/`, which you can adjust to your needs -- the documentation for regular expressions is in http://perldoc.perl.org/perlre.html. Similarly for splitting the displayName into first and last names, you may need to adjust it to account for all the possible values in your data. It's pretty hard to disambiguate first, middle and last names without a lot of context -- e.g. how would you parse "John Adams Jr.", "Mary Sue Findlay", "Howard Jones III" and "Mike van der Velden"? You should probably just leave the name alone, if possible.

Ether 2010-05-14 02:19:12

@Ether: Thanks a ton for the help Ether, I think I can figure it out from here. The middle name isn't so hard, as it always comes last, as in First Last Middle, so all I have to do is omit everything after the second space. I should be able to figure that out.

Melignus 2010-05-14 03:05:25
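For the two follow-up problems in the comments above (numeric user names coming out blank, and a trailing middle name), one possible adjustment is sketched below. It rests on two assumptions that the thread does not confirm: that the numeric user names appear without the ## prefix, and that the display name is always ordered First Last Middle, as Melignus describes. The helper names first_two_tokens and extract_user_name are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first two whitespace-separated tokens of a display name,
# dropping anything after them (assumed order: First Last Middle).
sub first_two_tokens {
    my ($display_name) = @_;
    return unless defined $display_name;
    # split ' ' skips leading whitespace and splits on runs of whitespace
    my ($first, $second) = split ' ', $display_name;
    return ($first, $second);
}

# Extract the user name from a record, treating the "##" prefix as
# optional (an assumption about why numeric names came out blank).
sub extract_user_name {
    my ($record) = @_;
    my ($name) = $record =~ /^name:[ \t]*(?:##)?(\S+)[ \t]*$/m;
    return $name;
}

# Hypothetical record, for illustration only.
my $record = "displayName: John Doe Quincy\nname: 12345";
my ($display_name) = $record =~ /^displayName:[ \t]*(.+)$/m;
my ($first, $last) = first_two_tokens($display_name);
my $user = extract_user_name($record);
print join("\t", $first, $last, $user), "\n";
```

As Ether points out, this kind of splitting only holds up if the data really follows one fixed order; names such as "Mike van der Velden" would be cut short.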

Answer 2

A:

This is my solution.

use strict;
use warnings;
my $fh;
my $file_contents;
my @info;
open $fh, '<', "data" or die($!);
local $/ = undef;
$file_contents = <$fh>;

while($file_contents =~ /.ame: (.*?)$(.*?).ame: (.*?)$/smg)
{

   my $displayname = $1;
   my $username = $3;
   $displayname =~ s/^\s+//; #clean off any whitespace from front/back
   $displayname =~ s/\s+$//;
   my ($firstname, $lastname) = split(/\s+/, $displayname); #split on whitespace

   print "$firstname\t$lastname\t$username\n"; #note the tabs
}

Paul Nathan 2010-05-13 22:26:50

No such file or directory at line 8, not quite sure what's going on here.

Melignus 2010-05-14 00:57:10

@melingnis: it reads a file called data -

Paul Nathan 2010-05-14 13:51:09
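The "No such file or directory" error above comes from the hard-coded file name data. A minimal sketch of one way around it, assuming you would rather pass the input file on the command line, is shown here. The slurp_file helper and the script.pl name in the usage comment are hypothetical; the point is only that the file name goes into the die message so the failure is easier to diagnose.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a whole file into one string; die with the file name on failure.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!\n";
    local $/;    # undef: a single read returns everything up to end of file
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

# Hypothetical usage: perl script.pl input.txt
if (@ARGV) {
    my $contents = slurp_file($ARGV[0]);
    printf "Read %d characters from %s\n", length($contents), $ARGV[0];
}
```

The returned string can then be fed to the same while loop and pattern used in the answer.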
