The file I have to work with is the result of an LDAP extraction, but I ultimately need to get the information into a format that a spreadsheet can use.

So, the data is as follows:

DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData
DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData
displayName: John Doe
name: ##userName

DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData
DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData
displayName: Jane Doe Jr
name: ##userName

DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData
DataDataDataDataDataDataDataDataDataDataDataDataDataDataDataData
displayName: Ted Doe
name: ##userName

The format that I need to export to is:

firstName lastName userName
firstName lastName userName
firstName lastName userName

where the separators are tabs, so that I can then import the file into a database. I have experience doing this in VBScript, but I'm trying to switch to Perl for as much server administration as possible.

I'm not sure of the syntax for what I want, which is basically:

while not endoffile{
detect "displayName: " & $firstName & " " & $lastName
detect "name: ##" & $userName

write $firstName tab $lastName tab $userName to file
}
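For comparison, the pseudocode above translates fairly directly into a line-by-line Perl loop. This is a minimal sketch, assuming each record's `displayName:` line comes before its `name: ##` line, as in the sample. The records are embedded as a string (abridged from the question) only so the snippet runs on its own; in real use you would read the export file or stdin instead.

```perl
use strict;
use warnings;

# Sample records abridged from the question; the "DataData..." lines stand in
# for the other LDAP attributes. In real use, read the export file instead.
my $sample = <<'SAMPLE';
DataDataDataDataDataDataDataData
displayName: John Doe
name: ##userName

DataDataDataDataDataDataDataData
displayName: Jane Doe Jr
name: ##userName

DataDataDataDataDataDataDataData
displayName: Ted Doe
name: ##userName
SAMPLE

# Read the string through an in-memory filehandle, as if it were a file
open my $in, '<', \$sample or die "Cannot open sample: $!";

my ($firstName, $lastName);
my @rows;
while (my $line = <$in>) {                       # while not end of file
    if ($line =~ /^displayName:\s*(\S+)\s+(\S+)/) {
        # detect "displayName: " first last (extra words such as "Jr" are ignored)
        ($firstName, $lastName) = ($1, $2);
    }
    elsif ($line =~ /^name:\s*##(\S+)/ && defined $firstName) {
        # detect "name: ##" userName; the ## is matched but not captured
        push @rows, join("\t", $firstName, $lastName, $1);
        ($firstName, $lastName) = ();            # reset for the next record
    }
}
close $in;

print "$_\n" for @rows;                          # first<TAB>last<TAB>user
```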

Also, if someone could point me to a resource specifically on Perl's text-parsing syntax, I'd be very grateful. Most of the resources I've come across haven't been very helpful.

Also, some of the userNames are numbers. The two leading # signs still need to be trimmed, but the userName is always 6 characters long, if that helps.
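If the userName really is always six characters after the two `#` signs, the capture can say so directly, and `\S{6}` matches purely numeric names just as well as alphabetic ones. A minimal sketch; `extract_username` is a hypothetical helper, the sample values `jdoe01` and `123456` are made up, and the six-character rule is taken from the question rather than checked against real LDAP data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the six characters after "name: ##", or undef
# if the line does not have exactly six non-space characters there
sub extract_username {
    my ($line) = @_;
    return $line =~ /^name:\s*##(\S{6})\s*$/ ? $1 : undef;
}

for my $line ('name: ##jdoe01', 'name: ##123456', 'name: ##toolongname') {
    my $user = extract_username($line);
    print defined $user ? "$user\n" : "no match: $line\n";
}
```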

+3  A: 

Something like this should do the trick. It reads from stdin and writes to stdout, so you can use normal Unix redirection or pipes to work with files:

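#!/usr/bin/perl

use strict;
use warnings;

# set $/ to "" (paragraph mode), so each read returns one blank-line-separated record:
local $/ = "";

while (my $record = <>)
{
    # /m lets ^ and $ match at each line within the record;
    # \s*$ drops trailing whitespace (including a stray \r)
    my ($displayName) = ($record =~ /^displayName: (.+?)\s*$/m);
    my ($name)        = ($record =~ /^name: ##(.+?)\s*$/m);

    # skip records that are missing either field
    next unless defined $displayName && defined $name;

    # keep only the first two words as first and last name
    my ($firstName, $lastName) = split ' ', $displayName;
    $lastName = '' unless defined $lastName;

    print "$firstName\t$lastName\t$name\n";
}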

I tested this on the sample input you gave above, running test.pl < input.txt, and got this output:

John    Doe     userName
Jane    Doe     userName
Ted     Doe     userName

You can read about reading in paragraph mode in perldoc perlvar under $/, or at this SO question (link needed). The m flag on the match operator makes ^ and $ match at the start and end of every line inside the record, not just at the start and end of the whole string; see perldoc perlre.

Ether
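To illustrate the two settings of `$/` that come up in this thread, here is a small sketch that reads the same made-up text once in paragraph mode (`$/ = ""`), which collapses a run of blank lines into one separator, and once with a literal `"\n\n"`, which does not. It also shows the `/m` match working on a multi-line record. The text is read through an in-memory filehandle so the snippet is self-contained; `read_records` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Two made-up records separated by THREE blank lines, where the two
# settings of $/ behave differently
my $text = "displayName: John Doe\nname: ##userName\n"
         . "\n\n\n"
         . "displayName: Ted Doe\nname: ##userName\n";

# Read every record from $text using the given input record separator
sub read_records {
    my ($separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Cannot open string: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @para   = read_records("");       # paragraph mode: blank-line runs collapse
my @double = read_records("\n\n");   # literal "\n\n": extra blank lines leak through

printf "paragraph mode: %d records\n", scalar @para;
printf "\"\\n\\n\" mode: %d records\n", scalar @double;

# With /m, ^ and $ match at each line inside the multi-line record
my ($user) = $para[1] =~ /^name: ##(.+)$/m;
print "second record's user: $user\n";
```

With `"\n\n"`, the surplus blank lines come back as a record of their own, which is one reason paragraph mode is usually the safer choice for this kind of export.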
Footnote: I have been unable to find the SO references to paragraph mode, although I know that `$/` has been discussed several times in the past. If anyone finds this link, please add a comment or edit it into the question - thanks!
Ether
Here's one question dealing with paragraph-parsing: http://stackoverflow.com/questions/1809469/how-do-i-read-paragraphs-at-a-time-with-perl
FM
@FM: that one discusses setting `local $/ = undef;` -- but I'm sure there was a question a while back that discussed setting it to `"\n\n"` in order to read paragraphs at a time...
Ether
Ahhhh, paragraph parsing, that's what I should have been looking for. Thanks for the answers!
Melignus
I just keep getting these errors when I try this script: `Use of uninitialized value $firstName in concatenation`, `Use of uninitialized value $lastName in concatenation`.
Melignus
@melignus: oops, I introduced an error when I added the splitting into first and last names! fixed now, so sorry.
Ether
@Ether: great that works but I noticed that some of the userNames are numbers themselves and when the script comes to those it's not pulling the value, the value just ends up blank. The leading ##'s still need to be cut off. Hehe, thanks again for helping out! Oh jeesh and then some of the accounts have a middle name which I think is breaking something. I don't need the middle name just the first and last names.
Melignus
@melignus: the pattern to extract the userName is `/^name: ##(.+)$/`, which you can adjust to your needs -- the documentation for regular expressions is in http://perldoc.perl.org/perlre.html. Similarly for splitting the displayName into first and last names, you may need to adjust it to account for all the possible values in your data. It's pretty hard to disambiguate first, middle and last names without a lot of context -- e.g. how would you parse "John Adams Jr.", "Mary Sue Findlay", "Howard Jones III" and "Mike van der Velden"? You should probably just leave the name alone, if possible.
Ether
@Ether: Thanks a ton for the help, Ether. I think I can figure it out from here. The middle name isn't so hard, since it always comes last (First Last Middle), so all I have to do is drop everything after the second space. I should be able to figure that out.
Melignus
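Since the middle name always comes last (First Last Middle), keeping only the first two whitespace-separated words should be enough for this data. A minimal sketch; `first_last` is a hypothetical helper, and names like "Mike van der Velden" from the comment above would still be split wrongly. Returning an empty list for one-word names lets the caller skip them instead of printing with `undef` and triggering "uninitialized value" warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first two words of "First Last Middle";
# returns an empty list when there are fewer than two words
sub first_last {
    my ($displayName) = @_;
    my @words = split ' ', $displayName;   # ' ' splits on whitespace runs and skips leading blanks
    return @words >= 2 ? @words[0, 1] : ();
}

for my $displayName ('John Doe', 'Jane Doe Jr', '  Ted Doe  ', 'SingleWord') {
    my ($first, $last) = first_last($displayName);
    next unless defined $last;             # avoids "uninitialized value" warnings
    print "$first\t$last\n";
}
```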
A: 

This is my solution.

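use strict;
use warnings;

# read the file named on the command line, defaulting to "data"
my $path = @ARGV ? $ARGV[0] : 'data';
open my $fh, '<', $path or die "Cannot open $path: $!";
my $file_contents = do { local $/; <$fh> };   # slurp the whole file
close $fh;

# /m: ^ and $ match at line boundaries; /s: . also matches newlines,
# so the middle .*? can skip any lines between displayName and name
while ($file_contents =~ /^displayName:[ \t]*(.*?)\s*\n.*?^name:[ \t]*##(.*?)\s*$/smg)
{
   my $displayname = $1;
   my $username = $2;   # the leading ## is matched outside the capture
   my ($firstname, $lastname) = split(' ', $displayname); # split on whitespace; extra words are dropped
   $lastname = '' unless defined $lastname;

   print "$firstname\t$lastname\t$username\n"; #note the tabs
}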
Paul Nathan
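The `while ($string =~ /.../g)` idiom in this answer works because a `/g` match in scalar context remembers where it stopped (`pos`) and resumes there on the next iteration, so one slurped string can be walked record by record. A minimal sketch on made-up input; the userNames `jdoe01` and `tdoe02` are placeholders, and the anchored pattern is one possible choice, not the only one.

```perl
use strict;
use warnings;

# Made-up input with two records; the userNames are placeholders
my $contents = "displayName: John Doe\nname: ##jdoe01\n\n"
             . "displayName: Ted Doe\nname: ##tdoe02\n";

my @users;
# In scalar context, each /g match resumes at pos($contents), where the
# previous match ended, so every displayName/name pair is visited once
while ($contents =~ /^displayName:[ \t]*(.*?)\s*\n.*?^name:[ \t]*##(\S+)/smg) {
    push @users, "$1 => $2";
    print "match ended at offset ", pos($contents), "\n";
}
print "$_\n" for @users;
```

Anchoring on `^displayName:` and `^name:` rather than `.ame:` keeps other LDAP attributes whose names end in "Name" from being picked up by accident.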
No such file or directory at line 8, not quite sure what's going on here.
Melignus
@Melignus: it reads a file called data.
Paul Nathan