I have strings like this:

NAME1              NAME2          DEPTNAME           POSITION
JONH MILLER        ROBERT JIM     CS                 ASST GENERAL MANAGER 

I want the output to be NAME1, NAME2 and POSITION. How can I do it using split/regex/trim/etc. without using CPAN modules?

+6  A: 

It depends on whether those are fixed-length fields or tab-separated ones. The easiest case (using split) is tab-separated:

my ($name1, $name2, $deptName, $position) = split("\t", $string);

If they're fixed-length and, say, all 10 characters wide, you can parse them like this:

my ($name1, $name2, $deptName, $position) = unpack("A10 A10 A10 A10", $string);
Paul Tomblin
They are not of fixed length.
Sunny
@Sunny, then how are you going to determine where one field ends and the next begins, seeing as how some of the fields have spaces in them? Either you need to delimit them with a specific character like tab, or you need to put them in specific places. In the first case, you use split, in the second you use unpack.
Paul Tomblin
Thanks, Paul. When I try to vote, it says "Vote Up requires 15 reputation".
Sunny
@Sunny, well how about accepting an answer to your first question?
Paul Tomblin
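A runnable sketch of both cases Paul describes. The 19/15/19 column widths are an assumption measured from the sample in the question (not the 10 used above), and the tab-separated line is a made-up version of the same record:

```perl
use strict;
use warnings;

# Tab-separated input: split on each tab character.
sub parse_tabbed {
    my ($line) = @_;
    my ($name1, $name2, $deptName, $position) = split /\t/, $line;
    return ($name1, $name2, $position);
}

# Fixed-width input: widths 19, 15 and 19 are assumed from the sample;
# "A" strips trailing spaces and "A*" takes the rest of the line.
sub parse_fixed {
    my ($line) = @_;
    my ($name1, $name2, $deptName, $position) = unpack 'A19 A15 A19 A*', $line;
    return ($name1, $name2, $position);
}

my @record = ('JONH MILLER', 'ROBERT JIM', 'CS', 'ASST GENERAL MANAGER');
my $tabbed = join "\t", @record;
my $fixed  = sprintf '%-19s%-15s%-19s%s', @record;

print join(' | ', parse_tabbed($tabbed)), "\n";
print join(' | ', parse_fixed($fixed)),   "\n";
```

Both lines print JONH MILLER | ROBERT JIM | ASST GENERAL MANAGER. Which one applies depends entirely on how the file was produced.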
A: 

To split on runs of whitespace:

my @string_parts = split /\s{2,}/, $string;

This will split $string into a list of substrings. The separator is the regex \s{2,}, which means two or more whitespace characters. Whitespace here includes spaces, tabs, and newlines.

Edit: I see that one of the requirements is not to split on only one space, but to split on two or more. I modified the regex accordingly.

Nathan Fellman
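A minimal sketch that applies this split to the question's sample rows and keeps only the three wanted fields, assuming a single space never separates two fields. The helper name is made up for illustration:

```perl
use strict;
use warnings;

# Keep NAME1, NAME2 and POSITION (fields 0, 1 and 3) from one row.
# Assumes fields are separated by two or more whitespace characters
# and that a single space only ever appears inside a field.
sub name1_name2_position {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim so there is no empty first field or trailing space
    my @fields = split /\s{2,}/, $line;
    return @fields[0, 1, 3];
}

my @rows = (
    'NAME1              NAME2          DEPTNAME           POSITION',
    'JONH MILLER        ROBERT JIM     CS                 ASST GENERAL MANAGER ',
);
print join(' | ', name1_name2_position($_)), "\n" for @rows;
```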
This solution will split a string like "JONH MILLER" into "JONH" and "MILLER", but that is a single name and should stay "JONH MILLER", so the solution is not correct.
Nikhil Jain
@Nikhil: Good point. But you could do something like `@string_parts = split /\s\s+|\t\s*/, $string` to split on multiple spaces, or one tab and possibly other space characters.
Platinum Azure
@Platinum: That's true; I am doing exactly the same thing in my answer.
Nikhil Jain
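A small sketch of Platinum Azure's separator in action. The input line, which mixes a single tab and a run of spaces, is hypothetical, as is the helper name:

```perl
use strict;
use warnings;

# Separator: two or more whitespace characters, or a single tab
# optionally followed by more whitespace. A lone space stays in the field.
sub split_fields {
    my ($line) = @_;
    return split /\s\s+|\t\s*/, $line;
}

# Hypothetical line that mixes a single tab and a run of spaces.
my $line = "JONH MILLER\tROBERT JIM   CS\tASST GENERAL MANAGER";
print join(' | ', split_fields($line)), "\n";
```

This prints JONH MILLER | ROBERT JIM | CS | ASST GENERAL MANAGER; the \t\s* alternative is what catches the single tab that \s\s+ alone would miss.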
+2  A: 

If your input data comes in as an array of strings (@strings), this

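for my $s (@strings) {
   # A19 = NAME1, A15 = NAME2, x19 = skip DEPTNAME, A* = rest (POSITION);
   # "A" also strips trailing spaces from each field
   my $output = join ' ',
                map /^\s*(.+)\s*$/ ? $1 : (),   # strip leading spaces, drop empty fields
                unpack('A19 A15 x19 A*', $s);
   print "$output\n";
}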

would extract and trim the information needed.

NAME1 | NAME2 | POSITION

and

JONH MILLER | ROBERT JIM | ASST GENERAL MANAGER

(I added the '|' separators only to make the result easier to read.)

Regards

rbo

rubber boots
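If you'd rather not hard-code 19/15/19, one option (my own sketch, not part of the answer above) is to derive the widths from the header line. This assumes the header is present and every column name is a single token that starts exactly where its column starts:

```perl
use strict;
use warnings;

# Build an unpack template from the header line: each non-space token
# marks the start of a column, and widths are the gaps between starts.
sub template_from_header {
    my ($header) = @_;
    my @starts;
    while ($header =~ /\S+/g) {
        push @starts, $-[0];    # $-[0] is the offset where this match began
    }
    my @widths = map { $starts[$_ + 1] - $starts[$_] } 0 .. $#starts - 1;
    return join ' ', (map { "A$_" } @widths), 'A*';
}

my $header = sprintf '%-19s%-15s%-19s%s', 'NAME1', 'NAME2', 'DEPTNAME', 'POSITION';
my $row    = sprintf '%-19s%-15s%-19s%s', 'JONH MILLER', 'ROBERT JIM', 'CS', 'ASST GENERAL MANAGER';

my $template = template_from_header($header);    # "A19 A15 A19 A*"
my ($name1, $name2, undef, $position) = unpack $template, $row;
print "$name1 | $name2 | $position\n";
```

This breaks if a column name contains a space, in which case you are back to measuring the widths by hand.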
Unpack is a great tool for this, and we cover almost this same example in _Effective Perl Programming_. I'd like to have an entire pack chapter in the next book :)
brian d foy
@brian, "The Book" looks promising. I'd love to have a chapter on advanced regular expressions (something like a contemporary version of japhy's Regex Arcana: http://japhy.perlmonk.org/articles/tpj/2004-summer.html). Furthermore, the first edition of the old "Advanced Perl Programming" (by Srinivasan) had some very interesting advanced topics (Perl guts, embedding, hands-on XS, and eval) which were left out of the second edition (by Simon Cozens). Such (more technical) advanced topics aren't covered in any current book I know of. (BTW: I ordered the 2nd edition of E.P.P. yesterday.)
rubber boots
For Perl guts, get _Extending and Embedding Perl_. Some of the interesting parts of _Advanced Perl Programming, 1st Edition_ were the basis for _Mastering Perl_. For fancy regex stuff, _Mastering Regular Expressions_. _Mastering Perl_ has some fancy regex stuff too, as does _Effective Perl Programming_. Maybe you just need to read more books. Remember, though, that all this stuff is also in the docs, so you don't need to buy a book.
brian d foy
A: 

Consider using autosplit in a Perl one-liner from your command line:

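$ perl -F'/\s{2,}/' -lane 'print "@F[0,1,3]"' file

The one-liner splits each line on two or more consecutive whitespace characters and prints the first, second and fourth fields, corresponding to the NAME1, NAME2 and POSITION fields. The -F pattern is quoted so that the shell neither brace-expands {2,} nor strips the backslash, and -l removes each line's trailing newline before the split.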

Of course, this will break if you have only a single space separating NAME1 and NAME2 entries, but more information is needed about your file in order to ascertain what the best course of action might be.

Zaid
Any reason for the downvote?
Zaid
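For readers less familiar with the -n, -a and -F switches, here is roughly the same logic spelled out as a script. It is a sketch that reads from an in-memory copy of the sample instead of a real file, and the helper name is made up:

```perl
use strict;
use warnings;

# Script form of the one-liner: -n loops over input lines, -a with -F
# splits each line into fields, and the print keeps fields 0, 1 and 3.
# chomp stands in for the input side of perl's -l switch.
sub select_fields {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\s{2,}/, $line;
        push @out, "@fields[0,1,3]";    # slice interpolates with single spaces
    }
    return @out;
}

# In-memory copy of the sample; a real run would open the actual file.
my $sample = "NAME1              NAME2          DEPTNAME           POSITION\n"
           . "JONH MILLER        ROBERT JIM     CS                 ASST GENERAL MANAGER\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for select_fields($fh);
close $fh;
```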
+1  A: 

Assuming the spacing between fields is not fixed, split the string on two or more whitespace characters so that a name like JONH MILLER is not broken into two parts.

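#!/usr/bin/perl
use strict;
use warnings;
my $string = "NAME1              NAME2          DEPTNAME           POSITION
             JONH MILLER        ROBERT JIM     CS                 ASST GENERAL MANAGER ";
# Split on two or more whitespace characters (this also covers the newline
# and indentation between the two lines), so "JONH MILLER" stays intact.
my @string_parts = split /\s\s+/, $string;
foreach my $test (@string_parts) {
    print "$test\n";
}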
Nikhil Jain
+1  A: 

From the sample, a single space belongs in the data, but two or more contiguous spaces do not, so you can easily split on two or more spaces. The only thing I add is the use of List::MoreUtils::mesh:

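use List::MoreUtils qw<mesh>;
# $file is assumed to be an already-opened filehandle; its first line holds the column names
my @names   = map { chomp; $_ } split /\s{2,}/, <$file>;
# "+{" forces an anonymous hash; a bare "{" here would be parsed as a block
my @records = map { chomp; +{ mesh( @names, @{[ split /\s{2,}/ ]} ) } } <$file>;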
Axeman
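Since the question rules out CPAN modules, and List::MoreUtils is a CPAN module rather than core Perl, here is a core-only sketch of the same idea: a hash slice pairs the header names with each row's values, so mesh isn't needed. It assumes the first line is the header and reads from an in-memory copy of the sample for illustration; the helper name is hypothetical:

```perl
use strict;
use warnings;

# Core-Perl alternative to mesh: a hash slice assigns the split
# values to the header names in order.
sub read_records {
    my ($fh) = @_;
    chomp(my $header = <$fh>);
    my @names = split /\s{2,}/, $header;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        my %record;
        @record{@names} = split /\s{2,}/, $line;
        push @records, \%record;
    }
    return @records;
}

my $sample = "NAME1              NAME2          DEPTNAME           POSITION\n"
           . "JONH MILLER        ROBERT JIM     CS                 ASST GENERAL MANAGER\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
for my $rec (read_records($fh)) {
    print join(' | ', @{$rec}{qw(NAME1 NAME2 POSITION)}), "\n";
}
close $fh;
```

Keying each record by column name means the caller picks NAME1, NAME2 and POSITION by name rather than by field position.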