I wish to convert a single string with multiple delimiters into a key=>value hash structure. Is there a simple way to accomplish this? My current implementation is:

sub readConfigFile() {
 my %CONFIG;
 my $index = 0;
 open(CON_FILE, "config");
 my @lines = <CON_FILE>;
 close(CON_FILE);

 my @array = split(/>/, $lines[0]);
 my $total = @array;

 while($index < $total) {
     my @arr = split(/=/, $array[$index]); 
     chomp($arr[1]);
     $CONFIG{$arr[0]} = $arr[1];       
     $index = $index + 1; 
 }

 while ( ($k,$v) = each %CONFIG ) {
     print "$k => $v\n";
 }

 return;
}

where 'config' contains:

pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76

The trailing number on each line also needs to be removed and kept in a separate key=>value pair, whose key can be 'ip'. (I have not been able to accomplish this without making the code too lengthy and complicated.)

A: 

Here's one way.


foreach ( @lines ) {
  chomp;
  my %CONFIG;
  # Extract the last digit first and replace it with an end of
  # pair delimiter.
  s/\s*([\d\.]+)\s*$/>/;
  $CONFIG{ip} = $1;
  while ( /([^=]*)=([^>]*)>/g ) {
    $CONFIG{$1} = $2;
  }
  print Dumper ( \%CONFIG );
}
Martin Redmond
That down vote was pretty lame. The print statement has nothing to do with the solution. Down voting all the answers that are not yours?
Martin Redmond
You might need to make some changes to the code, as the last digit is coming out as a separate hash again: $VAR1 = { 'rec' => '1', 'ip' => '.76', 'adv' => '111 22 3456', 'pub' => '1', 'size' => '2' }; $VAR1 = { 'ip' => '.76' };
gagneet
No, the downvote is because the code doesn't have a chance of solving the problem. It forgets everything except for the last line. I upvoted the solutions that actually work, but yours isn't one of those.
brian d foy
The original request didn't require %CONFIG to be returned. It can be processed within the loop, it doesn't need to remember each line. For converting a string into a hash this would work just fine. :)
Martin Redmond
Answering the question people ask isn't the same thing as answering the question they have.
brian d foy
I like perl more than I like the perl community.
Martin Redmond
+3  A: 

What is your configuration data structure supposed to look like? So far the solutions only record the last line because they are stomping on the same hash keys every time they add a record.
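To make the overwriting concrete, here is a small sketch. The two sample lines are shortened versions of the config data, and the names %flat and @records are only illustrative. It contrasts one flat hash, where the second line replaces the first, with one hash per line.

```perl
use strict;
use warnings;

# Shortened sample lines (assumption: same key=value>key=value layout).
my @lines = ('pub=3>rec=0>size=3', 'pub=1>rec=1>size=2');

# One flat hash: every key from line 1 is overwritten by line 2.
my %flat;
for my $line (@lines) {
    my %pairs = map { split /=/ } split />/, $line;
    %flat = (%flat, %pairs);
}

# One hash per line: each record keeps its own values.
my @records;
for my $line (@lines) {
    my %record = map { split /=/ } split />/, $line;
    push @records, \%record;
}

print "flat pub: $flat{pub}\n";
print "per-line pub: $records[0]{pub}, $records[1]{pub}\n";
```

Only the per-line version still knows that the first line had pub=3.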

Here's something that might get you closer, but you still need to figure out what the data structure should be.

  • I pass in the file handle as an argument so my subroutine isn't tied to a particular way of getting the data. It can be from a file, a string, a socket, or even the stuff below DATA in this case.

  • Instead of fixing things up after I parse the string, I fix the string to have the "ip" element before I parse it. Once I do that, the "ip" element isn't a special case and it's just a matter of a double split. This is a very important technique to save a lot of work and code.

  • I create a hash reference inside the subroutine and return that hash reference when I'm done. I don't need a global variable. :)

use warnings;
use strict;

use Data::Dumper;

sub readConfigFile
    {
    my( $fh ) = shift;

    my $hash = {};

    while( <$fh> )
     {
     chomp;

     s/\s+(\d*\.\d+)$/>ip=$1/;

     $hash->{ $. } = { map { split /=/ } split />/ };
     }

    return $hash;
    }

my $hash = readConfigFile( \*DATA );

print Dumper( $hash );

__DATA__
pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76

This gives you a data structure where each line is a separate record. I chose the line number of the record ($.) as the top-level key, but you can use anything that you like.

$VAR1 = {
          '1' => {
                   'ip' => '6.00',
                   'rec' => '0',
                   'adv' => '1234 123 4.5',
                   'pub' => '3',
                   'size' => '3'
                 },
          '2' => {
                   'ip' => '.76',
                   'rec' => '1',
                   'adv' => '111 22 3456',
                   'pub' => '1',
                   'size' => '2'
                 }
        };

If that's not the structure you want, show us what you'd like to end up with and we can adjust our answers.

brian d foy
This is the structure I am looking for, but I am a bit confused about the use of \*DATA. Could you throw some more light on this? And if I wish to use a file handle, how do I do the same? I have not been able to do this so far. :-(
gagneet
The \*DATA is a typeglob reference. It's a little bit of magic to get a filehandle reference for a named filehandle. You don't have to worry about it because you probably won't be using __DATA__ in your real program. You can pass any filehandle to the subroutine.
brian d foy
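For the file case asked about above, a minimal sketch might look like this. It repeats the readConfigFile parsing so that it runs on its own. The temporary file created with File::Temp is only a stand-in for a real config file, and the names $path, $fh and $config are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Same parsing as in the answer, repeated so this sketch is self-contained.
sub readConfigFile {
    my ($fh) = @_;
    my $hash = {};
    while (<$fh>) {
        chomp;
        s/\s+(\d*\.\d+)$/>ip=$1/;    # turn the trailing number into an ip= pair
        $hash->{$.} = { map { split /=/ } split />/ };
    }
    return $hash;
}

# Stand-in for the real config file (assumption: yours already exists on disk).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "pub=3>rec=0>size=3>adv=1234 123 4.5 6.00\n";
print {$out} "pub=1>rec=1>size=2>adv=111 22 3456 .76\n";
close $out or die "cannot close $path: $!";

# Open a lexical filehandle and pass it in place of \*DATA.
open my $fh, '<', $path or die "cannot open $path: $!";
my $config = readConfigFile($fh);
close $fh;

print "line 1 ip: $config->{1}{ip}\n";
print "line 2 adv: $config->{2}{adv}\n";
```

With a real file, only the open line changes: replace $path with the name of your config file.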
+2  A: 
Chris Charley
I fixed your DATA. In the pre, you have to use the entities for the less than symbol, otherwise HTML thinks it's a tag. Too bad the markdown preview is broken (or the other way around).
brian d foy
Oh, and don't forget his requirement about the last number on the line being a separate key.
brian d foy
Yes, I passed over that point :-( Thanks Brian.
Chris Charley
This works, except for the last digit requirement, as pointed out by brian. Thanks all the same, though. :-)
gagneet
A: 

The config file format is sub-optimal, shall we say. That is, there are easier formats to parse and understand. [Added: but the format is already defined by another program. Perl is flexible enough to deal with that.]

Your code slurps the file when there is no real need.

Your code only pays attention to the last line of data in the file (as Chris Charley noted while I was typing this up).

You also have not allowed for comment lines or blank lines - both are a good idea in any config file and they are easy to support. [Added: again, with the pre-defined format, this is barely relevant, but when you design your own files, do remember it.]

Here's an adaptation of your function into somewhat more idiomatic Perl.

#!/bin/perl -w
use strict;
use constant debug => 0;

sub readConfigFile()
{
    my %CONFIG;
    # Three-argument open with a lexical filehandle instead of a bareword
    open(my $fh, '<', 'config') or die "failed to open file ($!)\n";

    while (my $line = <$fh>)
    {
        chomp $line;
        $line =~ s/#.*//;           # Remove comments
        next if $line =~ /^\s*$/;   # Ignore blank lines

        foreach my $field (split(/>/, $line))
        {
            # Limit of 2 keeps any '=' inside the value intact
            my @arr = split(/=/, $field, 2);
            $CONFIG{$arr[0]} = $arr[1];
            print ":: $arr[0] => $arr[1]\n" if debug;
        }
    }
    close($fh);

    while (my($k,$v) = each %CONFIG)
    {
        print "$k => $v\n";
    }
    return %CONFIG;
}

readConfigFile;    # Ignores returned hash

Now, you need to explain more clearly what the structure of the last field is, and why you have an 'ip' field without the key=value notation. Consistency makes life easier for everybody. You also need to think about how multiple lines are supposed to be handled. And I'd explore using a more orthodox notation, such as:

pub=3;rec=0;size=3;adv=(1234,123,4.5);ip=6.00

Colon or semi-colon as delimiters are fairly conventional; parentheses around comma separated items in a list are not an outrageous convention. Consistency is paramount. Emerson said "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines", but consistency in Computer Science is a great benefit to everyone.
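As a rough illustration of how little code the suggested notation needs, here is a sketch of a parser for one such line. The helper name parse_record is hypothetical, and the sketch assumes that semicolons never appear inside a value and that a parenthesised value is a plain comma-separated list.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one line of the suggested notation.
sub parse_record {
    my ($line) = @_;
    my %record;
    for my $field (split /;/, $line) {
        my ($key, $value) = split /=/, $field, 2;
        # A parenthesised value becomes an array reference of its items.
        if (defined $value && $value =~ /^\((.*)\)$/) {
            $value = [ split /,/, $1 ];
        }
        $record{$key} = $value;
    }
    return \%record;
}

my $rec = parse_record('pub=3;rec=0;size=3;adv=(1234,123,4.5);ip=6.00');
print "ip=$rec->{ip} adv=@{ $rec->{adv} }\n";
```

Because 'ip' is an ordinary key=value pair here, no field needs special handling.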

Jonathan Leffler
Emerson said "A foolish consistency". That's different than consistency that isn't foolish. :)
brian d foy
Actually, I am taking the input from a file which already exists for some other purpose (it is a tree structure which is to be parsed and checked), hence the odd delimiters. brian has given a solution for multiple lines, which looks good to me. Your notation for ADV looks good. Thanks.
gagneet
@Brian: yes, foolish consistencies are different from non-foolish ones; the hard part is telling the two apart.
Jonathan Leffler
@Gagneet: fair enough; I wondered if it was a pre-existing format, but there is no harm in suggesting that it isn't self-evidently a good format. Perl is flexible enough to adapt to the worst of what the rest of the world does to it.
Jonathan Leffler
A: 

The below assumes the delimiter is guaranteed to be a >, and there is no chance of that appearing in the data.

I simply split each line on '>'. The last element will contain a key=value pair (whose value may itself contain spaces), then whitespace, then the IP, so match the final whitespace-separated token to get the k=v and the IP. Save the IP to the hash and keep the k=v pair in the array, then go through the array and split k=v on '='.

Fill in the hashref and push it to your higher-scoped array. This will then contain your hashrefs when finished.

(Having loaded the config into an array)

my @hashes;

for my $line (@config) {
    my $hash; # config line will end up here

    my @pairs = split />/, $line;

    # Do the ip first. Peel the trailing token off the last element of @pairs and put
    # it into the hash, overwriting the element with what precedes it at the same time.
    # This means we don't have to do anything special with the for loop below.
    ($pairs[-1], $hash->{ip}) = $pairs[-1] =~ /^(.*)\s+(\S+)$/;

    for (@pairs) {
        my ($k, $v) = split /=/;
        $hash->{$k} = $v;
    }

    push @hashes, $hash;
}
Altreus