I am confused. Here is my problem: I have a text file in this format:

Tom                 //name
Washington 
account.txt             //filename
Gary                    //NAME
New York
accountbalance.png      //filename
Mary                    //name
New Jersey             
Michelle               //NAME
Larry                  //NAME
Charles                //NAME  
Washington
Real.cpp               //FILENAME
...
(the file goes on; it is large)

I want to extract each name and its corresponding filename. For example, Charles is the name of the person who worked on Real.cpp.

I think I need to:

  1. use a while loop
  2. use two if statements within it (one to extract the name, the other to extract the filename)
  3. end the while loop

Problem faced: I get names and filenames that do not correspond to each other, because there is no uniform one-to-one relation between the lines of the text file. I want the name to be the key and the filename to be the value, stored in a hash. How can I resolve this? Any suggestions, please?

A: 

Use three variables: Line_1, Line_2 and Current_line. Initialize Line_1 and Line_2 from the first two lines. When reading the third line, check whether it is a filename. If it is, store it in the hash: hash{filename} = name, city. If not, copy Line_2 to Line_1 and Current_line to Line_2. Repeat this in a loop until the whole file has been read.
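The three-variable idea can be sketched in Perl roughly as below. This is a minimal sketch, not Raghuram's code: it assumes (as the other answers do) that any line containing a dot is a filename, and it only works when the name and city are always the two lines immediately before the filename, which Sreeja says is not guaranteed. `$line_1`, `$line_2` and `%file_info` are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data from the question, without the //name and //filename
# annotations (the asker says those are not in the real file).
my $input = <<'END';
Tom
Washington
account.txt
Gary
New York
accountbalance.png
Mary
New Jersey
Michelle
Larry
Charles
Washington
Real.cpp
END

# In-memory filehandle for the demo; open a real file path instead.
open my $fh, '<', \$input or die "Cannot open input: $!";

my ( $line_1, $line_2 );    # the two most recent non-file lines
my %file_info;              # filename => [ name, city ]

while ( my $current_line = <$fh> ) {
    $current_line =~ s/\s+\z//;    # strip newline and trailing spaces
    next if $current_line eq '';

    # Assumption: a line containing a dot is a filename.
    if ( $current_line =~ /\./ ) {
        # Only record once two preceding lines have been seen.
        $file_info{$current_line} = [ $line_1, $line_2 ] if defined $line_1;
        next;
    }

    # Not a filename: slide the window (Line_2 -> Line_1, Current -> Line_2).
    ( $line_1, $line_2 ) = ( $line_2, $current_line );
}
close $fh;

printf "%s => %s, %s\n", $_, @{ $file_info{$_} } for sort keys %file_info;
```

For the sample this prints `Real.cpp => Charles, Washington`, `account.txt => Tom, Washington` and `accountbalance.png => Gary, New York`. Mary, Michelle and Larry are silently dropped, which is exactly the limitation Sreeja raises below.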

Raghuram
@raghuram, I can't use three variables, since there are several other lines with other information in varying order. That is actually my problem.
Sreeja
A:

If names are always followed by //name, filenames are always followed by //filename, and the name before a filename is the one to associate with it, this is fairly simple:

#!/usr/bin/perl

use strict;
use warnings;

my $key;
my %name_to_filename;
while (<DATA>) {
    #only pay attention to lines that have //name or //filename
    #and save off the part before //name or //filename and which type it was
    next unless my ($name, $type) = m{(.*?)\s+//(name|filename)}i;
    if ($type =~ /^name$/i) {
        $key = $name; #remember the last name seen
        next;
    }
    $name_to_filename{$key} = $name;
}

use Data::Dumper;
print Dumper \%name_to_filename;

__DATA__
Tom                 //name
Washington
account.txt             //filename
Gary                    //NAME
New York
accountbalance.png      //filename
Mary                    //name
New Jersey
Michelle               //NAME
Larry                  //NAME
Charles                //NAME
Washington
Real.cpp               //FILENAME
Chas. Owens
@Chas. Owens: No, //name and //filename are comments I added. They are not in the file.
Sreeja
It looks like the lines contain names, cities, and filenames. You can get an 80% solution by creating a hash of city names and ignoring them, but that will exclude valid names. For instance, the name Washington is a valid first name, last name, and city name. All of the lines are valid filenames, so you are going to have a problem there.
Chas. Owens
A: 

You want to map names to a file name, and the data shows that you get a list of names followed by a file name. So you need to store up keys until you know what value to store them with.

Additionally, since you didn't say anything about state names, I expect you want to ignore those. So we need a way to tell them apart. Fortunately, the states are a well-defined set, and can be put into a lookup table.

Then we need a way to distinguish names from filenames. From what you show, I'm going with the following pattern: at least one word character, then a single dot, then at least one word character for the extension.

That tells us whether we're on a file line, in which case we can resolve the value for the pending names.

use strict;
use warnings;

@ARGV = '/path/to/file';

my %state_hash
    = ( Alabama => 1, Alaska => 1, Arizona => 1,    # ...
        'New Hampshire' => 1,                         # ...
        Wyoming => 1,
      );

my ( @pending_names, %file_for );
while ( <> ) {
    # Extract non-spaces at the beginning of the line,
    # potentially separated by one-and-only-one space
    my ( $name_or_file ) = m/^(\S+(?:[ ]\S+)*)/;
    # skip lines with nothing to extract, and skip state names
    next if !defined $name_or_file or exists $state_hash{ $name_or_file };

    # if the extracted value fits the file pattern
    if ( $name_or_file =~ m/^\w+\.\w+$/ ) {
        # store the name-file combination for each pending name
        $file_for{ $_ } = $name_or_file foreach @pending_names;
        # they are not pending anymore, so clear them.
        @pending_names  = ();
    }
    else {
        # store up pending names
        push @pending_names, $name_or_file;
    }
}

What you didn't ask about is whether, in a "large file", a name is likely to repeat. If a name appears more than once, each new file will clobber the value saved the previous time.

This can be remedied by pushing onto the hash slot instead of simply assigning to it, like so:

push @{ $file_for{ $_ } }, $name_or_file foreach @pending_names;
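A self-contained sketch of the push-based variant follows. To keep it runnable it replaces the elided state list with a hypothetical `%is_place` hash holding only the three places that appear in the sample, reads from an in-memory list instead of a file, and appends one hypothetical extra record (Tom again, with `notes.txt`) so that a repeated name is visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for %state_hash: only the places in the sample.
my %is_place = map { $_ => 1 } ( 'Washington', 'New York', 'New Jersey' );

# Sample data from the question, plus one hypothetical extra record
# (Tom / notes.txt) to show that a repeated name keeps both files.
my @lines = (
    'Tom',  'Washington', 'account.txt',
    'Gary', 'New York',   'accountbalance.png',
    'Mary', 'New Jersey', 'Michelle', 'Larry', 'Charles', 'Washington',
    'Real.cpp',
    'Tom',  'notes.txt',
);

my ( @pending_names, %files_for );
for my $line (@lines) {
    next if $is_place{$line};    # ignore localities

    if ( $line =~ /^\w+\.\w+$/ ) {
        # push instead of assign, so earlier files for a name survive
        push @{ $files_for{$_} }, $line for @pending_names;
        @pending_names = ();
    }
    else {
        push @pending_names, $line;    # name waiting for its file
    }
}

for my $name ( sort keys %files_for ) {
    print "$name: @{ $files_for{$name} }\n";
}
```

Here `Tom` ends up with both `account.txt` and `notes.txt`, while Mary, Michelle, Larry and Charles all map to `Real.cpp`. The cost is that every value is now an array reference, so callers must dereference it even when a name has only one file.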
Axeman
my %state_hash = ( Alabama => 1, Alaska => 1, Arizona => 1, ... , 'New Hampshire' => 1, ..., Wyoming => 1 ); I can't hard-code it like this, the list is very large. Moreover, the file has no consistent order of name, city and filename.
Sreeja
@Sreeja, consider it less an invitation to hard-code and more a demonstration of how you would do it. Without such a list, you would also assign the file name to the names of localities, and the localities are quite likely to repeat (though you might not care if their values get clobbered). There is no mention of a database here; it would be proper to keep these kinds of lists in a DB, but whether you store them there is an external matter. If you expect a hash of the names of file authors, it is questionable to put localities in it.
Axeman
A: 

This version uses a hash named %is_city to skip lines that look like cities, and assumes that a line containing a . is a filename. Both assumptions are bad, though: my own name contains a period, and a name like Madison can be either a city or a person.

#!/usr/bin/perl

use strict;
use warnings;

my %is_city = map { $_ => 1 } (
    "Washington", "New York", "New Jersey",
);

my $key;
my %name_to_filename;
while (my $name = <DATA>) {
    chomp $name;
    next if $is_city{$name};
    if ($name =~ /[.]/) {
        $name_to_filename{$key} = $name;
        next;
    }
    $key = $name;
}

use Data::Dumper;
print Dumper \%name_to_filename;


__DATA__
Tom
Washington
account.txt
Gary
New York
accountbalance.png
Mary
New Jersey
Michelle
Larry
Charles
Washington
Real.cpp
Chas. Owens
The text file is large, so I can't hard-code values like Alabama. Anyway, thanks a lot for the effort.
Sreeja
You can't do it then. Try to get the data in a better format.
Chas. Owens
A: 

This assumes that all filenames contain a ., and that filenames are the only lines that do.

It also assumes that the list of cities and states is too large to be feasible to obtain in full.

#! /usr/bin/env perl
use strict;
use warnings;

my @state_city_or_person;
my %files;

while(<>){
  chomp;
  if( index($_,'.') >= 0 ){
    push @{ $files{$_} }, @state_city_or_person;
    @state_city_or_person = ();
  }else{
    push @state_city_or_person, $_;
  }
}

use YAML;

print Dump \%files;

Output for the sample data:

---
Real.cpp:
  - Mary
  - New Jersey
  - Michelle
  - Larry
  - Charles
  - Washington
account.txt:
  - Tom
  - Washington
accountbalance.png:
  - Gary
  - New York

You will still have to go through and remove any extraneous data, like cities and states, but this should help you get it into a parseable format.

It would be helpful if there was some sort of structure to the data to start with.

Brad Gilbert