
If I have one file FOO_1.txt that contains:

FOOA
FOOB
FOOC
FOOD
...

and lots of other files, FOO_files.txt. Each of them contains:

1110000000...

a single line of 0s and 1s, with one digit for each of the FOO_1 values (FOOA, FOOB, ...).

Now I want to combine them to one file FOO_RES.csv that will have the following format:

FOOA,1,0,0,0,0,0,0...
FOOB,1,0,0,0,0,0,0...
FOOC,1,0,0,0,1,0,0...
FOOD,0,0,0,0,0,0,0...
...

What is a simple and elegant way to do that (with hashes and arrays, e.g. $hash{$key} = \@data)?

Thanks a lot for any help!

Yohad

+1  A: 

If I understand correctly your first file is your key order file, and the remaining files each contain a byte per key in the same order. You want a composite file of those keys with each of their data bytes listed together.

In this case you should open all the files simultaneously. Read one key from the key order file, then read one byte from each of the data files. Output everything to your final file as you read it. Repeat for each key.
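A minimal Perl sketch of that approach, assuming each data file holds exactly one digit per key, in key order. `merge_columns` is a hypothetical helper name, and the demo uses in-memory filehandles in place of FOO_1.txt and the real data files so it runs as-is.

```perl
use strict;
use warnings;

# Read one key from the key file, then one character from each data file,
# and write "key,digit,digit,..." as a CSV row. Repeat until the keys run out.
#   $key_fh   - handle on the key order file (one key per line)
#   $data_fhs - array ref of handles, one per data file
#   $out_fh   - handle the CSV rows are written to
sub merge_columns {
    my ($key_fh, $data_fhs, $out_fh) = @_;
    while (my $key = <$key_fh>) {
        chomp $key;
        # getc returns the next single character, or undef at end of file
        my @digits = map { getc($_) } @$data_fhs;
        die "a data file ran out of digits at key '$key'\n"
            if grep { !defined } @digits;
        print {$out_fh} join(',', $key, @digits), "\n";
    }
}

# Demo: in-memory strings stand in for FOO_1.txt and two data files.
my $keys  = "FOOA\nFOOB\nFOOC\nFOOD\n";
my @files = ("1100\n", "0010\n");
open my $key_fh, '<', \$keys or die "cannot open keys: $!";
my @data_fhs = map {
    my $contents = $_;
    open my $fh, '<', \$contents or die "cannot open data: $!";
    $fh;
} @files;

my $csv = '';
open my $out_fh, '>', \$csv or die "cannot open output: $!";
merge_columns($key_fh, \@data_fhs, $out_fh);
close $out_fh;
print $csv;
# FOOA,1,0
# FOOB,1,0
# FOOC,0,1
# FOOD,0,0
```

With real files, each handle would come from `open(my $fh, '<', $name)` instead; the loop itself does not change.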

Adam Luter
My thoughts exactly.
Byron Whitlock
A: 

You don't really need to use a hash. My Perl is a little rusty, so syntax may be off a bit, but basically do this:

open KEYFILE, '<', "foo_1.txt" or die "cannot open foo_1 for reading: $!";
open VALFILE, '<', "foo_files.txt" or die "cannot open foo_files for reading: $!";
open OUTFILE, '>', "foo_out.txt" or die "cannot open foo_out for writing: $!";

my %output;
while (my $key = <KEYFILE>) {
    chomp $key;
    my $val = <VALFILE>;
    chomp $val;
    my @arrVal = split(//, $val);    # list context: one element per digit

    $output{$key} = \@arrVal;
    print OUTFILE $key . "," . join(",", @arrVal) . "\n";
}

Edit: Syntax check OK

Comment by Sinan: @Byron, it really bothers me that your first sentence says the OP does not need a hash yet your code has %output which seems to serve no purpose. For reference, the following is a less verbose way of doing the same thing.

#!/usr/bin/perl

use strict;
use warnings;

use autodie qw(:file :io);

open my $KEYFILE, '<', "foo_1.txt";
open my $VALFILE, '<', "foo_files.txt";
open my $OUTFILE, '>', "foo_out.txt";

while (my $key = <$KEYFILE>) {
    chomp $key;
    chomp(my $val = <$VALFILE>);    # keep the newline out of the CSV fields
    print $OUTFILE join(q{,}, $key, split //, $val), "\n";
}
__END__
Byron Whitlock
@Byron: Your code won't compile, and it doesn't do what you think. There are required commas missing in the calls to `open`, and without specification `open` will always open a file for reading. So, none of these are filehandles for _writing_.
Telemachus
Like I said, my perl is rusty; I was only trying to make a point about the algorithm. If the reader can't figure out how to make it compile, I think Sinan's original comment is right: the poster is in over their head.
Byron Whitlock
@Byron and @Telemachus I did try to modify Byron's code but then I decided I was changing too much and rolled back.
Sinan Ünür
@Sinan - I saw the edit/rollback sequence and figured. @Byron: putting "my Perl is a bit rusty" isn't, in my mind, an excuse for leaving it wrong once you know it's wrong. It's your post, and you may disagree. I wrote the comment partly for you, and partly for the OP (or anyone who wanders by via Google).
Telemachus
I do disagree, as it is only a 2 character change. I didn't fix it as it seemed like a homework question, and it annoys me when posters want all the work done for them in a nice neat package. If you have ever coded perl, you will hit all the inevitable syntax errors and one needs to be somewhat good at fixing these. But in the context of the original question, that is neither here nor there so I fixed it. Dang, programmers can be so anal sometimes [myself included :-) ]
Byron Whitlock
Speaking of being a**l, s/perl/Perl/ above. http://faq.perl.org/perlfaq1.html#What_s_the_differenc
Sinan Ünür
+1  A: 

Your specifications aren't clear. You couldn't have a "lots of other files" named FOO_files.txt, because it's only one name. So I'm going to take this as the files-with-data + filelist pattern. In this case, there are files named FOO*.txt, each containing "[01]+\n".

Thus the idea is to process all the files in the filelist file and to insert them all into a result file FOO_RES.csv, comma-delimited.

use strict;
use warnings;
use English qw<$OS_ERROR>;
use IO::Handle;

open my $foos, '<', 'FOO_1.txt'
    or die "I'm dead: $OS_ERROR";
@ARGV = sort map { chomp; "$_.txt" } <$foos>;
$foos->close;

open my $foo_csv, '>', 'FOO_RES.csv'
    or die "I'm dead: $OS_ERROR";

while ( my $line = <> ) { 
    chomp $line;    # keep the newline out of the CSV fields
    my ( $foo_name ) = ( $ARGV =~ /(.*)\.txt$/ );
    $foo_csv->print( join( ',', $foo_name, split //, $line ), "\n" );
}

$foo_csv->close;
Axeman
I am not sure why `use English;` is considered an improvement.
Sinan Ünür
I'm not sure why you think that's the only thing different about my offering.
Axeman
@Axeman I would not have upvoted your answer if I thought that. Note that my comment uses passive voice: There seem to be quite a few people who think `$OS_ERROR` is better than `$!`. I disagree. That's all. Incidentally, I would have used `File::Slurp` for the first part: `@ARGV = sort map { chomp; "$_.txt" } read_file 'FOO_1.txt';`
Sinan Ünür
Unrelated question: Does $file_handle->close work on filehandles opened via a regular `open` (as opposed to one opened via `IO::Handle`)?
Telemachus
I put that wrong: it's not a question of which `open` you use, but of importing `IO::Handle`.
Telemachus
@Telemachus: It works every time I've tried it. I would use the IO::File constructor, but for some reason, the standard pure-perl IO classes have never implemented the three-arg open. But the C-level IO handles are object-ready. $foos->opened() works just as well as if I blessed it (under 5.10).
Axeman
@Sinan: I was thinking F:S, but I thought that would just add more dependencies. But thanks for the upvote. As for $OS_ERROR, I can only say, it's in PBP and I've adopted English as part of my personal BP--I do think it's slightly more readable, but it's really a case of consistency: $LIST_SEPARATOR is quite a bit more readable than $", so I'm consistent, as long as it doesn't become counter-productive (e.g., using $ARG or @ARG for $_ or @_).
Axeman
@Axeman: that's odd. When I run your script (adjusting for names of files), I get this error: Can't locate object method "close" via package "IO::Handle", but if I add `use IO::Handle;` up at the top, then all is fine. (I'm using 5.10 as well.)
Telemachus
@Telemachus: same here. Besides, I am too used to `open`, `close` etc as functions.
Sinan Ünür
You normally need to load IO::Handle if you use a normal filehandle with methods. The nice thing about using IO::Handle methods is that you get a readable way to set a handle as autoflush or non-blocking: $fh->blocking(0); $fh->autoflush(1); No having to select $fh and then set $| and then select STDOUT again. You can also use IO::File methods where appropriate.
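For illustration, here is the difference daotoad describes, sketched with in-memory filehandles. The helper `is_autoflush` is hypothetical and only exists to read back a handle's `$|` flag.

```perl
use strict;
use warnings;
use IO::Handle;    # makes autoflush(), blocking(), etc. available as methods on filehandles

my ($buf_a, $buf_b) = ('', '');
open my $fh_select, '>', \$buf_a or die "cannot open: $!";
open my $fh_method, '>', \$buf_b or die "cannot open: $!";

# Without methods: select the handle, set $|, then restore the old selection.
my $previous = select($fh_select);
$| = 1;
select($previous);

# With IO::Handle: one call, no select juggling.
$fh_method->autoflush(1);

# Hypothetical helper: $| reports the flag of the currently selected
# handle, so reading it for another handle still needs a select round-trip.
sub is_autoflush {
    my ($handle) = @_;
    my $old  = select($handle);
    my $flag = $|;
    select($old);
    return $flag;
}

printf "select/\$|: %d, autoflush(): %d\n",
    is_autoflush($fh_select), is_autoflush($fh_method);
```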
daotoad
I changed it, because I can see from a dump of %INC it's being included, but I have a simplified proof-of-concept architecture that I have built up, and I couldn't find where it was being included.
Axeman
+1  A: 

It looks like you have many foo_files that have 1 line in them, something like:

1110000000

Which stands for

fooa=1
foob=1
fooc=1
food=0
fooe=0
foof=0
foog=0
fooh=0
fooi=0
fooj=0

And it looks like your foo_res is just a summation of those values? In that case, you don't need a hash of arrays, but just a hash.

my @foo_files = (); #NOT SURE HOW YOU POPULATE THIS ONE
my @foo_keys = qw(a b c d e f g h i j);
my %foo_hash = map{ ( $_, 0 ) } @foo_keys; # initialize hash
foreach my $foo_file ( @foo_files ) {
  open( my $FOO, "<", $foo_file) || die "Cannot open $foo_file\n";
  my $line = <$FOO>;
  close( $FOO );
  chomp($line);
  my @foo_values = split(//, $line);
  foreach my $indx ( 0 .. $#foo_keys ) {
    last unless defined $foo_values[ $indx ]; # stop (or add error checking) if the input file doesn't have all the values; `! $value` would also stop at the first 0
    $foo_hash{ $foo_keys[$indx] } += $foo_values[ $indx ];
  }
}

It's pretty hard to understand what you are asking for, but maybe this helps?

BrianH
+2  A: 

If you can't describe your data and your desired result clearly, there is no way that you will be able to code it. Taking on a simple project is a good way to get started using a new language.

Allow me to present a simple method you can use to churn out code in any language, whether you know it or not. This method only works for smallish projects. You'll need to actually plan ahead for larger projects.

How to write a program:

  1. Open up your text editor and write down what data you have. Make each line a comment.
  2. Describe your desired results.
  3. Start describing the steps needed to change your data into the desired form.

Numbers 1 & 2 completed:

#!/usr/bin/perl
use strict;
use warnings;

# Read data from multiple files and combine it into one file.
# Source files:
#    Field definitions: has a list of field names, one per line.
#    Data files:  
#      * Each data file has a string of digits.
#      * There is a one-to-one relationship between the digits in the data file and the fields in the field defs file.
# 
# Results File:
# * The results file is a CSV file.
# * Each field will have one row in the CSV file.
# * The first column will contain the name of the field represented by the row.
# * Subsequent values in the row will be derived from the data files.
# * The order of subsequent fields will be based on the order files are read.
# * However, each column (2-X) must represent the data from one data file.

Now that you know what you have and where you need to go, you can flesh out what the program needs to do to get you there. This is step 3:

You know you need to have the list of fields, so get that first:

# Get a list of fields.
#   Read the field definitions file into an array.
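As an illustration of turning that first comment into code, a sketch might look like this. `read_fields` is a hypothetical helper that reads from an already-open handle, so an in-memory string can stand in for the field definitions file; skipping blank lines is an assumption about the file's format.

```perl
use strict;
use warnings;

# Hypothetical helper: read the field definitions (one name per line).
sub read_fields {
    my ($fh) = @_;
    my @fields = <$fh>;               # list context: one element per line
    chomp @fields;                    # strip the trailing newlines
    return grep { length } @fields;   # assumption: blank lines are ignored
}

my $defs = "FOOA\nFOOB\n\nFOOC\n";
open my $fh, '<', \$defs or die "cannot open in-memory file: $!";
my @fields = read_fields($fh);
print "@fields\n";    # prints: FOOA FOOB FOOC
```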

Since it is easiest to write CSV in a row-oriented fashion, you will need to process all your files before generating each row. So you'll need someplace to store the data.

# Create a variable to store the data structure.

Now we read the data files:

# Get a list of data files to parse
# Iterate over list

# For each data file:
#    Read the string of digits.
#    Assign each digit to its field.
#    Store data for later use.
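One way those steps could look, as a sketch that stores the data as a hash of arrays (the `$hash{$key} = \@data` shape from the question). `add_data_file` is a hypothetical helper, and it assumes each data file has exactly one digit per field; in-memory strings stand in for the data files.

```perl
use strict;
use warnings;

# Hypothetical helper: read one data file and append each digit to its
# field's array, so $data->{$field} ends up with one value per data file.
sub add_data_file {
    my ($fields, $data, $fh) = @_;
    my $digits = <$fh>;
    die "empty data file\n" unless defined $digits;
    chomp $digits;
    my @values = split //, $digits;    # one element per digit
    die "expected ", scalar @$fields, " digits, got ", scalar @values, "\n"
        unless @values == @$fields;
    # assign each digit to its field
    for my $i (0 .. $#$fields) {
        push @{ $data->{ $fields->[$i] } }, $values[$i];
    }
}

my @fields = qw(FOOA FOOB FOOC FOOD);
my %data;    # e.g. $data{FOOA} = [1, 0]
for my $text ("1100\n", "0010\n") {    # stand-ins for real data files
    my $contents = $text;
    open my $fh, '<', \$contents or die "cannot open in-memory file: $!";
    add_data_file(\@fields, \%data, $fh);
}
print "FOOC: @{ $data{FOOC} }\n";    # prints: FOOC: 0 1
```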

We've got all the data in memory, now write the output:

# Write the CSV file.
# Open a file handle.

# Iterate over list of fields
# For each field
#   Get field name and list of values.
#   Create a string - comma separated string with field name and values  
#   Write string to file handle

# close file handle.
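A sketch of those output steps, assuming the hash-of-arrays layout (`$data{$field}` holding one value per data file, in the order the files were read). `write_csv` is a hypothetical helper; in a real program `$out` would be opened with something like `open(my $out, '>', 'FOO_RES.csv')`.

```perl
use strict;
use warnings;

# Hypothetical helper: one CSV row per field, field name first, then its values.
sub write_csv {
    my ($out, $fields, $data) = @_;
    for my $field (@$fields) {
        my @values = @{ $data->{$field} || [] };
        print {$out} join(',', $field, @values), "\n";
    }
}

my @fields = qw(FOOA FOOB FOOC FOOD);
my %data = (
    FOOA => [1, 0],
    FOOB => [1, 0],
    FOOC => [0, 1],
    FOOD => [0, 0],
);
my $csv = '';
open my $out, '>', \$csv or die "cannot open in-memory file: $!";
write_csv($out, \@fields, \%data);
close $out or die "close failed: $!";
print $csv;
# FOOA,1,0
# FOOB,1,0
# FOOC,0,1
# FOOD,0,0
```

Because the values are single digits, a plain `join` is enough here; if fields could ever contain commas or quotes, a module such as Text::CSV would be the safer choice.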

Now you can start converting comments into code. You could have anywhere from 1 to 100 lines of code for each comment. You may find that something you need to do is very complex and you don't want to take it on at the moment. Make a dummy subroutine to handle the complex task, and ignore it until you have everything else done. Now you can solve that complex, thorny sub-problem on its own.

Since you are just learning Perl, you'll need to hit the docs to find out how to do each of the subtasks represented by the comments you've written. The best resource for this kind of work is the list of functions by category in perlfunc. The Perl syntax guide will come in handy too. Since you'll need to work with a complex data structure, you'll also want to read from the Data Structures Cookbook.

You may be wondering how the heck you should know which perldoc pages you should be reading for a given problem. An article on Perlmonks titled How to RTFM provides a nice introduction to the documentation and how to use it.

The great thing is that if you get stuck, you have some code to share when you ask for help.

daotoad
why are you patronizing him? Do you think he doesn't even know how to program?
hhafez
Hey daotoad, thank you! That will help me see things more clearly! Although I know how to program, I always have new things to learn!
YoDar
@hhafez, based on his questions, Yohad is having problems breaking down his problems into workable elements. In addition to being unable to describe his goals, in three posts Yohad showed one line of code: a print statement. Further, the questions are basic and the answers can easily be found in the docs. This indicates someone who needs help in the fundamentals. I offered advice about how to approach a problem that took me years to arrive at (yep, I'm that dumb!). I also provided a guide to Perl's huge supply of docs. In my experience, attention to the basics is the path to mastery.
daotoad
Yohad, I am glad you found my post helpful. After decades of practice, I still can't say that I know how to program. Every project is a bit different and demands something new. I've learned a few tricks that help me write software, but I'm still a long way from a generalized solution :)
daotoad