tags:
views: 59
answers: 2
I am running a command that returns 96 .txt files for each hour of a particular date, so for one day it produces 24*96 files in a directory. My aim is to extract data for four months, which would result in 30*24*96*4 files in one directory.

After I get the data, I need to extract a certain pattern from each of the files and display that as output.

1) The script below handles only one day, with the date hardcoded.
2) I need to make it work for every day in a month, and I need to run it from June to October.
3) As the data is huge, my disk will run out of space. I don't want to create all these files; I just want to grep on the fly and produce a single output file.

How can I do this efficiently?

My shell script looks like this:

for R1 in {0..9}; do
  for S1 in {0..95}; do
    echo $R1 $S1
    curl -H "Accept-Encoding: gzip" "http://someservice.com/getValue?Count=96&data=$S1&fields=hitType,QueryString,pathInfo" | zcat > 20101008-mydata-$R1-$S1.txt
  done
done
  • This returns the files I need.
  • After that, I extract a URL pattern from each of the files: grep "test/link/link2" * | grep category > 1.output
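To avoid writing the intermediate files at all, the per-hour fetch and the grep can be combined into one pipeline that appends matches straight to a single output file. This is only a sketch: the endpoint and patterns are the ones from the question, the June–September window is an assumption based on "four months", the `R1` loop is dropped because it only affected the output filename, and GNU `date` is assumed for the day arithmetic.

```shell
#!/bin/sh
# Sketch: grep on the fly instead of saving 24*96 files per day.
# Assumes GNU date; endpoint and grep patterns are from the question.
outfile=matches.txt
: > "$outfile"                      # start with an empty output file

day=20100601                        # first day (assumed June 1)
while [ "$day" -le 20100930 ]; do   # last day of the assumed window
    for S1 in $(seq 0 95); do
        curl -s -H "Accept-Encoding: gzip" \
             "http://someservice.com/getValue?Count=96&data=$S1&fields=hitType,QueryString,pathInfo" \
            | zcat \
            | grep "test/link/link2" \
            | grep category >> "$outfile"
    done
    day=$(date -d "$day + 1 day" +%Y%m%d)   # GNU date: next calendar day
done
```

Each response streams through zcat and grep, so nothing lands on disk except the one output file. As an aside, `curl --compressed` can replace the manual `Accept-Encoding` header plus `zcat` pair.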
A: 

You can use this awk command to get the URLs:

awk -vRS="</a>" '/href/&&/test.*link2/&&/category/{gsub(/.*<a.*href=\"|\".*/,"");print}' file
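As a quick illustration, here is that command run over a small, made-up HTML fragment (the input is invented, not real data; a multi-character RS assumes gawk):

```shell
# Demo input is invented; the awk program is the one above.
# Each </a>-terminated record is tested, and only the matching
# href value survives the gsub that strips the surrounding markup.
printf '<a href="http://x/test/link/link2?category=1">hit</a><a href="http://x/other">miss</a>' \
| awk -vRS="</a>" '/href/&&/test.*link2/&&/category/{gsub(/.*<a.*href=\"|\".*/,"");print}'
# prints http://x/test/link/link2?category=1
```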
ghostdog74
Sorry, but I think I didn't make my question clear. I have edited my question. Can you take a look?
TopCoder
A: 

Here's how to loop over four months' worth of dates:

#!/usr/bin/perl
use strict;
use warnings;
use Date::Simple ':all';

for (my $date = ymd(2010,6,1), my $end = ymd(2010,10,1); $date < $end; $date++) {
    my $YYYYMMDD = $date->format("%Y%m%d");
    process_one_day($YYYYMMDD); # Add more formats if needed as parameters
}

sub process_one_day {
    my $YYYYMMDD = shift;
    # ...
    # ... Insert your code to process that date
    # ... Either call system() command on the sample code in your question
    # ... Or better yet write a native Perl equivalent
    # ...
    # ... For native processing, use WWW::Mechanize to extract the data from the URL
    # ... and Perl's native grep() to grep for it
}
DVK
I didn't provide the code to process one day's worth, as it seems to me you already know how to do that, and you only asked how to loop over a range of dates. If you wish to process one day's worth using native Perl code (my recommendation) but need help converting your shell code to Perl, please ask that as a separate question and link to this one.
DVK
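For completeness, the same date loop can also be written in plain shell with GNU `date`, in case installing a Perl module isn't an option. The June 1 start and exclusive October 1 end are assumptions based on the "four months" in the question:

```shell
#!/bin/sh
# Shell equivalent of the Perl date loop above (assumes GNU date).
d=20100601        # first day (assumed June 1)
end=20101001      # exclusive end: Oct 1 closes the four-month window
while [ "$d" -lt "$end" ]; do
    echo "processing $d"                  # replace with real per-day work
    d=$(date -d "$d + 1 day" +%Y%m%d)     # GNU date: advance one day
done
```

The YYYYMMDD form compares correctly as a plain number, which is why `[ "$d" -lt "$end" ]` works without any date parsing.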