ansaurus

Question

Answer 1

+10 A:

Using python:

totals = [ int(i)+int(j) for i, j in zip ( open(fname1), open(fname2) ) ]

SilentGhost 2009-08-28 14:46:28

That is Python :)

2009-08-28 14:54:03

Python was mentioned in the question as OK for a solution.

unwind 2009-08-28 14:55:29

You are right, I've skipped the tags.

2009-08-28 14:57:24

It would help if you told people what it was, you can't just type that on the commandline, I only got that it was Python because it used a list comprehension.

Chas. Owens 2009-08-28 15:00:05

I added one missing bracket to the code. - I have not yet managed to get the code to work. **Do you need any library to get it work?** - I get this error `coercing to Unicode: need string or buffer, int foundWARNING: Failure executing file:` when running the command with my filenames.

Masi 2009-08-28 16:41:55

Worked beautifully for me using Python 2.5. No library required.

Fred Larson 2009-08-28 18:03:42

Answer 2

+3 A:

You can avoid the intermediate steps by just using a command that do the counts and the comparison at the same time:

find . -type f -exec perl -nle 'END { print $ARGV if $h{"{"} != $h{"}"} } $h{$_}++ for /([}{])/g' {}\;

This calls the Perl program once per file, the Perl program counts the number of each type curly brace and prints the name of the file if they counts don't match.

You must be careful with the /([}{]])/ section, find will think it needs to do the replacement on {} if you say /([{}]])/.

WARNING: this code will have false positives and negatives if you are trying to run it against source code. Consider the following cases:

balanced, but curlies in strings:

if ($s eq '{') {
    print "I saw a {\n"
}

unbalanced, but curlies in strings:

while (1) {
   print "}";

You can expand the Perl command by using B::Deparse:

perl -MO=Deparse -nle 'END { print $ARGV if $h{"{"} != $h{"}"} } $h{$_}++ for /([}{])/g'

Which results in:

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    sub END {
        print $ARGV if $h{'{'} != $h{'}'};
    }
    ;
    ++$h{$_} foreach (/([}{])/g);
}

We can now look at each piece of the program:

BEGIN { $/ = "\n"; $\ = "\n"; }

This is caused by the -l option. It sets both the input and output record separators to "\n". This means anything read in will be broken into records based "\n" and any print statement will have "\n" appended to it.

LINE: while (defined($_ = <ARGV>)) {
}

This is created by the -n option. It loops over every file passed in via the commandline (or STDIN if no files are passed) reading each line of those files. This also happens to set $ARGV to the last file read by <ARGV>.

chomp $_;

This removes whatever is in the $/ variable from the line that was just read ($_), it does nothing useful here. It was caused by the -l option.

sub END {
    print $ARGV if $h{'{'} != $h{'}'};
}

This is an END block, this code will run at the end of the program. It prints $ARGV (the name of the file last read from, see above) if the values stored in %h associated with the keys '{' and '}' are equal.

++$h{$_} foreach (/([}{])/g);

This needs to be broken down further:

/
    (    #begin capture
    [}{] #match any of the '}' or '{' characters
    )    #end capture
/gx

Is a regex that returns a list of '{' and '}' characters that are in the string being matched. Since no string was specified the $_ variable (which holds the line last read from the file, see above) will be matched against. That list is fed into the foreach statement which then runs the statement it is in front of for each item (hence the name) in the list. It also sets $_ (as you can see $_ is a popular variable in Perl) to be the item from the list.

++h{$_}

This line increments the value in $h that is associated with $_ (which will be either '{' or '}', see above) by one.

Chas. Owens 2009-08-28 14:53:38

Answer 3

+10 A:

If c1 and c2 are youre files, you can do this:

$ paste c1 c2 | awk '{print $1 + $2}'

Or (without AWK):

$ paste c1 c2 | while read i j; do echo $(($i+$j)); done

2009-08-28 14:53:43

Answer 4

+1 A:

In Python (or Perl, Awk, &c) you can reasonably do it in a single stand-alone "pass" -- I'm not sure what you mean by "too many curly brackets", but you can surely count curly use per file. For example (unless you have to worry about multi-GB files), the 10 files using most curly braces:

import heapq
import os
import re

curliest = dict()

for path, dirs, files in os.walk('.'):
  for afile in files:
    fn = os.path.join(path, afile)
    with open(fn) as f:
      data = f.read()
      braces = data.count('{') + data.count('}')
    curliest[fn] = bracs

top10 = heapq.nlargest(10, curlies, curliest.get)
top10.sort(key=curliest.get)
for fn in top10:
  print '%6d %s' % (curliest[fn], fn)

Alex Martelli 2009-08-28 15:06:39

Answer 5

A:

Reply to Lutz'n answer

My problem was finally solved by this commnad

paste -d: /tmp/1 /tmp/2 | awk -F: '{ print $1 "\t" $2 - $4 }'

Masi 2009-08-28 15:29:00

Answer 6

A:

your problem can be solved with just 1 awk command...

awk '{getline i<"file1";print i+$0}'  file2

ghostdog74 2009-08-30 14:12:44

ansaurus

tags:

views:

answers:

Summing up two columns the Unix way

# To fix the symptom

# Initial situation

related questions