the homework: http://www.cs.rit.edu/~waw/networks/prob1.082.html

Ok, I am still confused why this question was asked for my data communications and networks class, but here is the question from my homework:

  1. Write a computer program that reads the header on an e-mail message and deletes all lines except those that begin with From:, To:, Subject: and Cc:.

CONTEST -- Who can write the shortest program that does this.

So after thinking for a bit, I decided that the following Perl code was as small as I could make it.

#!/usr/bin/perl

while (<>) { print "$_" if ($_ =~ m/^(To:|From:|Subject:|Cc:)/); }

All this does is act like a filter for which the only output is lines that start with From:, To:, Subject: and Cc: as specified in the question. Since there aren't any specific details I think that the above code works to at least correctly answer the question.
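For comparison, here is the same filter as a minimal Python sketch. The name `filter_header_lines` is purely illustrative and not part of the assignment. Like the Perl above, it looks at each physical line on its own and makes no attempt to handle headers folded over several lines.

```python
import re

# Same pattern as the Perl version: the line must start with one of the four field names.
WANTED = re.compile(r"^(To:|From:|Subject:|Cc:)")


def filter_header_lines(lines):
    """Yield only the lines that begin with To:, From:, Subject: or Cc:."""
    for line in lines:
        if WANTED.match(line):
            yield line
```

To use it as a filter, pass it `sys.stdin` and write the result with `sys.stdout.writelines(...)`.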

Now, I wonder how small a program could possibly be written for this? I can understand if no one wants to post code because they think I will use it for the assignment, but I am more or less looking for suggestions and techniques that could help me write the shortest program possible.

Also, I am quite sure by shortest he is referring to actual code length. He did mention that scripting languages were the way to go so I doubt he is considering something like the overhead involved with an interpreter. This also means that he does not care which language is used.

Thanks for looking!

EDIT: Thanks for the suggestions! I had been reading questions here for quite a while; hopefully in the future I can contribute more. Also, using some of the suggestions, I trimmed my Perl code down to 55 bytes. I don't think we will need to deal with something like a multi-line header.

BONUS: Who can identify a good reason why this was asked in a class where we are discussing things like packet switching and client/server architectures?

EDIT2: For the record, my professor said that someone did this with something like 55 bytes. The only way I see that as being possible is if he was only asking for a simple implementation like the one above.

+10  A: 

A few tips:

  • print "$_" is equivalent to print
  • while(<>) {...} can be replaced by adding -n to the options on the #! line
  • $_ =~ m// is equivalent to //
  • You're typing four :'s where one is good enough.

Something like

#!/usr/bin/perl -n
print if /^(To|From|Subject|Cc):/;
Leon Timmermans
That does not account for multi-line headers :-)
mat
See section 2.2.3 of RFC 2822.
brian d foy
Fixed that, see my other reply ;-)
Leon Timmermans
ysth
A: 

Well, assuming you have your header in a string with one item per line (To:, From:, etc) named $head, then in Powershell it would be:

$head.Split("`n") | ?{$_ -match "^(To|From|Subject|Cc):"}

EBGreen
I posted this mainly on a lark since I think the OP is looking for a *nix solution, but there it is.
EBGreen
This fails to handle line folding. See section 2.2.3 of RFC 2822.
brian d foy
Specs:"deletes all lines except those that begin with From:, To:, Subject: and Cc:." Erase Brian's downvote (+1).
Axeman
Well, "lines" here mean two different things. An email header is one logical line, but may have more than one physical line. This is clearly noted in RFC 2822. If you want to upvote incorrect and ignorant solutions, I guess you can do that.
brian d foy
A: 

Why are you trying to get the shortest possible program first? Start with a correct solution and then edit it until you can't remove any more. Syntax and typing are not going to be the bottlenecks for a correct solution. Even if your program is longer than anyone else's, if you are the only one who does it correctly, you still win. :)

Read RFC 2822, "Internet Message Format" to see what you have to handle.

Then, look at the existing email parsing libraries to see the shenanigans that they have to handle. Once you think you have a solution because you follow the RFC, start working on all of the broken mailers.

If you are just trying to get work done, use the right tool. This is a job for formail if you just want to play with messages, but if you have to write tight code that will run on all the messages passing through your network, then something like qsmtp (the mod_perl for MTAs) might be what you want.

As for why you have to do this, what did the instructor say when you asked? You should get into the habit of specifying the desired end state and constraints for any assignment, whether in school or in a "real" job.


Here's a proper program to finish the task correctly. Mine's a bit long because I also read all of the emails from the source (which can be almost any common email storage format such as mbox, maildir, and so on) and I extract just the header from each message. This is only 62 characters:

 formail -s formail -c -XTo: -XFrom: -XSubject: -XCc: <my_inbox

If you'd rather have a Perl solution so you have a little more control over the output, here's that too:

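#!/usr/bin/perl
use strict;
use warnings;

use Email::Folder;

# Open the mail folder named on the command line.
my $folder = Email::Folder->new($ARGV[0]);

foreach my $message ( $folder->messages ) {
    # Keep only the wanted fields that are present in this message.
    print join "\n",
        map {
            my $h = $message->header($_);
            defined $h ? "$_: $h" : ();
        } qw(From To Subject Cc);

    print "\n\n";
}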
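The same "let a library parse the message" idea can be sketched in Python with the standard `email` module. This is only an illustration under a few assumptions: the name `wanted_headers` is made up for this example, the message arrives as a single string, and the parser's default policy is used, so a folded value keeps its embedded line breaks.

```python
import email

# Field names to keep, in the order they will be printed.
WANTED = ("From", "To", "Subject", "Cc")


def wanted_headers(raw_message):
    """Return 'Name: value' strings for the wanted fields in raw_message."""
    message = email.message_from_string(raw_message)
    lines = []
    for name in WANTED:
        # get_all returns every occurrence of a field, or the default if it is absent.
        for value in message.get_all(name, []):
            lines.append(f"{name}: {value}")
    return lines
```

To walk a whole mbox file rather than one message, the standard `mailbox.mbox` class can be iterated to get one message object at a time.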
brian d foy
a) it's homework; b) the prof clearly isn't expecting a full implementation of RFC 2822; c) "shortest" is clearly stated in the problem statement
GalacticCowboy
I would say that if the question as stated is a problem then you should see about getting the professor's email address and taking up the discussion there.
EBGreen
Homework does not imply doing it incorrectly. "Shortest" is a bonus part of the problem---it is not the goal.
brian d foy
"1. Write a computer program that reads the header on an e-mail message and deletes all lines except those that begin with From:, To:, Subject: and Cc:." - Standards compliance was not the goal of the homework question. None of the non-standards compliant answers are incorrect.
EBGreen
See RFC 2822 for the definition of an email header line. If you want to be known as the guy who takes shortcuts and doesn't want to do it correctly, well, that's something you need to work on.
brian d foy
The flip side of that is being the person that wastes time building a car when all the boss *explicitly* asked for was a bicycle. I agree that you should always get into the habit of asking for clear problem definitions, but as I say, the answers provided do answer the question that was asked.
EBGreen
As you can see from Leon's updated answer, it's not that hard to do it right. No one is building a car here. Really, are you arguing because you're protecting your ego or because you really think that an extra five minutes' work is a waste of time? I think it's the former.
brian d foy
I don't know about ego, but there are other ways to make your point than referring to everyone else's answer as incorrect when they actually weren't.
EBGreen
Since we were not even told to submit electronically, I am pretty sure that this is the way to go. All I planned on doing was printing out the code and handing it in with the other 10 or so questions from the book. the homework: http://www.cs.rit.edu/~waw/networks/prob1.082.html
+5  A: 

OK, here's a multi-line matching program:

$/="";$_=<>;print$&while/^(To|From|Subject|Cc):.*\n( .*\n)*/mg

You wanted short, not pretty, right ;-)
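Spelled out, the one-liner reads the header as a single block and prints every wanted field together with the continuation lines that follow it. Here is a rough Python sketch of the same idea, not a golfed entry. The name `wanted_fields` is made up for this example, it assumes plain `\n` line endings, and unlike the Perl above it also accepts a tab at the start of a continuation line.

```python
import re

# A wanted field line, followed by any number of continuation lines.
# Assumption: a continuation line starts with a space or a tab.
FIELD = re.compile(r"^(?:To|From|Subject|Cc):.*\n(?:[ \t].*\n)*", re.MULTILINE)


def wanted_fields(message):
    """Return the wanted header fields, folded lines included, as one string."""
    # The header ends at the first blank line; everything after it is the body.
    header = message.split("\n\n", 1)[0] + "\n"
    return "".join(FIELD.findall(header))
```

Cutting the message at the first blank line plays the role of `$/=""` in the Perl: a line in the body that happens to start with `To:` is never looked at.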

Leon Timmermans