So, for a CS project I'm supposed to sniff a network stream and build files from that stream. For example, if the program is pointed to ~/dumps/tmp/ then the directory structure would be this:

~/dumps/tmp/
    192.168.0.1/
        page1.html
        page2.html
        [various resources for pages 1 & 2]
        downloaded file1
    192.168.0.2/
        so on and so forth.

I'm doing this in C & pcap on Linux (since I already know C++, and figure the learning experience would be good).

Thus far, I've been looking at various header formats for TCP/IP, such as the TCP header.

As I figure, I can sort the packets by their dst/src and then order them correctly by their sequence and acknowledgement numbers.
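The grouping and ordering idea can be sketched roughly as below. This is only an illustration under some assumptions: the `flow_key` and `segment` structs, `reassemble`, and its `isn` parameter are hypothetical names, not part of pcap. It also assumes a flow is identified by addresses plus ports (addresses alone would mix up parallel connections between the same two hosts), and that all segments of a flow lie within 2^31 bytes of each other so the wraparound-aware comparison stays consistent.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flow key: one direction of one TCP connection. */
struct flow_key {
    uint32_t src_ip, dst_ip;     /* IPv4 addresses, host byte order */
    uint16_t src_port, dst_port; /* TCP ports, host byte order */
};

/* Hypothetical record for one captured TCP segment. */
struct segment {
    struct flow_key key;
    uint32_t seq;        /* sequence number of the first payload byte */
    const uint8_t *data; /* payload bytes (not owned) */
    size_t len;
};

int same_flow(const struct flow_key *a, const struct flow_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Sequence numbers are modulo 2^32, so compare via the signed difference. */
int seq_cmp(uint32_t a, uint32_t b)
{
    int32_t d = (int32_t)(a - b);
    return (d > 0) - (d < 0);
}

int segment_cmp(const void *pa, const void *pb)
{
    const struct segment *a = pa, *b = pb;
    return seq_cmp(a->seq, b->seq);
}

/* Sort the segments of ONE flow and copy their payloads into out.
 * isn is the sequence number of the first payload byte (SYN seq + 1).
 * Returns the number of contiguous bytes written; stops at the first gap. */
size_t reassemble(struct segment *segs, size_t n, uint32_t isn,
                  uint8_t *out, size_t cap)
{
    assert(segs != NULL && out != NULL);
    qsort(segs, n, sizeof *segs, segment_cmp);

    uint32_t next = isn; /* next sequence number we expect */
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t skip = (int32_t)(next - segs[i].seq);
        if (skip < 0)
            break;                      /* gap: a segment is missing */
        if ((size_t)skip >= segs[i].len)
            continue;                   /* pure retransmission */
        size_t take = segs[i].len - (size_t)skip;
        if (take > cap - written)
            take = cap - written;       /* do not overflow out */
        memcpy(out + written, segs[i].data + skip, take);
        written += take;
        next += (uint32_t)take;
    }
    return written;
}
```

Acknowledgement numbers are not needed for the ordering itself; they belong to the opposite direction of the connection, which would be a second flow with src and dst swapped.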

But that leaves me with a big question: how do I figure out that packets a-z are part of an HTML file and packets A-Z are part of some random file being downloaded, etc.?

Also, what other kind of header formats should I be looking up? Currently, I have:

TCP, Ethernet, and UDP. (I'd post more hyperlinked pictures, but I apparently need reputation to do that, sorry.) I'll get around to things like FTP, though I'm pretty sure FTP is built on top of TCP, as is HTTP.

So, in short, how do I find files in a network stream, and am I missing any major protocols that I'll need to be able to read?

REPLY I can't figure out how to reply, so this will have to do.

I have used pcap on several occasions, and will do so again for this project, but I won't use any of Wireshark's stuff (although it is a great program) because I want to, no kidding, learn this kind of stuff.

Yeah, I'll look into the OSI model. Any suggestions for a good site that covers common protocols?

And I guess I should stop, before this 'question' becomes a discussion.

+4  A: 

Where a file begins and ends is not in TCP. You have to deal with the protocol carried over TCP. For example, for HTTP, you have to read the Content-Length field among the HTTP headers, which should be equal to the length of the HTTP body (for example, the full HTML page). Then you accumulate the body over one or more TCP segments until you have the total content, as indicated by Content-Length. (Responses sent with Transfer-Encoding: chunked carry no Content-Length; their length comes from the chunk sizes instead.)
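A minimal sketch of that check, assuming the TCP payloads of one direction have already been put in order and appended to a single buffer. The function names here (`http_body_offset`, `http_content_length`, `http_message_complete`) are made up for illustration, and the parser is deliberately naive: it ignores chunked encoding, does not guard against integer overflow, and expects CRLF line endings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Offset of the first body byte (just past the blank line CRLF CRLF),
 * or 0 if the header block is not complete yet. */
size_t http_body_offset(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i + 3 < len; i++)
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    return 0;
}

/* Case-insensitive prefix match on a buffer that is not NUL-terminated. */
static int has_prefix_icase(const char *s, size_t n, const char *prefix)
{
    size_t plen = strlen(prefix);
    if (n < plen)
        return 0;
    for (size_t i = 0; i < plen; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    return 1;
}

/* Scan the header lines for Content-Length (field names are
 * case-insensitive). Returns the value, or -1 if absent or malformed. */
long http_content_length(const char *buf, size_t header_len)
{
    size_t line = 0;
    while (line < header_len) {
        const char *eol = memchr(buf + line, '\n', header_len - line);
        size_t end = eol ? (size_t)(eol - buf) : header_len;
        if (has_prefix_icase(buf + line, end - line, "Content-Length:")) {
            const char *p = buf + line + strlen("Content-Length:");
            const char *stop = buf + end;
            while (p < stop && (*p == ' ' || *p == '\t'))
                p++;                         /* skip optional whitespace */
            if (p == stop || !isdigit((unsigned char)*p))
                return -1;
            long v = 0;
            while (p < stop && isdigit((unsigned char)*p))
                v = v * 10 + (*p++ - '0');
            return v;
        }
        line = end + 1;
    }
    return -1;
}

/* 1 if buf holds complete headers plus the whole body, 0 if more
 * segments are needed, -1 if there is no usable Content-Length. */
int http_message_complete(const char *buf, size_t len)
{
    size_t off = http_body_offset(buf, len);
    if (off == 0)
        return 0;
    long cl = http_content_length(buf, off);
    if (cl < 0)
        return -1;
    return (len - off) >= (size_t)cl;
}
```

Once `http_message_complete` returns 1, the bytes from the body offset up to body offset plus Content-Length are the file to write out; anything after that belongs to the next response on the same connection.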

RichAmberale
+2  A: 

Since this is a school assignment, you may be limited as to what tools you can use, but you might want to look into Wireshark. If I were given this task as a real-world project, I'd take Wireshark and look into how to use its stream extraction and protocol parsing capabilities and just wrap something around them to automate them and get the desired result.

swillden
Yeah, or tshark with the proper command-line arguments. Even tshark piped to grep is pretty darned powerful. But yeah, as a student, your job is to stop and smell the flowers.
jhs
Great idea to use Wireshark to view real-world traffic. That will give you some great insight.
RichAmberale
A: 

As this is for CS school, I would start with the OSI model, which gives you a good overview and logical structure of network protocols.
Files live at layer 6 (presentation, e.g. MIME) and layer 7 (application, various protocols). Then you need to go through each protocol, work out which ones can carry files, and see how you can capture them.

weismat
+1  A: 

You need to capture on an Ethernet device in promiscuous mode. libpcap opens the underlying raw (packet) socket for you, so you can use it to capture, store, and analyze the packets.
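For the "analyze" part, here is a rough sketch of walking the headers of one captured frame to find the TCP payload. It takes a plain byte buffer, such as the one a pcap callback hands you, and does not call libpcap itself. The `tcp_info` struct and `parse_tcp_frame` are hypothetical names, and the sketch assumes untagged Ethernet II frames carrying unfragmented IPv4; VLAN tags, IPv6, and IP fragments would need extra handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of parsing one Ethernet/IPv4/TCP frame. */
struct tcp_info {
    uint32_t src_ip, dst_ip;     /* host byte order */
    uint16_t src_port, dst_port; /* host byte order */
    uint32_t seq;                /* TCP sequence number */
    const uint8_t *payload;      /* points into the frame */
    size_t payload_len;
};

/* Read big-endian (network order) integers from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 and fills *out for a well-formed IPv4/TCP frame, -1 otherwise. */
int parse_tcp_frame(const uint8_t *frame, size_t len, struct tcp_info *out)
{
    assert(frame != NULL && out != NULL);
    if (len < 14 + 20)
        return -1;                       /* Ethernet + minimal IPv4 header */
    if (rd16(frame + 12) != 0x0800)
        return -1;                       /* EtherType is not IPv4 */

    const uint8_t *ip = frame + 14;
    if ((ip[0] >> 4) != 4)
        return -1;                       /* IP version */
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4; /* IPv4 header length in bytes */
    size_t total = rd16(ip + 2);             /* IPv4 total length */
    if (ihl < 20 || ip[9] != 6)
        return -1;                       /* protocol 6 = TCP */
    if (total < ihl + 20 || 14 + total > len)
        return -1;                       /* truncated capture */

    const uint8_t *tcp = ip + ihl;
    size_t doff = (size_t)(tcp[12] >> 4) * 4; /* TCP header length in bytes */
    if (doff < 20 || ihl + doff > total)
        return -1;

    out->src_ip = rd32(ip + 12);
    out->dst_ip = rd32(ip + 16);
    out->src_port = rd16(tcp);
    out->dst_port = rd16(tcp + 2);
    out->seq = rd32(tcp + 4);
    out->payload = tcp + doff;
    /* Use the IP total length, not the frame length, so Ethernet
     * padding on short frames is not mistaken for payload. */
    out->payload_len = total - ihl - doff;
    return 0;
}
```

The addresses and ports give you the key for grouping packets into connections, and the sequence number plus payload are what you need to put the bytes back in order.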

eyalm