ansaurus

Question

Answer 1

+1 A:

Try something like this:

(exec 3<>"/dev/tcp/$hostname/$port"
 echo -e "GET $path HTTP/1.1\r\nConnection: close\r\n\r\n" >&3
 cat <&3) > "$output"

Updated for Mike Ottum's bug fix.

DigitalRoss 2009-12-01 21:50:14

Please say that we can reuse your script. Or you cannot win.

Andrea Francia 2009-12-01 21:51:00

You actually have to talk HTTP, and you don't. Keywords: split data, 3xx, 4xx, 5xx messages, etc. I hacked an IRC bot in Bash, but IRC is not as complicated as HTTP can be.

TheBonsai 2009-12-01 21:59:54
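To make the point about 3xx, 4xx and 5xx responses concrete, here is a minimal sketch of reading the status code out of a raw response with parameter expansion only. The function names `http_status` and `http_status_class` are made up for illustration, and the sketch assumes the whole response is already held in a shell variable and starts with a well-formed status line.

```shell
# Hypothetical helper: print the numeric status code of a raw HTTP response.
http_status() {
    local raw="$1"
    local status_line="${raw%%$'\r\n'*}"   # first line, e.g. "HTTP/1.1 404 Not Found"
    local rest="${status_line#* }"          # drop the "HTTP/x.y " prefix
    printf '%s\n' "${rest%% *}"             # keep only the code before the reason phrase
}

# Hypothetical helper: map a status code to the class a caller might branch on.
http_status_class() {
    case "$1" in
        2??) echo success ;;
        3??) echo redirect ;;
        4??) echo client-error ;;
        5??) echo server-error ;;
        *)   echo unknown ;;
    esac
}
```

A caller could refuse to save the body unless `http_status_class` reports `success`. Following redirects or handling chunked responses would need more work than this.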

Look, it was free, it does fetch files, and it gets the OP 95% of what was required. I'm sorry it wasn't good enough for you, but the SO guidelines say to downvote misleading or incorrect information. We have a relatively sophisticated OP who certainly can understand the limitations of this answer, so it isn't misleading or incorrect. Sheesh.

DigitalRoss 2009-12-01 22:32:57

@Andrea: posts to SO are covered by a license that most likely gives you exactly what you need. See the bottom of the page you are looking at right now.

DigitalRoss 2009-12-01 22:34:16

You're right about the downvote. But the 95% is worth a discussion.

TheBonsai 2009-12-01 22:38:28

HTTP headers must be separated by \r\n, not just \n, and the header should end with a pair of \r\n's. Like this: GET / HTTP/1.1\r\nConnection: close\r\n\r\n

Mike Ottum 2009-12-01 22:39:18
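One way to avoid getting the line endings wrong is to let a small helper add them. This is only a sketch: `http_request_head` is a hypothetical name, and it assumes each argument is one complete request or header line without its own line ending.

```shell
# Hypothetical helper: print each argument as one line terminated by CRLF,
# then the empty CRLF line that ends the header section.
http_request_head() {
    printf '%s\r\n' "$@"
    printf '\r\n'
}
```

With a socket already open on file descriptor 3, it might be used as `http_request_head "GET $path HTTP/1.1" "Connection: close" >&3`.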

Sorry, I missed the part about the SO license. I didn't downvote you. Unfortunately your script doesn't remove the HTTP metadata (the response headers).

Andrea Francia 2009-12-01 22:40:08

Maybe something with parameter expansion will help to remove the headers.

Andrea Francia 2009-12-01 22:41:04

I think the right way to remove the headers is to parse the output file in bash and compute the number of bytes to strip, then output a `dd(1)` command to do the binary-safe heavy lifting. However, the OP didn't want any external commands so I'm kind of stuck...

DigitalRoss 2009-12-01 22:47:11
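A rough sketch of that idea: compute in bash where the body starts, then print a `dd` command that would do the copying. The names `body_offset` and `dd_command_for` are hypothetical, and the sketch assumes the response (or at least its header section) has already been read into a variable. `LC_ALL=C` is set with the intent of making `${#var}` count bytes rather than characters.

```shell
# Hypothetical helper: print the byte offset at which the body starts,
# i.e. the length of everything up to and including the first CRLF CRLF.
body_offset() {
    local LC_ALL=C                          # count bytes, not multibyte characters
    local raw="$1"
    local head="${raw%%$'\r\n\r\n'*}"       # everything before the first blank line
    # If nothing was removed, there is no complete header section.
    [ "${#head}" -lt "${#raw}" ] || return 1
    echo $(( ${#head} + 4 ))                # +4 for the CRLF CRLF itself
}

# Hypothetical helper: print (not run) a dd command that copies only the body
# of a response saved in a file.
dd_command_for() {
    local file="$1" offset="$2"
    printf 'dd if=%q bs=1 skip=%d\n' "$file" "$offset"
}
```

Only the header length matters here, so a binary body that the shell cannot hold faithfully in a variable would not change the computed offset.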

This tries to remove most of the headers: echo "${raw_output#HTTP*Content-Type: text/html; charset=iso-8859-1}"

Andrea Francia 2009-12-01 22:50:12

Ok, I've got something that seems to remove the headers. It will only work for things that the shell can read, so it needs a text-friendly encoding. (Yes, limitations.) See http://pastie.org/722698

DigitalRoss 2009-12-01 23:05:18

I hope that the shell is more binary-friendly than we know.

Andrea Francia 2009-12-01 23:12:04

This is my attempt to put it all together: http://pastie.org/722716

Andrea Francia 2009-12-01 23:20:52

Unfortunately this gives me HTTP/1.1 400 Bad Request; you should add a "Host: $hostname" header.

Andrea Francia 2009-12-01 23:43:29

Oh right, in 1.1 the hostname is required. Just make it an HTTP 0.9 request, or add the hostname. That crossed my mind at first then I got distracted...

DigitalRoss 2009-12-01 23:47:32

I found out how to remove the headers: `"${raw_output#*$'\r\n\r\n'}"`. Thanks for showing me the use of `$'\r'`.

Andrea Francia 2009-12-02 00:12:50

Wow. I'll be kind of amazed if this really works. Let us know!

DigitalRoss 2009-12-03 05:14:02

Answer 2

A:

Thanks to DigitalRoss, Mike Ottum and the other contributors, I created the following, which does 99% of the work.

I used parameter expansion to remove the headers. The remaining problem is the last newline character of the page: the $() construct strips trailing newlines, and I don't think this problem can be solved.

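function download() {
    local hostname="$1"
    local port="$2"
    local path="$3"
    local raw_output

    raw_output="$(download_raw "$hostname" "$port" "$path")"

    # Strip the headers: drop everything up to and including the first
    # blank line (CRLF CRLF). printf avoids echo's option parsing.
    printf '%s' "${raw_output#*$'\r\n\r\n'}"
}

function download_raw() {
    local hostname="$1"
    local port="$2"
    local path="$3"

    # Open a read/write TCP socket on fd 3 in a subshell, send the request,
    # then read the response until the server closes the connection.
    (exec 3<>"/dev/tcp/$hostname/$port"
     printf 'GET %s HTTP/1.1\r\nConnection: close\r\nHost: %s\r\n\r\n' "$path" "$hostname" >&3
     cat <&3)
}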

hostname=andreafrancia.it
port=80
path=/

download "$hostname" "$port" "$path" > output.txt
wget "http://$hostname:$port$path" -O output.expected
diff --binary output.txt output.expected

The result is:

[root@localhost ~]# diff --binary output.txt output.expected
74c74,75
< </html>
\ No newline at end of file
---
> </html>
>

Feel free to reuse and improve this solution.

Andrea Francia 2009-12-02 00:13:45
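The missing final newline in the diff above comes from command substitution, which strips trailing newlines. A common workaround is to append a sentinel character inside the substitution and remove it afterwards. The sketch below applies that to header stripping. `strip_headers` is a hypothetical name, it reads the raw response on standard input, and like the answer it assumes text content the shell can hold in a variable.

```shell
# Hypothetical helper: read a raw HTTP response on stdin and print only the body,
# keeping any trailing newlines.
strip_headers() {
    local raw
    # The trailing "x" is a sentinel: $( ) only strips newlines at the very end,
    # so the newlines that precede the sentinel survive.
    raw="$(cat; printf x)"
    raw="${raw%x}"                            # remove the sentinel again
    printf '%s' "${raw#*$'\r\n\r\n'}"         # drop headers up to the first blank line
}
```

It could be used as `download_raw "$hostname" "$port" "$path" | strip_headers > output.txt`, in place of capturing the response with a plain `$()`.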

Wow. I didn't totally count on this working out. :-) Nice job.

DigitalRoss 2009-12-03 05:15:16


How to implement a web client in bash
