I'm trying to download some remote pages. The source code contains one very long line. Both curl and wget download the file, but that one line is missing from the result. Is there another command-line utility I can use, or does anyone know how I can fix this problem?

Edit: To clarify, I have tried both wget and curl, and the line is missing from both downloaded files.

Edit:

[x@x scripts]$ curl --version
curl 7.15.5 (x86_64-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
Protocols: tftp ftp telnet dict ldap http file https ftps 
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz 
[x@x scripts]$ wget --version
GNU Wget 1.11.4 Red Hat modified

Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Originally written by Hrvoje Niksic <[email protected]>.
Currently maintained by Micah Cowan <[email protected]>.
A: 

Why not use curl or wget? Both are great tools for that!

Guillaume Lebourgeois
A: 

Please post the versions of wget and curl you are using. How long is that line?

Gadolin
See comment. Thanks.
Simon
+1  A: 

There are two probable explanations for what's happening:

  1. The server looks at the user agent and decides not to include this line. This is the less likely of the two, but wget allows you to change the user agent string, so you should be able to work around it easily.
  2. The long line is constructed on the client, using JavaScript. This is far more likely, but unfortunately for you, not easy to replicate in a command-line environment.

To verify, use a tool such as Fiddler to look at what's actually coming over the wire.
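
As a rough sketch of the user-agent workaround from the first point, the helpers below fetch a page with a browser-like user agent and report the length of the longest line received, so the results can be compared against a default fetch. The URL, the user agent string, and the function names are placeholders I made up for illustration, not anything taken from the question.

```shell
#!/bin/sh
# Hypothetical values: substitute the page you are downloading and the
# user agent string your own browser sends.
URL="${URL:-http://example.com/page.html}"
UA="Mozilla/5.0 (X11; Linux x86_64; rv:1.9.2) Gecko/20100101 Firefox/3.6"

# Fetch URL ($1) into FILE ($2) while sending a browser-like user agent.
fetch_with_wget() { wget --user-agent="$UA" -O "$2" "$1"; }
fetch_with_curl() { curl -A "$UA" -o "$2" "$1"; }

# Print the length of the longest line in FILE ($1); prints 0 for an empty file.
longest_line() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}
```

For example, `fetch_with_wget "$URL" ua.html && longest_line ua.html` can be compared with a plain `wget -O plain.html "$URL" && longest_line plain.html`. If the long line shows up only in the first file, the server is varying its output by user agent; if it is missing from both, the JavaScript explanation becomes more likely.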

Anon
I am viewing the unrendered source code of the page.
Simon
And how are you doing that? If you're loading the page with your browser and then selecting "View Source" from the menu, you're *not* seeing the raw bytes coming from the server.
Anon