views:

171

answers:

6

In Linux, how can I fetch a URL and get its contents into a variable in a shell script?

+7  A: 

You can use the wget command to download the page and read it into a variable:

content=$(wget google.com -q -O -)
echo "$content"

We use the -O option of wget, which lets us specify the name of the file into which wget dumps the page contents. We specify - to send the dump to standard output, where the command substitution collects it into the variable content. The -q (quiet) option turns off wget's own output.
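One shell detail worth noting: expanding the variable unquoted (echo $content) lets the shell re-split the text, so the page's newlines collapse into single spaces. A minimal offline sketch of the difference, with printf standing in for the fetched page:

```shell
# printf stands in for a downloaded page so this sketch runs without network.
content=$(printf 'line one\nline two\n')

echo $content     # unquoted: newlines collapse into spaces -> line one line two
echo "$content"   # quoted: the original line structure is preserved
```

Quoting the expansion ("$content") is the safe default whenever the content may span multiple lines.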

You can use the curl command for this as well:

content=$(curl -L google.com)
echo "$content"

We need the -L option because the page we are requesting might have moved, in which case curl has to fetch it from its new location. The -L (or --location) option makes curl follow such redirects.
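Whichever tool you use, it is worth checking the exit status before trusting the content. A hedged sketch of that pattern; a file:// URL to a temporary file stands in for a web page here so the sketch runs without network access (-f makes curl fail on HTTP errors, -sS silences the progress bar but keeps error messages):

```shell
tmp=$(mktemp)
echo 'hello from the page' > "$tmp"

# Capture the output and branch on curl's exit status in one step.
if content=$(curl -fsSL "file://$tmp"); then
    echo "fetched: $content"
else
    echo "fetch failed" >&2
fi

rm -f "$tmp"
```

The same if-capture pattern works for wget or any other fetcher.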

codaddict
That doesn't answer the "and get its contents *in a variable*" part of the question.
Vivien Barousse
@Downvoter: care to explain?
codaddict
@codaddict: I explained, and your question has been edited since, so my downvote doesn't mean anything anymore... (It actually turned into an upvote).
Vivien Barousse
+4  A: 

There are the wget and curl commands.

With wget you get the page as a file on disk that you can then work on. With curl you can handle the contents as a stream.
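To illustrate the stream side: curl writes to standard output by default, so it composes directly with a pipeline. In this offline sketch printf plays the role of curl -s "$url" (the URL and page body are made up):

```shell
# Count anchor tags in the streamed content without touching the disk;
# printf stands in for 'curl -s "$url"' so the sketch runs offline.
printf '<a href="a">one</a>\n<a href="b">two</a>\n' | grep -c '<a '
```

With wget you would instead save the page first (wget "$url") and then run the same grep over the resulting file.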


Colin Hebert
A: 
content=`wget -O - $url`
Jim Lewis
`$(...)` is preferred over ``...``, see http://mywiki.wooledge.org/BashFAQ/082
rjack
I guess I'm showing my age. Back in the day, all _we_ had were backticks...and we _liked_ it! Now get off my lawn!
Jim Lewis
@rjack: (But the article you linked to does make a pretty good case for the $(...) syntax.)
Jim Lewis
+1  A: 

You can use curl or wget to retrieve the raw data, or you can use w3m -dump to have a nice text representation of a web page.

$ foo=$(w3m -dump http://www.example.com/); echo $foo
You have reached this web page by typing "example.com", "example.net","example.org" or "example.edu" into your web browser. These domain names are reserved for use in documentation and are not available for registration. See RFC 2606, Section 3.
rjack
A: 

Hi, there are many ways to get a page from the command line... but it also depends on whether you want the source code or the page as rendered:

If you need the source code:

with curl: curl $url

with wget: wget -O - $url

but if you want to get what you see in a browser, lynx can be useful: lynx -dump $url

I think you can find many solutions for this little problem; maybe you should read the man pages for those commands. And don't forget to replace $url with your URL :)

Good luck :)

julianvdb
A: 

If you have LWP installed, it provides a binary simply named "GET".

$ GET http://example.com
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
  <META http-equiv="Content-Type" content="text/html; charset=utf-8">
  <TITLE>Example Web Page</TITLE>
</HEAD> 
<body>  
<p>You have reached this web page by typing &quot;example.com&quot;,
&quot;example.net&quot;, &quot;example.org&quot;
  or &quot;example.edu&quot; into your web browser.</p>
<p>These domain names are reserved for use in documentation and are not available 
  for registration. See <a href="http://www.rfc-editor.org/rfc/rfc2606.txt">RFC 
  2606</a>, Section 3.</p>
</BODY>
</HTML>

wget -O-, curl, and lynx -source behave similarly.
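Since these commands are interchangeable for this purpose, a small wrapper can simply use whichever one is installed. A hedged sketch; the fetch name is invented here, and GET assumes Perl's LWP is present:

```shell
# Try curl, then wget, then LWP's GET -- whichever exists on this system.
fetch() {
    if command -v curl >/dev/null 2>&1; then
        curl -fsSL "$1"
    elif command -v wget >/dev/null 2>&1; then
        wget -qO- "$1"
    elif command -v GET >/dev/null 2>&1; then
        GET "$1"
    else
        echo "no fetcher found" >&2
        return 1
    fi
}

# content=$(fetch http://example.com)
```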

ephemient