The only reliable method that I have found for using a script to download text from Wikipedia is cURL. So far the only way I have for doing that is to call os.system(). Even though the output appears properly in the Python shell, I can't seem to get the function to return anything other than the exit code (0). Alternatively, somebody could show me how to properly use urllib.
views: 540
answers: 3
+2
A:
To answer the question: Python has a subprocess module, which allows you to interact with spawned processes: http://docs.python.org/library/subprocess.html#subprocess.Popen
It allows you to read the stdout of the invoked process, and even send input to its stdin.
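A minimal sketch of reading a child process's stdout with subprocess.Popen, in contrast to os.system(), which only returns the exit code. The helper name capture_stdout is hypothetical, and the demo runs the Python interpreter itself as a stand-in for curl so that it works without curl or network access.

```python
import subprocess
import sys


def capture_stdout(command):
    """Run command (a list of arguments) and return (exit_code, stdout_bytes).

    capture_stdout is an illustrative name, not part of the subprocess module.
    """
    # stdout=PIPE redirects the child's output to us instead of the terminal.
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    # communicate() reads all output and waits for the process to finish.
    output, _ = process.communicate()
    return process.returncode, output


# Stand-in command: the current Python interpreter printing one line.
exit_code, output = capture_stdout([sys.executable, "-c", "print('hello')"])
print(exit_code)
print(output)
```

Assuming curl is installed and on the PATH, the same helper could be called as capture_stdout(["curl", "-s", url]) to get the page body back as a byte string.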
However, as you said, urllib is a much better option. If you search Stack Overflow, I am sure you will find at least 10 other related questions...
Cipher
2008-12-09 00:55:36
+6
A:
From Dive into Python:
import urllib
sock = urllib.urlopen("http://en.wikipedia.org/wiki/Python_(programming_language)")
htmlsource = sock.read()
sock.close()
print htmlsource
That will print out the source code for the Python Wikipedia article. I suggest you take a look at Dive into Python for more details.
Example using urllib2 from the Python Library Reference:
import urllib2
f = urllib2.urlopen('http://www.python.org/')
print f.read(100)
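The urllib2 example above can be wrapped in a small helper, sketched here under a few assumptions. The function name fetch and the User-Agent string are hypothetical. The custom header is included because some sites, reportedly including Wikipedia, may reject requests that carry Python's default user agent, which could be one reason curl appeared more reliable. The import fallback is only there so the sketch also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent of urllib2


def fetch(url, user_agent="example-script/0.1"):
    """Return the body of url as a byte string.

    fetch and the default user_agent value are illustrative assumptions.
    """
    # Attach an explicit User-Agent header to the request.
    request = urllib2.Request(url, headers={"User-Agent": user_agent})
    response = urllib2.urlopen(request)
    try:
        return response.read()
    finally:
        # Always release the connection, even if read() fails.
        response.close()
```

A call such as fetch("http://en.wikipedia.org/wiki/Python_(programming_language)") would then return the page source as bytes, provided the network is reachable and the site accepts the request.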
Edit: You might also want to take a look at wget.
Edit2: Added a urllib2 example based on S.Lott's advice.
Sean
2008-12-09 01:01:28
Thank you. The built-in help browser is almost never understandable.
GameFreak
2008-12-09 01:29:18
urllib2 does almost the same thing, plus it handles things like redirects more gracefully.
S.Lott
2008-12-09 01:43:10
@S.Lott I agree. I was just looking for a resource that GameFreak could learn more from, not just copy code from, and it turned out that the first resource I thought of, Dive into Python, used urllib.
Sean
2008-12-09 01:53:08
http://www.python.org/doc/2.5.2/lib/urllib2-examples.html seems pretty clear.
S.Lott
2008-12-09 01:55:14