Right now it's set up to write to a file, but I want it to output the value to a variable. I'm not sure how.

from BeautifulSoup import BeautifulSoup
import sys, re, urllib2
import codecs


woof1 = urllib2.urlopen('someurl').read()
woof_1 = BeautifulSoup(woof1)
woof2 = urllib2.urlopen('someurl').read()
woof_2 = BeautifulSoup(woof2)

GE_DB = open('GE_DB.txt', 'a')

for row in woof_1.findAll("tr", { "class" : "row_b" }):
  for col in row.findAll(re.compile('td')):
    GE_DB.write(col.string if col.string else '')
GE_DB.write("   ")
GE_DB.write("\n")
for row in woof_2.findAll("tr", { "class" : "row_b" }):
  for col in row.findAll(re.compile('td')):
    GE_DB.write(col.string if col.string else '')
GE_DB.write("\n")
GE_DB.close()
A: 
values = []
for row in woof_1.findAll("tr", { "class" : "row_b" }):
  for col in row.findAll(re.compile('td')):
    if col.string:
      values.append(col.string)
result = ''.join(values)
Li0liQ
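Here is a minimal Python 3 sketch of the same list-and-join idea that runs without BeautifulSoup or network access. It assumes a hypothetical `rows` list in which each inner list holds the `col.string` values of one `<tr class="row_b">`, with `None` standing in for cells that have no single text child.

```python
# Hypothetical stand-in for the parsed table: one inner list per <tr class="row_b">,
# holding what col.string would return for each <td> (None = no single text child).
rows = [
    ["Item A", "10", None],
    ["Item B", None, "7"],
]

values = []
for row in rows:
    for cell in row:
        if cell:  # skip None and empty strings, like `if col.string:`
            values.append(cell)

# Join once at the end instead of writing piece by piece to a file.
result = "".join(values)
print(result)  # Item A10Item B7
```

The variable `result` now holds everything the original loop would have written to `GE_DB.txt`.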
I'm getting an invalid syntax error for `if (col.string)` on the `)`. Not sure why. =/ Something I did?
Pevo
@Pevo, sorry about that, I missed a colon after the if statement. Corrected it.
Li0liQ
Your correspondent omitted a necessary `:` but included redundant `(` and `)` ;-)
John Machin
Can someone explain the difference between how this one works and how Jonathan Feinberg's answer works?
Pevo
@John Machin, it's a habit from working with other languages. Corrected that too :).
Li0liQ
StringIO (http://docs.python.org/library/stringio.html) is a module that lets you read and write strings as if they were files, so it's a natural replacement for file input/output operations. My solution is more suitable if you want to process the values (the values list in particular) retrieved from the table before merging them into one big string.
Li0liQ
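As an illustration of processing the values before merging them, this Python 3 sketch strips whitespace, drops empty cells, and separates cells with tabs and rows with newlines. The sample data and the choice of separators are assumptions made for the example.

```python
# Hypothetical cell strings, including stray whitespace and empty cells.
rows = [
    [" Item A ", "10 ", None],
    ["Item B", "   ", " 7"],
]

lines = []
for row in rows:
    # Keep one row's values in a list so they can be cleaned before joining.
    values = [cell.strip() for cell in row if cell and cell.strip()]
    lines.append("\t".join(values))

# Tab between cells, newline between rows.
result = "\n".join(lines)
print(result)
```

Because the values are still separate list items at this point, each one can be cleaned or filtered individually before the final `join`.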
When I use this code my output is [u' <value I want here> ']. Why are the [u' and '] present?
Pevo
Yes, I'm much more interested in this solution; you read my mind. I'll update my code with what I'd like to do.
Pevo
@Pevo, the single quotes mark the beginning and end of the string, and u means it is a unicode string. The brackets suggest that you are looking at the list of strings (i.e. values), not at the resulting string (i.e. result).
Li0liQ
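The difference is easy to see by printing both objects. This Python 3 sketch uses a made-up value. In Python 3 the `u` prefix is not shown because every `str` is Unicode, whereas Python 2 prints `[u'...']` for a list of unicode strings.

```python
values = ["value I want here"]  # hypothetical single extracted value
result = "".join(values)

# Printing a list shows the repr of each element: brackets plus quotes.
print(values)        # ['value I want here']
# Printing the joined string shows just the text.
print(result)        # value I want here
# repr() of the string alone still adds quotes, but no brackets.
print(repr(result))  # 'value I want here'
```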
@Li0liQ thanks for the solution. I'm wondering: if I add a lot of URLs to parse tables from, will I need to worry about any problems?
Pevo
@Pevo, you're welcome. It's good manners to accept the answer that helped you. As for many URLs: just don't mix the data from different URLs :).
Li0liQ
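One way to keep data from different URLs apart is to store each page's result under its own key. This Python 3 sketch uses placeholder URLs and a hypothetical `pages` dict of already-parsed cell values instead of fetching and parsing; `extract` is a hypothetical helper, not part of BeautifulSoup.

```python
# Hypothetical pre-parsed cell values per URL; in the question these would come
# from urllib2.urlopen(url).read() fed into BeautifulSoup.
pages = {
    "http://example.com/page1": [["A", "1"], ["B", None]],
    "http://example.com/page2": [["C", "3"]],
}


def extract(rows):
    """Hypothetical helper: flatten one page's rows into a string, skipping empty cells."""
    return "".join(cell for row in rows for cell in row if cell)


# One result per URL, so data from different pages never gets mixed.
results = {url: extract(rows) for url, rows in pages.items()}

for url, text in results.items():
    print(url, text)
```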
A: 
henchman
And why was mine downvoted? Any feedback appreciated!
henchman
String concatenation like that is generally frowned upon in Python. It's better (style- and efficiency-wise) to build up a list of strings and then `join` them (or, if the OP wanted to continue using file-like objects, use `StringIO`). See http://wiki.python.org/moin/PythonSpeed/PerformanceTips#StringConcatenation and http://www.skymind.com/~ocrow/python_string/ for more.
Will McCutchen
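To compare the two styles, this Python 3 sketch builds the same string both ways and times them with `timeit`. The data is made up, and the timings depend on the interpreter: CPython can sometimes extend a string in place during `+=`, so the gap may be small in a short loop. PEP 8 likewise recommends `''.join()` rather than relying on that CPython-specific optimization.

```python
import timeit

# Hypothetical cell values: every third entry is None, standing in for an empty cell.
cells = ["cell%d" % i if i % 3 else None for i in range(1000)]


def with_concatenation():
    text = ""
    for c in cells:
        # Conceptually builds a new string each time; CPython may resize in place.
        text += c if c else ""
    return text


def with_join():
    # Collect the non-empty pieces and join them once.
    return "".join(c for c in cells if c)


# Both approaches must produce exactly the same text.
assert with_concatenation() == with_join()

concat_time = timeit.timeit(with_concatenation, number=200)
join_time = timeit.timeit(with_join, number=200)
print("+=   : %.4f s" % concat_time)
print("join : %.4f s" % join_time)
```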
Thank you!
henchman
A: 
import cStringIO as StringIO   # or import StringIO if on a fringe platform
buf = StringIO.StringIO()
for row in woof_1.findAll("tr", { "class" : "row_b" }):
  for col in row.findAll(re.compile('td')):
    buf.write(col.string if col.string else '')

result = buf.getvalue()
Jonathan Feinberg
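`cStringIO` and `StringIO` exist only in Python 2; in Python 3 the equivalent is `io.StringIO`. Here is a minimal Python 3 sketch of the same approach, using hypothetical rows in place of the BeautifulSoup results.

```python
import io

# Hypothetical stand-in for the col.string values of each row (None = empty cell).
rows = [["Item A", "10", None], ["Item B", None, "7"]]

buf = io.StringIO()  # in-memory text buffer with a file-like write() method
for row in rows:
    for cell in row:
        buf.write(cell if cell else "")

result = buf.getvalue()  # everything written so far, as one str
buf.close()
print(result)  # Item A10Item B7
```

Because `buf` has the same `write()` method as a file object, the original `GE_DB.write(...)` calls can target it with no other changes.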
This one seems to be working!!! Thanks a lot =)
Pevo
Who the heck downvoted this, and why?
Jonathan Feinberg
A: 

Get rid of all mentions of GE_DB.

Add `outputtext = ""` near the beginning.

Replace `GE_DB.write(col.string if col.string else '')` with `outputtext += col.string if col.string else ''`

prestomation
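Applied to the question's loop, those three steps look roughly like this Python 3 sketch, with hypothetical rows in place of the parsed table. It is the simplest change to the original code; for many pieces, building a list and joining it is usually preferred.

```python
# Hypothetical stand-in for the col.string values of each row (None = empty cell).
rows = [["Item A", "10", None], ["Item B", None, "7"]]

outputtext = ""  # replaces the GE_DB file object
for row in rows:
    for cell in row:
        outputtext += cell if cell else ""  # was: GE_DB.write(...)

print(outputtext)  # Item A10Item B7
```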