String formatting can make things a lot neater and less error-prone.
A simple example, where %s is replaced by a title:
my_html = "<html><body><h1>%s</h1></body></html>" % ("a title")
Or multiple times (the title is the same, and "my content" is now displayed where the second %s is):
my_html = "<html><body><h1>%s</h1>%s</body></html>" % ("a title", "my content")
You can also use named keys when doing %s, like %(thekey)s, which means you don't have to keep track of which order the %s placeholders are in. Instead of a tuple, you use a dictionary, which maps each key to a value:
my_html = "<html><body><h1>%(title)s</h1>%(content)s</body></html>" % {
    "title": "a title",
    "content": "my content"
}
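The dictionary doesn't have to be a literal, either; any dict works, and extra keys are simply ignored. The script further down uses exactly this to format each tweet (a dict decoded from JSON) directly. For example, with a made-up tweet dict:
tweet = {"from_user": "someone", "text": "a tweet!", "id": 12345}
line = u"%(from_user)s | %(text)s" % tweet # the "id" key is simply ignored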
The biggest issue with your script is that you are using a global variable (data). A much better way would be:
- call search_results, with an argument of "swineflu"
- search_results returns a list of results; store the result in a variable
- call web_output, with the search-results variable (and the query string) as arguments
- web_output returns a string containing your HTML
- write the returned HTML to your file
So web_output would return the HTML (as a string), which you then write to a file. Something like:
results = search_results("swineflu", 25)
html = web_output(results, "swineflu")
f = open("outw.html", "w")
f.write(html)
f.close()
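As an aside: if your Python version has the with statement (2.6+, or 2.5 via from __future__ import with_statement), it closes the file for you, even if the write raises an error:
with open("outw.html", "w") as f:
    f.write(html)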
Finally, the twitterd module is only required if you are accessing data that requires a login. The public timeline is, well, public, and can be accessed without any authentication, so you can remove the twitterd import and the api = line. If you did want to use twitterd, you would have to do something with the api variable, for example:
api = twitterd.Api(username='username', password='password')
statuses = api.GetPublicTimeline()
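(I've not used twitterd myself; assuming its Status objects behave like those in the python-twitter library it resembles, with .text and .user.screen_name attributes, you could then format them much like the search results:)
for status in statuses:
    print u"%s | %s" % (status.user.screen_name, status.text)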
So, the way I might have written the script is:
import time
import urllib
import simplejson

def search_results(query, rpp=25): # 25 is the default value for rpp
    url = "http://search.twitter.com/search.json?q=%s&rpp=%s" % (query, rpp)
    json_results = simplejson.load(urllib.urlopen(url))
    data = [] # set up an empty list, within function scope
    for tweet in json_results["results"]:
        # Unicode!
        # And tweet is a dict, so we can use the string-formatting key thing
        data.append(u"%(from_user)s | %(text)s" % tweet)
    return data # instead of modifying the global data!

def web_output(data, query):
    results_html = ""
    # loop over each item in data, storing the item in "result"
    for result in data:
        # append to the string
        results_html += "    <p style='font-size:90%%'>%s</p>\n" % (result)

    html = """<html>
    <head>
    <meta http-equiv='refresh' content='60'>
    <title>python newb's twitter search</title>
    </head>
    <body>
    <h1 style='font-size:150%%'>Python Newb's Twitter Search</h1>
    <h2 style='font-size:125%%'>Searching Twitter for: %(query)s</h2>
    <h2 style='font-size:125%%'> %(ctime)s (updates every 60 seconds)</h2>
    %(results_html)s
    </body>
    </html>
    """ % {
        'query': query,
        'ctime': time.ctime(),
        'results_html': results_html
    }

    return html

def main():
    query_string = "swineflu"
    results = search_results(query_string) # rpp defaults to 25
    html = web_output(results, query_string)

    # Moved the file-writing stuff to main, so web_output is reusable
    f = open("outw.html", "w")
    f.write(html)
    f.close()

    # Once the file is written, display the output in the terminal:
    for formatted_tweet in results:
        # the .encode() turns the unicode string into an ASCII one, ignoring
        # characters it cannot display correctly
        print formatted_tweet.encode('ascii', 'ignore')

if __name__ == '__main__':
    main()
    # Common Python idiom: main() only runs if the script is run directly
    # (not imported). That means you can do..
    #     import myscript
    #     myscript.search_results("#python")
    # ..without your "main" function being run
(2) at what point would a framework be appropriate for an app like this? overkill?
I would say always use a web framework (with a few exceptions).
Now, that might seem strange, given all the time I just spent explaining fixes to your script... but with the above modifications, it's incredibly easy to do, since everything has been nicely function'ified!
Using CherryPy, which is a really simple HTTP framework for Python, you can easily send data to the browser, rather than constantly writing a file.
This assumes the above script is saved as twitter_searcher.py.
Note: I've never used CherryPy before; this is just the HelloWorld example from the CherryPy homepage, with a few lines copied from the above script's main() function!
import cherrypy

# import the twitter_searcher.py script
import twitter_searcher
# you can now call the functions in that script, for example:
# twitter_searcher.search_results("something")

class TwitterSearcher(object):
    def index(self):
        query_string = "swineflu"
        results = twitter_searcher.search_results(query_string) # rpp defaults to 25
        html = twitter_searcher.web_output(results, query_string)
        return html
    index.exposed = True

cherrypy.quickstart(TwitterSearcher())
Save and run that script, then browse to http://0.0.0.0:8080/ and it'll show your page!
The problem with this is that every page load queries the Twitter API. That won't matter if it's just you using it, but with hundreds (or even tens) of people looking at the page, it would start to slow down (and you could eventually get rate-limited or blocked by the Twitter API).
The solution is basically back to the start: you would write (cache) the search result to disk, re-searching Twitter only if the data is more than ~60 seconds old. You could also look into CherryPy's caching options... but this answer is getting rather absurdly long.
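For completeness, here's a rough sketch of what that disk cache could look like (my own illustration, untested; the file name, the 60-second window, and the get_html helper are all assumptions, not part of the original script):
import os
import time
import twitter_searcher

CACHE_FILE = "outw.html" # assumed cache location
CACHE_SECONDS = 60       # assumed staleness window

def get_html(query_string="swineflu"):
    # re-search Twitter only if the cached file is missing or stale
    if (not os.path.exists(CACHE_FILE)
            or time.time() - os.path.getmtime(CACHE_FILE) > CACHE_SECONDS):
        results = twitter_searcher.search_results(query_string)
        html = twitter_searcher.web_output(results, query_string)
        f = open(CACHE_FILE, "w")
        f.write(html.encode("utf-8")) # tweets may contain non-ASCII characters
        f.close()
        return html
    f = open(CACHE_FILE)
    html = f.read()
    f.close()
    return html
The CherryPy index() method would then just return get_html(), so at most one Twitter search happens per minute, no matter how many people load the page.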