String formatting can make things a lot neater and less error-prone.
A simple example, where %s is replaced by a title:
my_html = "<html><body><h1>%s</h1></body></html>" % ("a title")
Or multiple times (the title is the same, and "my content" is now displayed where the second %s is):
my_html = "<html><body><h1>%s</h1>%s</body></html>" % ("a title", "my content")
You can also use named keys when doing %s, like %(thekey)s, which means you don't have to keep track of which order the %s placeholders are in. Instead of a tuple, you use a dictionary, which maps each key to a value:
my_html = "<html><body><h1>%(title)s</h1>%(content)s</body></html>" % {
    "title": "a title",
    "content": "my content"
}
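The dictionary doesn't have to be a literal, either; any dict works, and extra keys are simply ignored. The script further down uses exactly this to format each tweet (a dict decoded from JSON) directly. For example, with a made-up tweet dict:
tweet = {"from_user": "someone", "text": "a tweet!", "id": 12345}
line = u"%(from_user)s | %(text)s" % tweet # the "id" key is simply ignored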
The biggest issue with your script is that you are using a global variable (data). A much better way would be:
- call search_results, with an argument of "swineflu"
- search_results returns a list of results; store the result in a variable
- call web_output, with the search-results variable (and the query string) as arguments
- web_output returns a string containing your HTML
- write the returned HTML to your file
So web_output would return the HTML (as a string), which you then write to a file. Something like:
results = search_results("swineflu", 25)
html = web_output(results, "swineflu")
f = open("outw.html", "w")
f.write(html)
f.close()
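As an aside: if your Python version has the with statement (2.6+, or 2.5 via from __future__ import with_statement), it closes the file for you, even if the write raises an error:
with open("outw.html", "w") as f:
    f.write(html)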
Finally, the twitterd module is only required if you are accessing data that requires a login. The public timeline is, well, public, and can be accessed without any authentication, so you can remove the twitterd import and the api = line. If you did want to use twitterd, you would have to do something with the api variable, for example:
api = twitterd.Api(username='username', password='password')
statuses = api.GetPublicTimeline()
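(I've not used twitterd myself; assuming its Status objects behave like those in the python-twitter library it resembles, with .text and .user.screen_name attributes, you could then format them much like the search results:)
for status in statuses:
    print u"%s | %s" % (status.user.screen_name, status.text)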
So, the way I might have written the script is:
import time
import urllib
import simplejson

def search_results(query, rpp=25): # 25 is the default value for rpp
    url = "http://search.twitter.com/search.json?q=%s&rpp=%s" % (query, rpp)
    json_results = simplejson.load(urllib.urlopen(url))
    data = [] # set up an empty list, within function scope
    for tweet in json_results["results"]:
        # Unicode!
        # And tweet is a dict, so we can use the string-formatting key thing
        data.append(u"%(from_user)s | %(text)s" % tweet)
    return data # instead of modifying the global data!

def web_output(data, query):
    results_html = ""
    # loop over each item in data, storing the item in "result"
    for result in data:
        # append to the string
        results_html += "    <p style='font-size:90%%'>%s</p>\n" % (result)

    html = """<html>
    <head>
    <meta http-equiv='refresh' content='60'>
    <title>python newb's twitter search</title>
    </head>
    <body>
    <h1 style='font-size:150%%'>Python Newb's Twitter Search</h1>
    <h2 style='font-size:125%%'>Searching Twitter for: %(query)s</h2>
    <h2 style='font-size:125%%'> %(ctime)s (updates every 60 seconds)</h2>
    %(results_html)s
    </body>
    </html>
    """ % {
        'query': query,
        'ctime': time.ctime(),
        'results_html': results_html
    }

    return html

def main():
    query_string = "swineflu"
    results = search_results(query_string) # rpp defaults to 25
    html = web_output(results, query_string)

    # Moved the file-writing stuff to main, so web_output is reusable
    f = open("outw.html", "w")
    f.write(html)
    f.close()

    # Once the file is written, display the output in the terminal:
    for formatted_tweet in results:
        # the .encode() turns the unicode string into an ASCII one, ignoring
        # characters it cannot display correctly
        print formatted_tweet.encode('ascii', 'ignore')

if __name__ == '__main__':
    main()
    # Common Python idiom: main() only runs if the script is run directly
    # (not imported). That means you can do..
    #     import myscript
    #     myscript.search_results("#python")
    # ..without your "main" function being run
(2) at what point would a framework be appropriate for an app like this? overkill?
I would say always use a web framework (with a few exceptions).
Now, that might seem strange, given all the time I just spent explaining fixes to your script... but with the above modifications, it's incredibly easy to do, since everything has been nicely function'ified!
Using CherryPy, which is a really simple HTTP framework for Python, you can easily send data to the browser, rather than constantly writing a file.
This assumes the above script is saved as twitter_searcher.py.
Note: I've never used CherryPy before; this is just the HelloWorld example from the CherryPy homepage, with a few lines copied from the above script's main() function!
import cherrypy

# import the twitter_searcher.py script
import twitter_searcher
# you can now call the functions in that script, for example:
# twitter_searcher.search_results("something")

class TwitterSearcher(object):
    def index(self):
        query_string = "swineflu"
        results = twitter_searcher.search_results(query_string) # rpp defaults to 25
        html = twitter_searcher.web_output(results, query_string)
        return html
    index.exposed = True

cherrypy.quickstart(TwitterSearcher())
Save and run that script, then browse to http://0.0.0.0:8080/ and it'll show your page!
The problem with this is that every page load queries the Twitter API. That won't matter if it's just you using it, but with hundreds (or even tens) of people looking at the page, it would start to slow down (and you could eventually get rate-limited or blocked by the Twitter API).
The solution is basically back to the start: you would write (cache) the search result to disk, re-searching Twitter only if the data is more than ~60 seconds old. You could also look into CherryPy's caching options... but this answer is getting rather absurdly long.
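For completeness, here's a rough sketch of what that disk cache could look like (my own illustration, untested; the file name, the 60-second window, and the get_html helper are all assumptions, not part of the original script):
import os
import time
import twitter_searcher

CACHE_FILE = "outw.html" # assumed cache location
CACHE_SECONDS = 60       # assumed staleness window

def get_html(query_string="swineflu"):
    # re-search Twitter only if the cached file is missing or stale
    if (not os.path.exists(CACHE_FILE)
            or time.time() - os.path.getmtime(CACHE_FILE) > CACHE_SECONDS):
        results = twitter_searcher.search_results(query_string)
        html = twitter_searcher.web_output(results, query_string)
        f = open(CACHE_FILE, "w")
        f.write(html.encode("utf-8")) # tweets may contain non-ASCII characters
        f.close()
        return html
    f = open(CACHE_FILE)
    html = f.read()
    f.close()
    return html
The CherryPy index() method would then just return get_html(), so at most one Twitter search happens per minute, no matter how many people load the page.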