Hi, I am trying to parse Twitter RSS feeds and put the information in a sqlite database, using Python. Here's an example:

u'MiamiPete: today\'s "Last Call" is now up http://bit.ly/MGDzu #stocks #stockmarket #finance #money'

What I want to do is create one column for the main content ("MiamiPete: ... now up"), one column for the URL ("http://bit.ly/MGDzu"), and four separate columns for the hashtags (stocks, stockmarket, finance, money). I've been playing around with how to do this.

Any advice would be greatly appreciated!

Thanks,

Greg

P.S. Some code I've been playing around with is below--you can see I tried initially creating a variable called "tiny_url" and splitting it, which it does seem to do, but this feeble attempt is not anywhere close to solving the problem noted above. :)

def store_feed_items(id, items):
    """ Takes a feed_id and a list of items and stores them in the DB """
    for entry in items:
        c.execute('SELECT entry_id FROM RSSEntries WHERE url=?', (entry.link,))
        tinyurl = entry.summary    ### I added this in
        print tinyurl.split('http') ### I added this in
        if len(c.fetchall()) == 0:
            c.execute('INSERT INTO RSSEntries (id, url, title, content, tinyurl, date) VALUES (?,?,?,?,?,?)',
                      (id, entry.link, entry.title, entry.summary, tinyurl,
                       strftime("%Y-%m-%d %H:%M:%S", entry.updated_parsed)))
A: 

Twitter has an API that may be easier for you to use here: http://apiwiki.twitter.com/Twitter-API-Documentation.

You can get the results as JSON or XML and use one of the many Python libraries to parse the results.

Or if you must use the RSS, there are Python feed parsers like http://www.feedparser.org/.

Jason Christa
I appreciate the response, Jason. I am a newbie at APIs as well, so I'm hoping to learn more about them.
Gregory Saxton
+3  A: 

It seems like your data-driven design is rather flawed. Unless all your entries have a text part, a URL, and up to 4 tags, it's not going to work.

You also need to separate saving to the DB from parsing. Parsing can easily be done with a regexp (or even string methods):

>>> s = your_string
>>> s.split()
['MiamiPete:', "today's", '"Last', 'Call"', 'is', 'now', 'up', 'http://bit.ly/MGDzu', '#stocks', '#stockmarket', '#finance', '#money']
>>> url = [i for i in s.split() if i.startswith('http://')]
>>> url
['http://bit.ly/MGDzu']
>>> tags = [i for i in s.split() if i.startswith('#')]
>>> tags
['#stocks', '#stockmarket', '#finance', '#money']
>>> ' '.join(i for i in s.split() if i not in url+tags)
'MiamiPete: today\'s "Last Call" is now up'

Single-table db design would probably have to go, though.
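To make the "single-table design would have to go" point concrete, here is a minimal sketch of a normalized layout using the standard-library `sqlite3` module: a parent table of entries and a child table of tags, one row per hashtag. Table and column names are illustrative, not prescriptive, and the parsing reuses the string methods shown above.

```python
import sqlite3

# In-memory DB for illustration; a real feed store would use a file.
conn = sqlite3.connect(":memory:")
c = conn.cursor()

# One row per tweet, plus a child table with one row per hashtag.
c.execute("""CREATE TABLE entries (
                 entry_id INTEGER PRIMARY KEY,
                 content  TEXT,
                 url      TEXT)""")
c.execute("""CREATE TABLE tags (
                 entry_id INTEGER REFERENCES entries(entry_id),
                 tag      TEXT)""")

s = 'MiamiPete: today\'s "Last Call" is now up http://bit.ly/MGDzu #stocks #stockmarket'
words = s.split()
url = next((w for w in words if w.startswith("http://")), "")
tags = [w for w in words if w.startswith("#")]
content = " ".join(w for w in words if w != url and w not in tags)

c.execute("INSERT INTO entries (content, url) VALUES (?, ?)", (content, url))
entry_id = c.lastrowid  # id of the row we just inserted
c.executemany("INSERT INTO tags (entry_id, tag) VALUES (?, ?)",
              [(entry_id, t) for t in tags])
conn.commit()
```

A tweet with three hashtags then produces one `entries` row and three `tags` rows, and `SELECT tag FROM tags WHERE entry_id=?` recovers them, which is exactly what the single fixed-column table couldn't express.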

SilentGhost
Thanks, SilentGhost! I really appreciate the detailed response. As you can tell, I'm just starting to learn python. I have a couple of follow-up questions, if you don't mind (see additional post below).
Gregory Saxton
OK, I've read up on the 1st and 2nd normal forms--great stuff! I think the key lesson for this particular project is that I would want a table with columns for "entry_id" (which is the same rss entry_id as in my core data table) and a second column for "hashtags." This way, there would be multiple rows for each entry_id, if there is more than one hashtag. Does that seem right?
Gregory Saxton
That sounds about right. Don't forget about your URLs; they're no different from hashtags, so they require their own table.
SilentGhost
Regarding url, is that true even if there is at most 1 url in each rss entry?
Gregory Saxton
Well, if you can rely on such an assumption and are ready to deal with the consequences, sure; an empty string or any other default value will do.
SilentGhost
OK, point taken. So, having read the material on normalization and deciding to have separate tables for urls and tags, I'm stuck on the next stage of implementation. Specifically, do you have any suggestions on what the code would look like for putting all of the tags in a separate table? I can see that they are all in the "tags" variable you created, but don't have any sample code on the next stage. If you have any links or other material, I'd greatly appreciate it.
Gregory Saxton
Hm, it seems like the code you posted already has all you need. You might want to have a closer look at http://docs.python.org/library/sqlite3.html, but you have a working example already. Just a piece of advice: when you ask a question, try to isolate the problem and be more specific.
SilentGhost
+1  A: 

Also, you can parse your strings using regexps:

>>> s = (u'MiamiPete: today\'s "Last Call" is now up http://bit.ly/MGDzu '
         '#stocks #stockmarket #finance #money')
>>> re.match(r'(.*) (http://[^ ]+)', s).groups()
(u'MiamiPete: today\'s "Last Call" is now up', u'http://bit.ly/MGDzu')
>>> re.findall(r'(#\w+)', s)
[u'#stocks', u'#stockmarket', u'#finance', u'#money']
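The two patterns above can be wrapped into one small helper (the function name is mine, not from the thread) that returns all three pieces at once:

```python
import re

def parse_tweet(s):
    """Split a tweet summary into (content, url, tags) using the two regexes above."""
    m = re.match(r'(.*) (http://[^ ]+)', s)
    # If no URL is present, keep the whole string as content.
    content, url = m.groups() if m else (s, '')
    tags = re.findall(r'(#\w+)', s)
    return content, url, tags

content, url, tags = parse_tweet(
    u'MiamiPete: today\'s "Last Call" is now up http://bit.ly/MGDzu '
    u'#stocks #stockmarket #finance #money')
```

Note the greedy `.*` stops at the last `http://` in the string, so the trailing hashtags end up outside the first group, matching the output shown above.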
J.F. Sebastian
Thanks! I have never used regular expressions but they certainly look promising. Now might be a good time. :)
Gregory Saxton
A: 

I would highly recommend using the Twitter API. There are actually two APIs, one for the main Twitter server and one for the search server. They are used for different things.

You can find sample code, pytwitter, on SVN. Add simplejson and you can be doing very powerful things in a matter of minutes.

Good luck

Trik
Thanks, Trik. I'll check it out!
Gregory Saxton