Hey guys, I'm making a script to fetch words/results from this site (http://grecni.com/texttwist.php). I already have the HTTP POST request ready, etc.

The only thing I need now is to extract the words. I'm working with HTML source that looks like this:

<html>
<head>
<title>Text Twist Unscrambler</title>
<META NAME="keywords" CONTENT="Text,Twist,Text Twist,Unscramble,Free,Source,php">
</head>
<body>

<font face="arial,helvetica" size="3">
<p>
<b>3 letter words</b><br>sae &nbsp; sac &nbsp; ess &nbsp; aas &nbsp; ass &nbsp; sea &nbsp; ace &nbsp; sec &nbsp; <p>

<b>4 letter words</b><br>cess &nbsp; secs &nbsp; seas &nbsp; ceca &nbsp; sacs &nbsp; case &nbsp; asea &nbsp; casa &nbsp; aces &nbsp; caca &nbsp; <p>

<b>5 letter words</b><br>cacas &nbsp; casas &nbsp; caeca &nbsp; cases &nbsp; <p>
<b>6 letter words</b><br>access &nbsp; <br><br>
Found 23 words in 0.22962 seconds


<form action="texttwist.php" method="post">

enter scrambled letters and I'll return all word combinations<br>
<input type="text" name="l" value="asceacas" size="20" maxlength="20">

<input type="submit" name="button" value="unscramble">
<input type="button" name="clear" value="clear" onClick="this.form.l.value='';">
</form><p>

<a href=texttwist.phps>php source</a>
- it's kinda ugly, but it's fast<p>

<a href=/>back to my page</a>

</body>

</html>

I'm trying to fetch the words, like "sae", "sac", "secs", "seas", "casas", etc.

Any help?

This is the farthest I've gotten, and I don't know what to do from here: link text

Any suggestions? Help?

A: 

Use an HTML parser like Nokogiri.

Adrian
A: 

If you want any kind of robustness, you really want a parser. As Adrian mentioned, Nokogiri is the most popular solution.

If you insist, and are aware of the madness you may be in for as the page becomes more complex, the following may help:

Search for a line that matches

/^<b>\d+ letter words/

and then you can dig out the bits like so:

a = line.split(/<br>/)[1]   # everything after the first <br>
a.gsub!('<p>', '')          # take out the trailing <p>
res = a.split('&nbsp;').map(&:strip).reject(&:empty?)   # this is your data, minus whitespace and empty leftovers
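Putting those steps together, here is a minimal sketch that runs over a whole page. It assumes the response body is already in a string and that each heading sits at the start of its own line, as in the source posted in the question. The sample lines are trimmed from that source, and `extract_words` is a made-up helper name, not part of any library.

```ruby
# Sample input: a few lines trimmed from the page source in the question.
html = <<~HTML
  <b>3 letter words</b><br>sae &nbsp; sac &nbsp; ess &nbsp; <p>
  <b>6 letter words</b><br>access &nbsp; <br><br>
  Found 23 words in 0.22962 seconds
HTML

# extract_words is a hypothetical helper name used only for this sketch.
def extract_words(html)
  words = []
  html.each_line do |line|
    # Only lines that start with a "<b>N letter words" heading carry data.
    next unless line =~ /^<b>\d+ letter words/

    # Everything between the first <br> and the next one is the word list.
    list = line.split(/<br>/)[1].to_s
    list = list.gsub('<p>', '')

    # Split on the &nbsp; separators, trim whitespace, drop empty leftovers.
    words.concat(list.split('&nbsp;').map(&:strip).reject(&:empty?))
  end
  words
end

p extract_words(html)
# => ["sae", "sac", "ess", "access"]
```

Stripping each piece and rejecting empty strings keeps the trailing newline and the final separator from showing up as bogus "words".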

That being said, this isn't anything you want in production code. You'll be surprised how learning a parser will change how you see this problem.

Paul Rubel