Hello everybody,

I have this HTML table:

<table>
    <tr>
        <td class="datax">a</td>
        <td class="datax">b</td>
        <td class="datax">c</td>
        <td class="datax">d</td>
    </tr>
    <tr>
        <td class="datax">e</td>
        <td class="datax">f</td>
        <td class="datax">g</td>
        <td class="datax">h</td>
    </tr>
</table>

How can I get the second and the fourth value of each <tr>? If I do:

bs.findAll('td', {'class':'datax'})

I get:

        <td class="datax">a</td>
        <td class="datax">b</td>
        <td class="datax">c</td>
        <td class="datax">d</td>

        <td class="datax">e</td>
        <td class="datax">f</td>
        <td class="datax">g</td>
        <td class="datax">h</td>

That is correct, but I would like to get this result instead:

        <td class="datax">b</td>
        <td class="datax">d</td>

        <td class="datax">f</td>
        <td class="datax">h</td>

So the values I want are: b, d, f, h

(the second and the fourth <td> of each <tr>)

Is this possible with the BeautifulSoup module?

Thank you very much!

A: 

With HTQL, this is simple:

<tr>.<td>2,4

--

HTQL only has COM support, though. Here is a complete example in JavaScript:

<html>
<body>
<script language=JavaScript>
     var a= new ActiveXObject("HtqlCom.HtqlControl");
     a.setUrl("C:\\test_table.html");
     a.setQuery("<tr>.<td>2,4");
     for (a.moveFirst(); !a.isEOF(); a.moveNext()){
         document.write(a.getValueByIndex(1));
     }
</script>
</body>
</html>

seagulf
What? Could you give me a complete example? Thank you very much!
Damiano
-1: HTQL is not quite Python-friendly...
RaphaelSP
+4  A: 

This should do it:

final_values=[td.string for td in bs.findAll('td', {'class':'datax'})[1::2]]

(After the clarification in the comments.) For your specific case it would be:

final_values=[td.b.a.string for td in bs.findAll('td', {'class':'datax'})[1::2]]
dagoof
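
The flat [1::2] slice above relies on every row having an even number of matching cells. As a rough sketch of a per-row alternative, the snippet below picks cells by position inside each <tr>. It assumes BeautifulSoup 4 (the bs4 package, with find_all and get_text) rather than the older findAll spelling used in the thread, and pick_cells is a hypothetical helper name, not part of the library.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package (BeautifulSoup 4) is installed

html = """
<table>
    <tr>
        <td class="datax">a</td>
        <td class="datax">b</td>
        <td class="datax">c</td>
        <td class="datax">d</td>
    </tr>
    <tr>
        <td class="datax">e</td>
        <td class="datax">f</td>
        <td class="datax">g</td>
        <td class="datax">h</td>
    </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def pick_cells(soup, positions=(1, 3)):
    """Hypothetical helper: collect the text of the cells at the given
    zero-based positions (1 and 3 = second and fourth) in every row."""
    values = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td", class_="datax")
        for i in positions:
            if i < len(cells):  # skip rows that are too short
                values.append(cells[i].get_text(strip=True))
    return values


final_values = pick_cells(soup)
print(final_values)  # ['b', 'd', 'f', 'h']
```

Because the positions are counted inside each row, an extra column in one row does not shift the cells picked from the next row.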
I think there is an error. I get [None, None, None ..... ]
Damiano
>>> [td.string for td in soup.findAll('td', {'class':'datax'})[1::2]]
[u'b', u'd', u'f', u'h']
Unless bs isn't BeautifulSoup(source), this should work fine.
dagoof
Hmm, I checked... td.string is None because the actual content of the td is <td class="datax"><b><a href="/q?s=AAU">AAU</a></b></td> (I need to get AAU).
Damiano
[td.b.a.string for td in bs.findAll('td', {'class':'datax'})[1::2]]
dagoof
OK, that works! But the first cell has an <a href> and the second does not, so td.b.a.contents is wrong (it works for the first but not for the second). What should I do? Thank you very much!
Damiano
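
One possible way around cells that sometimes contain a link and sometimes do not is to ask for the cell's text instead of walking a fixed tag path like td.b.a. The sketch below assumes BeautifulSoup 4 (bs4), whose get_text method joins all text inside a tag. The sample markup is made up for illustration: only the AAU cell comes from the comment above, and the second cell without a link is an assumption about what the real page looks like.

```python
from bs4 import BeautifulSoup  # assumption: the bs4 package (BeautifulSoup 4) is installed

# Made-up sample row: one wanted cell wraps its text in <b><a>, the other only in <b>.
html = """
<table><tr>
<td class="datax">x</td>
<td class="datax"><b><a href="/q?s=AAU">AAU</a></b></td>
<td class="datax">y</td>
<td class="datax"><b>BBB</b></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Second and fourth cells, as in the answer above.
cells = soup.find_all("td", class_="datax")[1::2]

# get_text() collects the text however deeply it is nested,
# so it does not matter whether an <a> tag is present.
names = [td.get_text(strip=True) for td in cells]
print(names)  # ['AAU', 'BBB']
```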
Hmm, I tried... but 1::2 doesn't work! I also get values from other columns.
Damiano
@Damiano, you need to understand what 1::2 (http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-buffer-xrange) means and look into the BeautifulSoup docs. After that, you can adapt the code however you like.
Dingle
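
To make the 1::2 point concrete, here is a small plain-Python sketch with no BeautifulSoup involved. The lists stand in for the cells returned by findAll, and the five-column rows are an invented example of why the flat slice can pick up other columns when the real table has an odd number of cells per row.

```python
# seq[1::2] means: start at index 1, go to the end, step by 2.
cells = ["a", "b", "c", "d", "e", "f", "g", "h"]
every_second = cells[1::2]
print(every_second)  # ['b', 'd', 'f', 'h']

# Invented example: five cells per row instead of four.
rows = [
    ["a", "b", "c", "d", "x"],
    ["e", "f", "g", "h", "y"],
]

# Flattening the rows first mimics calling findAll on the whole table.
flat = [cell for row in rows for cell in row]
print(flat[1::2])  # ['b', 'd', 'e', 'g', 'y'] (drifts in the second row)

# Indexing inside each row keeps the second and fourth cells aligned.
per_row = [row[i] for row in rows for i in (1, 3)]
print(per_row)  # ['b', 'd', 'f', 'h']
```

With an even number of cells per row the flat slice and the per-row indexing agree; with an odd number the flat slice shifts by one column on every other row.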