I need to sort an HTML table with the following structure in Python.

<table>
    <tr>
        <td><a href="#">ABCD</a></td>
        <td>A23BND</td>
        <td><a title="ABCD">345345</a></td>
    </tr>
    <tr>
        <td><a href="#">EFG</a></td>
        <td>Add4D</td>
        <td><a title="EFG">3432</a></td>
    </tr>
    <tr>
        <td><a href="#">HG</a></td>
        <td>GJJ778</td>
        <td><a title="HG">2341333</a></td>
    </tr>

</table>

I am doing something like this:

container = tree.findall("tr")
strOut = ""
data = []
for elem in container:
    key = elem.findtext(colName)
    data.append((key, elem))

data.sort()

The problem is that it sorts by the text inside the <td>. I want to be able to sort by the anchor value and not href.

What can I do to achieve that? Thanks a lot.

A: 

The sort method has the valuable key and cmp arguments, which you can use for custom sorting (cmp exists only in Python 2; Python 3 keeps just key). If you augment the data structure with the extra information you need for sorting, you can pass either key or cmp (depending on the exact need) to sort to achieve what you want. Here's a simple example:

In [60]: ids = [1, 2, 3]
In [61]: score = {1: 20, 2: 70, 3: 40}
In [62]: ids.sort(key=lambda x: score[x])
In [63]: ids
Out[63]: [1, 3, 2]

Here, I sorted the ids list according to the score of each id taken from the score dictionary.
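
For comparison, here is a small sketch of the same ordering produced two ways: with the older decorate-sort-undecorate pattern (which is what the code in the question does) and with key=. The names decorated, dsu_result and key_result are only illustrative.

```python
# Same data as the example above.
ids = [1, 2, 3]
score = {1: 20, 2: 70, 3: 40}

# Decorate-sort-undecorate: pair each id with its sort key, sort the pairs,
# then strip the keys off again.
decorated = [(score[i], i) for i in ids]
decorated.sort()
dsu_result = [i for _, i in decorated]

# The key= form calls the function once per item and sorts by the result;
# sorted() returns a new list and leaves ids untouched.
key_result = sorted(ids, key=lambda i: score[i])

print(dsu_result)  # [1, 3, 2]
print(key_result)  # [1, 3, 2]
```

Both forms give the same order here. One practical difference: with decorated tuples, ties on the key fall through to comparing the second item, whereas key= never compares the items themselves.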

Eli Bendersky
@Eli, that's fine, but so's the OP code which uses the older "decorate, sort, undecorate" idiom from before Python's sort gained the `key=` argument. The problem has nothing to do with that: it's all about the proper way of extracting the key in the OP's specific case.
Alex Martelli
@Alex: you're right. To me it also wasn't clear which key the OP is interested in, so I settled for telling him of a more idiomatic way to write the code.
Eli Bendersky
+1  A: 

It sorts by the text because that's what you're extracting as the key when you do

key = elem.findtext(colName)

I imagine colName is some tag string, and findtext just returns the text of the first subelement matching that tag. If you instead want the key to be the value of some attribute (e.g. title?) of an <a>, then

key = None  # stays None if no <a> in this row carries a title
for ana in elem.findall('.//a'):  # './/a' also matches anchors nested inside <td>
    key = ana.get('title')
    if key is not None: break

would do that. Exactly what do you want to use as the key?
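
Since the exact key isn't settled, here is a minimal end-to-end sketch that covers both readings: sorting by the anchor's text, or by one of its attributes. It assumes the table is well-formed XML (the sample below closes every <a>, and its rows are shuffled on purpose) and uses the standard library's xml.etree.ElementTree; the helper anchor_key and its parameters are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample table modeled on the question, with every <a> closed so that
# ElementTree can parse it as XML. Row order is shuffled on purpose.
TABLE = """
<table>
    <tr><td><a href="#">EFG</a></td><td>Add4D</td><td><a title="EFG">3432</a></td></tr>
    <tr><td><a href="#">ABCD</a></td><td>A23BND</td><td><a title="ABCD">345345</a></td></tr>
    <tr><td><a href="#">HG</a></td><td>GJJ778</td><td><a title="HG">2341333</a></td></tr>
</table>
"""

def anchor_key(row, col_index, attr=None):
    """Hypothetical helper: return the sort key for one <tr>.

    Uses the text of the <a> in the given column, or the value of
    `attr` on that anchor when attr is given.
    """
    cell = row.findall("td")[col_index]
    anchor = cell.find("a")
    if anchor is None:
        return cell.text or ""  # no anchor: fall back to the plain cell text
    if attr is not None:
        return anchor.get(attr, "")
    return anchor.text or ""

table = ET.fromstring(TABLE)
rows = table.findall("tr")

# Sort by the anchor text in the first column.
by_text = sorted(rows, key=lambda r: anchor_key(r, 0))
# Sort numerically by the anchor text in the third column.
by_number = sorted(rows, key=lambda r: int(anchor_key(r, 2)))

# Put the rows back into the table in the new order.
for row in rows:
    table.remove(row)
table.extend(by_number)

print([anchor_key(r, 0) for r in by_text])
print([anchor_key(r, 0) for r in table.findall("tr")])
```

Passing attr="title" to anchor_key would sort by the title attribute instead. Real-world HTML is often not valid XML, in which case an HTML-aware parser would be needed before this approach applies.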

Alex Martelli
A: 

I know this wasn't your question, but best practice for this sort of thing is to use JavaScript. You will get a much better user experience on your website (if that's what you're doing).

This js library is excellent and easy to use: http://www.kryogenix.org/code/browser/sorttable/

bukzor
The reason I want Python to do it is that the table is too big for the browser to handle, so sorting on the server side is necessary. I do have a JS implementation that I am currently using.
AJ