What is the best way to convert a table within an HTML document to an Excel-readable file? I would like a command-line tool that I can call from bash on my Mac, since I need to batch process a bunch of HTML files.

I know I could write a script to do this fairly easily, but I am looking for generic, existing tools that can be called from the command line. I would prefer that formatting be preserved as much as possible, but I would be willing to fall back to CSV if nothing else that is easy to install and set up fits the bill.

A: 

html2text should work; at the very least, it should be able to generate something you can pick up as a comma-separated list (or hack into one fairly easily). There are lots of links to it here:

http://www.google.com/search?hl=en&q=html2text&btnG=Search

It has lots of flags to control how the output is formatted. Try it.

--jeff

+2  A: 

Excel can read/open HTML documents with tables, without the need for conversion. It will automatically map the table cells to worksheet cells.

Try this:

  • Save the data below in a file with an .html extension.
  • Open the file with Excel.
<table>
<tr>
<th>Heading1</th>
<th>Heading2</th>
</tr>
<tr>
<td>R1, C1</td>
<td>R1, C2</td>
</tr>
<tr>
<td>R2, C1</td>
<td>R2, C2</td>
</tr>
</table>
Robert Mearns
Thanks - that works better than expected!
Julie
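
For the CSV fallback mentioned in the question, here is a minimal sketch using only the Python standard library. It is not an existing tool, just an illustration: the names TableExtractor and html_to_csv are made up for this example, and it assumes well-formed tables with explicit closing tags and no nested tables. Formatting (colors, fonts, merged cells) is not preserved; only cell text is kept.

```python
import csv
import io
import sys
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by row and table.

    Hypothetical helper: assumes explicit </td>, </th> and </tr> tags
    and no nested tables.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # each table is a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def html_to_csv(html_text, table_index=0):
    """Return one table from html_text as CSV text (first table by default)."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.tables[table_index])
    return out.getvalue()


if __name__ == "__main__":
    # Batch mode: write "<input>.csv" next to each file named on the command line.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        with open(path + ".csv", "w", newline="", encoding="utf-8") as f:
            f.write(html_to_csv(text))
```

Saved under a name of your choosing (say table2csv.py, also hypothetical), it could be called from bash as `python3 table2csv.py *.html`. A file without any table raises an IndexError in this sketch, so a real batch run would want a check for that.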