views:

1478

answers:

5

Ideally, I'd like a module or library that doesn't require superuser access to install; I have limited privileges in my working environment.

+1  A: 

There is a good library, pyrtf-ng, for all-purpose RTF handling.

cleg
Thanks, but the problem with pyrtf-ng is that it's useful for generating RTF files, not parsing them. I downloaded it from its SourceForge page (there is nothing under the Download tab at Google Code), and this is the only functionality I could find.
Tony
@tony, have you looked at http://code.google.com/p/pyrtf-ng/source/browse/#svn/trunk/rtfng/parser ? When there are no downloads yet on a Google Code hosted project, browse the sources!-)
Alex Martelli
+3  A: 

Have you checked out pyrtf-ng?

Update: The parsing functionality is available if you do a Subversion checkout, but I'm not sure how full-featured it is. (Look in the rtfng.parser.base module.)

Vinay Sajip
+2  A: 

OpenOffice has an RTF reader. You can use Python to script OpenOffice; see here for more info.
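
If full scripting is more than you need, a headless command-line conversion driven from Python may be enough. This is only a sketch and swaps in a different technique from the scripting link above: it assumes a LibreOffice-style `soffice` binary on your PATH that accepts `--headless --convert-to`, which older OpenOffice builds may not. The function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(rtf_path, out_dir, binary="soffice"):
    """Build the headless conversion command (assumes a LibreOffice-style CLI)."""
    return [binary, "--headless", "--convert-to", "txt:Text",
            "--outdir", str(out_dir), str(rtf_path)]


def rtf_to_text_via_office(rtf_path, out_dir="."):
    """Run the conversion and return the contents of the resulting .txt file."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    subprocess.run(build_convert_command(rtf_path, out_dir), check=True)
    # The converter writes <input stem>.txt into out_dir.
    return (Path(out_dir) / (Path(rtf_path).stem + ".txt")).read_text()
```

Since this only needs an existing office install and a user-writable output directory, it should not require superuser access.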

You could probably try using a COM object on Windows to read anything that smells ms-binary. I wouldn't recommend that, though.

Actually, parsing the raw data probably won't be very hard; see this example written in .bat/QBasic.

DocFrac is a free, open source converter between RTF, HTML and text. Windows, Linux, ActiveX and DLL versions are available. It will probably be pretty easy to wrap it up in Python.

RTF::TEXT::Converter - Perl extension for converting RTF into text (in case You have problems with DocFrac).

A discussion of this subject on the Python mailing list.

Official Rich Text Format (RTF) Specifications, version 1.7, by Microsoft.

Good luck (with the limited privileges in Your working environment).

Reef
Thanks. I opened the document in OpenOffice and saved it as a plain text file. This was probably the simplest approach. And thanks for reminding me that it's My work environment. I asked for sudo access.
Tony
A: 

Hi! I ran into the same thing and was trying to code it myself. It's not that easy, but here is what I had when I decided to go for a command-line app. It's Ruby, but you can adapt it to Python very easily. There is some header garbage to clean up, but you can more or less see the idea.

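# Naive RTF-to-text pass: drops braces and control words, keeps everything else.
f = File.open('r.rtf', 'r')
b = 0       # current brace nesting depth
p = false   # true while inside a control word (after a backslash)
str = ''
begin
  while (char = f.readchar)
    c = char.chr
    if c == '{'
      b += 1
      next
    end
    if c == '}'
      b -= 1
      next
    end
    if c == '\\'
      p = true
      next
    end
    # Whitespace ends a control word. Double quotes are required here:
    # in single quotes, '\n' is a backslash followed by n, not a newline.
    if p && [' ', "\n", "\t", "\r"].include?(c)
      p = false
      next
    end
    if p && c == "'"
      # This is the source of my headaches. You need to read the code page from the header and encode this.
      p = false
      str << '#'
      next
    end
    next if b > 2   # skip anything nested more than two groups deep
    next if p
    str << c
  end
rescue EOFError
end
f.close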
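
For anyone who wants the Python adaptation, here is a rough sketch of the same idea. It is not a port line for line: instead of the depth check it skips a few known header groups, and it takes a stab at the `\'hh` headache by reading `\ansicpgN` from the header. The function name, the list of skipped destinations and the `cp1252` fallback are all my assumptions, and `\uN` Unicode escapes are not handled.

```python
import codecs
import re

# Groups holding header data rather than body text (a partial, assumed list).
SKIPPED_DESTINATIONS = {b"fonttbl", b"colortbl", b"stylesheet", b"info", b"pict"}
# A control word: backslash, letters, optional signed number, optional space.
CONTROL_WORD = re.compile(rb"\\([a-zA-Z]+)(-?\d+)? ?")


def rtf_to_text(data, default_codepage="cp1252"):
    """Rough RTF-to-text pass over raw bytes; a sketch, not a full parser."""
    # Assumption: the header declares its code page as \ansicpgN.
    match = re.search(rb"\\ansicpg(\d+)", data)
    codepage = "cp" + match.group(1).decode("ascii") if match else default_codepage
    try:
        codecs.lookup(codepage)
    except LookupError:
        codepage = default_codepage  # Python has no codec under that name

    out = bytearray()   # text bytes, still in the document's code page
    skip_stack = []     # saved "skipping" flags, one per open group
    skipping = False
    i = 0
    while i < len(data):
        ch = data[i:i + 1]
        if ch == b"{":
            skip_stack.append(skipping)
            i += 1
        elif ch == b"}":
            skipping = skip_stack.pop() if skip_stack else False
            i += 1
        elif ch == b"\\":
            word = CONTROL_WORD.match(data, i)
            if word:
                name = word.group(1)
                if name in SKIPPED_DESTINATIONS:
                    skipping = True
                elif name in (b"par", b"line") and not skipping:
                    out += b"\n"
                elif name == b"tab" and not skipping:
                    out += b"\t"
                i = word.end()
            elif data[i + 1:i + 2] == b"'":
                # \'hh is one raw byte in the document's code page.
                if not skipping:
                    out.append(int(data[i + 2:i + 4], 16))
                i += 4
            elif data[i + 1:i + 2] == b"*":
                skipping = True  # \* marks an optional destination
                i += 2
            else:
                # Escaped literal such as \\, \{ or \}.
                if not skipping and data[i + 1:i + 2] in (b"\\", b"{", b"}"):
                    out += data[i + 1:i + 2]
                i += 2
        else:
            if not skipping and ch not in (b"\r", b"\n"):
                out += ch
            i += 1
    return out.decode(codepage, errors="replace")
```

You would call it as `rtf_to_text(open('r.rtf', 'rb').read())`. Collecting bytes and decoding once at the end, rather than character by character, is what lets multi-byte code pages come out intact.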
Josep Valls
+6  A: 

I've been working on a library called Pyth, which can do this:

http://pypi.python.org/pypi/pyth/

Converting an RTF file to plaintext looks something like this:

from pyth.plugins.rtf15.reader import Rtf15Reader
from pyth.plugins.plaintext.writer import PlaintextWriter

doc = Rtf15Reader.read(open('sample.rtf'))

print(PlaintextWriter.write(doc).getvalue())

Pyth can also generate RTF files, read and write XHTML, and generate documents from Python markup a la Nevow's stan. It also has limited experimental support for LaTeX and PDF output. Its RTF support is pretty robust -- we use it in production to read RTF files generated by various versions of Word, OpenOffice, Mac TextEdit, EIOffice, and others.

Brendon