views:

207

answers:

2

I have a .rtf file that contains nothing but an integer, say 15. I wish to read this integer in through python and manipulate that integer in some way. However, it seems that python is reading in much of the metadata associated with .rtf files. Why is that? How can I avoid it? For example, trying to read in this file, I get..

{\rtf1\ansi\ansicpg1252\cocoartf949\cocoasubrtf460 {\fonttbl\f0\fswiss\fcharset0 Helvetica;} {\colortbl;\red255\green255\blue255;} \margl720\margr720\margb720\margt720\vieww9000\viewh8400\viewkind0 \pard\tx566\tx1133\tx1700\tx2267\tx2834\tx3401\tx3968\tx4535\tx5102\tx5669\tx6236\tx6803\ql\qnatural\pardirnatural

+4  A: 

That's the nature of .RTF (i.e Rich Text files), they include extra data to define how the text is layed-out and formated.

It is not recommended to store data in such files lest you encounter the difficulties you noted. Would you go through the effort to parse this file and "recover" your one numeric value, you may expose your application to the risk of updated versions of the RTF format which may render the parsing logic partially incorrect and hence yield wrong numeric data for the application).

Why not store this info in a true text file. This could be a flat text file or preferably an XML, YAML, JSON file for example for added "forward" compatibility as your application and you may add extra parameters and such in the file.

If this file is a given, however, there probably exist Python libraries to read and write to it. Check the Python Package Index (PyPI) for the RTF keyword.

mjv
+3  A: 

That's exactly what the RTF file contains, so Python (in the absence of further instruction) is giving you what the file contains.

You may be looking for a library to read the contents of RTF files, such as pyrtf-ng.

Greg Hewgill