
Hi Team,

I've got a script that takes a user-uploaded RTF document and merges some person data (name, address, etc.) into the letter, and it does this for multiple people. I merge the letter contents, then combine the result with the next merged letter's contents, for all person records.

Effectively, I'm combining a single RTF document with itself once for every person record I need to merge into the letter. However, I first need to remove the closing RTF markup of each merged letter and the opening RTF markup of the next one, or else the RTF won't render correctly. This sounds like a job for regular expressions.
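To make the seam concrete, here is a hypothetical PHP sketch of the merge loop described above. The placeholder tokens [[NAME]] and [[ADDRESS]], the array keys and the mergeLetters() name are assumptions for illustration, not the actual script. It shows why every pair of merged letters ends up joined by a "}\n\page{\rtf1..." seam.

```php
<?php
// Hypothetical sketch of the merge loop: the [[NAME]]/[[ADDRESS]] tokens,
// the array keys and the function name are assumptions for illustration.
// Real data containing \, { or } would need RTF escaping first (not shown).
function mergeLetters(string $template, array $people): string
{
    $letters = [];
    foreach ($people as $person) {
        // Fill a fresh copy of the complete {\rtf1 ... } template.
        $letters[] = strtr($template, [
            '[[NAME]]'    => $person['name'],
            '[[ADDRESS]]' => $person['address'],
        ]);
    }
    // Joining whole documents leaves a "}\n\page{\rtf1..." seam between
    // every pair of letters; that seam is what the regex has to remove.
    return implode("\n\\page", $letters);
}

$template = "{\\rtf1\\ansi\\pard Dear [[NAME]],\\par\n[[ADDRESS]]\\par\n}";
echo mergeLetters($template, [
    ['name' => 'Ann', 'address' => '1 Main St'],
    ['name' => 'Bob', 'address' => '2 Oak Ave'],
]);
```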

Essentially, I need a regex that will remove the entire string:

}\n\page ANYTHING \par

For example, the regex should match this:

crap
}
\page{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fcharset0 Arial;}}
{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\f0\fs20 September 30, 2008\par
more crap

so that I end up with just:

crap
\page
more crap

Is a regex the best approach here?

UPDATE: Why do I have to use RTF?

I want to let the user upload a form letter that the system then uses to create the merged letters. Since RTF is plain text, I can do this pretty easily in code. I know RTF is a disaster of a spec, but I don't know of any good alternative.

+1  A: 

I would question the use of RTF in this case. It's not entirely clear to me what you're trying to do overall, so I can't necessarily suggest anything better, but if you can explain your project more broadly, maybe I can help.

If this is really the way you want to go, though, this regex gave me the correct output for your input:

$output = preg_replace("/}\s?\n\\\\page.*?\\\\par\s?\n/ms", "\\page\n", $input);
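As a quick check, the one-liner can be wrapped in a function (the name stripRtfSeams() is illustrative) and run against the example from the question. This is only a sketch that exercises the pattern exactly as posted.

```php
<?php
// Wraps the preg_replace() call above in a function; the function name is
// illustrative and the pattern is unchanged from the answer.
function stripRtfSeams(string $input): string
{
    // "}" + optional whitespace + newline + "\page", then lazily everything
    // up to the first "\par" followed by a line break (the repeated header).
    return preg_replace("/}\s?\n\\\\page.*?\\\\par\s?\n/ms", "\\page\n", $input);
}

$input = "crap\n}\n\\page{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033"
    . "{\\fonttbl{\\f0\\fswiss\\fcharset0 Arial;}}\n"
    . "{\\*\\generator Msftedit 5.41.15.1515;}\\viewkind4\\uc1\\pard\\f0\\fs20"
    . " September 30, 2008\\par\nmore crap";

echo stripRtfSeams($input);
// prints:
// crap
// \page
// more crap
```

Note that "\pard" in the header does not end the match early, because the pattern requires a line break right after "\par". Because \s? sits in front of each \n, the same pattern also tolerates \r\n line endings.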
Randy
I think I'll repost this as a better question. Thanks for your help.
Justin
A: 

All I can say to this is: ick, ick, ick. Nevertheless, rcar's kludge will probably work, barring some weird edge case where the RTF doesn't actually end in that form, where the document-wide styles contain important information whose loss messes up the formatting, or any of the many other failure modes.
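One cheap way to catch the "RTF doesn't end in that form" case is to count the replacements: merging N letters should leave exactly N - 1 seams. The sketch below (function name and error messages are hypothetical; preg_last_error_msg() needs PHP 8) uses the count argument of PHP's preg_replace() and throws if the number differs. It does nothing about the document-wide style problem, since only the first letter's header survives.

```php
<?php
// Hypothetical guard around the pattern from the first answer: merging
// $letterCount letters should leave exactly $letterCount - 1 seams to strip.
function stripRtfSeamsChecked(string $merged, int $letterCount): string
{
    $count = 0;
    // The fifth argument receives the number of replacements performed.
    $result = preg_replace(
        "/}\s?\n\\\\page.*?\\\\par\s?\n/ms",
        "\\page\n",
        $merged,
        -1,
        $count
    );
    if ($result === null) {
        // PCRE error, e.g. a backtrack-limit failure on a very large document.
        throw new RuntimeException('preg_replace() failed: ' . preg_last_error_msg());
    }
    if ($count !== $letterCount - 1) {
        // Some seam did not match the expected shape, so the output is suspect.
        throw new RuntimeException(
            "Seam count mismatch: expected " . ($letterCount - 1) . ", found $count"
        );
    }
    return $result;
}

try {
    // Only one letter's worth of RTF, but two letters were expected.
    stripRtfSeamsChecked("{\\rtf1 one}", 2);
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n"; // prints "Seam count mismatch: expected 1, found 0"
}
```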

Edward Z. Yang