I've already seen lots of posts on the site for RTF to HTML and some other posts talking about some HTML to RTF converters, but I'm really trying to get a full breakdown of what is considered the most widely used commercial product, open source product or if people recommend going home grown. Apologies if you consider this a duplicate question, but I'm trying to create a product matrix to see what is the most viable for our application. I also think this would be helpful for others.

The converter would be used in an ASP.NET 2.0 application (we're upgrading to 3.5 shortly but still sticking with WebForms) using SQLServer 2005 (soon 2008) as the DB.

From reading a few posts, SautinSoft appears to be popular as a commercial component. Are there other commercial components that you'd recommend for converting HTML to RTF? Price does matter, but even if it's a little on the expensive side, please list it.

For open source, I read that OpenOffice.org can be run as a service so that it can convert files. However, this appears to be Java based only, so I imagine I'd need some kind of interop to use it. What .NET open source components, if any, are out there for converting HTML to RTF?

For home grown, is an XSLT the way to go with XHTML? If so, what component do you recommend for generating XHTML? Otherwise, what other home grown avenues do you recommend?

Also, please note that I currently don't care so much about RTF to HTML. If a commercial component offers this and the price is still the same, fine, otherwise please don't mention it.

A: 

I just came across this WYSIWYG rich text editor (RTE) for the web that also has an HTML to RTF converter, Cute Editor for .NET. Does anyone have any experience with this component? My main experience with web based RTEs has been with CKEditor (fckEditor) and TinyMCE, but as far as I can tell CKEditor and TinyMCE do not have HTML to RTF converters built in.

nickyt
A: 

Hi nickyt,

Could I get more background on the technical task at hand? Basically, why are you doing this? What program is going to view the RTF end-product?

Albert
@Albert. Data is pulled from a DB to generate an RTF report. All the RTF formatting is currently done in the report (hard-coded... ouch!) based on a spec, but in a few instances the client wants to format some sections, so we'll give them a rich text editor in the web app, and when they save it, I'll convert it to a chunk of formatted RTF that will be pulled from the DB and inserted into the report.
nickyt
@nickyt: Um... I'm totally confused. I'm trying to understand the data flow and conversion here. So far I have the following: DB -> RTF -> RTF* -> DB. But that doesn't make sense, as it would seem to imply you have an RTF parser that can grep and dump to the DB. Unless you mean the DB holds RTF data?
Albert
@Albert - Non-RTF data is currently stored in the database. When we generate a report, we format the data from the database via a C# class that formats the data as RTF. The fields that I'll be adding to the report are going to be stored in the database formatted as RTF. To store them as RTF, I need to convert the HTML in the RTE that's being posted back into RTF. Clear? :)
nickyt
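As a rough illustration of the flow described above (a stored RTF fragment spliced into a generated report), here is a minimal Python sketch. The function name `build_report`, the Arial font table, and the 12 pt default size are assumptions made for the example, not details from the thread, and the title is assumed to be plain ASCII with no RTF special characters.

```python
def build_report(title, client_fragment):
    """Splice a stored RTF fragment into a minimal RTF document (hypothetical helper)."""
    # Minimal header: RTF version 1, ANSI charset, one font, 12 pt (\fs is in half-points).
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}\\f0\\fs24 "
    # Hard-coded formatting for the report's own content.
    heading = "\\b " + title + "\\b0\\par "
    # The client fragment goes in its own group so that any formatting it
    # leaves switched on cannot leak into the rest of the report.
    section = "{" + client_fragment + "}\\par "
    return header + heading + section + "}"


report = build_report("Summary", "\\i client note\\i0 ")
```

Wrapping the stored fragment in braces is the main design point: an RTF group scopes character formatting, so a fragment that forgets to switch italics off only affects itself.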
@nickyt: Oh, okay. Here's a solution which you'll hate as it involves time/money, but I think it would be better. Dump RTF for DOCX. There are many tools for DOCX: the Microsoft OOXML SDK v2.0 (http://msdn.microsoft.com/en-us/library/bb448854%28office.14%29.aspx) and Aspose.Words for .NET (aspose.com). These tools would simplify your life as you would completely avoid HTML. In fact, there are a few companies offering web-based DOCX editing. Hopefully, dumping changes back into the DB would be simple (well, okay, simpler). Again, you'll probably hate this approach.
Albert
@Albert - I hear ya, but the client currently demands RTF. Maybe in the future this could be a way to go.
nickyt
A: 

I would recommend doing it yourself, as the task is not really that complex. Firstly, the easiest way to convert one XML format into another XML format is with an XSLT. Converting XML documents in C# is super easy.

Here is a good MSDN blog post to get you started. Mike even mentions that it was easier to do this by hand than to deal with a third party.

link

Actually, I already answered this question here. Guess that makes this a duplicate.

Ty
@Ty - I have no problems going custom, just wondering what you'd recommend for converting to XHTML if the HTML isn't perfect.
nickyt
@nickyt Messed up HTML would make this job a real pain. I've done some apps where the HTML/RTF was controlled, but if you are going to see bold tags, strong tags and sometimes tags that are not closed, you might need to look at a two-stage approach where you normalize the data first and then convert. I don't think you need to worry about XHTML.
Ty
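The two-stage idea (normalize first, then convert) could be sketched roughly as follows. This is a minimal Python illustration using only the standard library, not anyone's production converter: the names `Normalizer`, `escape_rtf` and `html_to_rtf_fragment` are made up for the example, only bold/italic/underline, `<br>` and `</p>` are handled, and mis-nested tags are simply closed early rather than reopened.

```python
from html.parser import HTMLParser

# Stage 1 mapping: equivalent HTML tags collapse to one canonical name.
CANONICAL = {"b": "b", "strong": "b", "i": "i", "em": "i", "u": "u"}
# Stage 2 mapping: canonical name -> (RTF "on" control word, RTF "off" control word).
RTF_TOGGLE = {"b": ("\\b ", "\\b0 "), "i": ("\\i ", "\\i0 "), "u": ("\\ul ", "\\ulnone ")}


def escape_rtf(text):
    # Backslash and braces are RTF syntax; non-ASCII becomes a Unicode escape
    # (signed 16-bit UTF-16 code unit followed by a "?" fallback character).
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            data = ch.encode("utf-16-le")
            for k in range(0, len(data), 2):
                unit = int.from_bytes(data[k:k + 2], "little", signed=True)
                out.append("\\u%d?" % unit)
    return "".join(out)


class Normalizer(HTMLParser):
    """Stage 1: turn messy HTML into a clean, balanced list of events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        name = CANONICAL.get(tag)
        if name and name not in self.open_tags:
            self.open_tags.append(name)
            self.events.append(("open", name))
        elif tag == "br":
            self.events.append(("line", ""))

    def handle_endtag(self, tag):
        name = CANONICAL.get(tag)
        if name and name in self.open_tags:
            # Close anything still open inside this tag, then the tag itself.
            while True:
                top = self.open_tags.pop()
                self.events.append(("close", top))
                if top == name:
                    break
        elif tag == "p":
            self.events.append(("par", ""))

    def handle_data(self, data):
        self.events.append(("text", data))

    def finish(self):
        self.close()  # flush any buffered text
        while self.open_tags:  # close tags the author never closed
            self.events.append(("close", self.open_tags.pop()))
        return self.events


def html_to_rtf_fragment(html):
    """Stage 2: map the normalized events onto RTF control words."""
    norm = Normalizer()
    norm.feed(html)
    parts = []
    for kind, value in norm.finish():
        if kind == "open":
            parts.append(RTF_TOGGLE[value][0])
        elif kind == "close":
            parts.append(RTF_TOGGLE[value][1])
        elif kind == "line":
            parts.append("\\line ")
        elif kind == "par":
            parts.append("\\par ")
        else:
            parts.append(escape_rtf(value))
    return "".join(parts)
```

For example, `html_to_rtf_fragment("<strong>Hi</strong> <em>there")` treats `strong` as bold and closes the dangling `em` at the end of the input. Keeping the stages separate means the converter only ever sees one spelling of each tag and never an unbalanced one.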
@Ty - I'm going homegrown.
nickyt
A: 

For what it's worth, and in no particular order.

A while ago I wanted to export to RTF and then import from RTF, with the RTF in question being manipulated by MS Word.

The first problem is that RTF is not an open standard. It is an internal MS standard, and therefore they alter it as and when they like and do not generally worry about compatibility. Currently the versions of RTF are 1.3 to 1.9 and they are all different. Internally they use twips for measurement, just for good measure.
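For anyone unfamiliar with twips: a twip is one twentieth of a point, so there are 1440 twips to the inch. RTF font sizes (`\fs`) are separately given in half-points. A small Python sketch of the arithmetic, with helper names invented for the example:

```python
TWIPS_PER_POINT = 20
TWIPS_PER_INCH = 1440  # 72 points per inch * 20 twips per point


def points_to_twips(points):
    return round(points * TWIPS_PER_POINT)


def inches_to_twips(inches):
    return round(inches * TWIPS_PER_INCH)


def font_size_control(points):
    # \fs takes half-points, not twips: 12 pt text is \fs24.
    return "\\fs%d" % round(points * 2)


def left_margin_control(inches):
    # \margl sets the left page margin, measured in twips.
    return "\\margl%d" % inches_to_twips(inches)
```

So a one-inch left margin is written `\margl1440`, and 12 pt text is `\fs24`.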

I bought the O'Reilly pocket book on the subject, which helped, and read a lot of the MS documentation, which is good, but there is a lot of it and lots for each version.

Because of the way RTF is coded, using regex to manipulate it is incredibly hard work and needs careful handling and concentration to test and get working. I used a Mac editor that had built-in regex so I could steadily test each section and build it into the code.

Because of the number of versions there is a lot of incompatibility between them, but there is also a lot of commonality, and in the end it was neither very hard nor very easy to get where I wanted (after about a week's reading and a week's coding), producing a really simple version.

I never found a commercial solution, but I had to have a free one because of budget, so that cut a lot out. If you do buy, take great care in choosing one to make sure it does what you want and has support.

I wasn't coming from where you are (HTML/XML/XHTML); I was converting CSV formats into RTF.

I am not sure whether I would advise DIY or buying. Probably on balance DIY, but your own circumstances will dictate that.

Edit: One thing: going from content to RTF is easier than vice versa.

BTW, I'm not criticising MS for the RTF versions; hey, it's theirs and proprietary, so they can do what they like.

PurplePilot
A: 

To convert from HTML to RTF under .NET, you may also use a commercial component from SautinSoft. It is named HTML-to-RTF Pro DLL .Net.

Maximus