Hello, I have installed FCKeditor, and when I paste from MS Word it adds a lot of unnecessary formatting. I want to keep certain things like bold, italics, bullets, and so forth. I have searched the web, but the solutions I found strip everything away, even the formatting I want to keep, like bold and italics. Is there a way to strip just the unnecessary Word formatting?

A: 

But FCKeditor is, as the name and website suggest, a text editor. To me, that means it just shows you the characters in the file.

You can't have bold and italic formatting without some extra characters.

EDIT: Ah, I see. Looking more closely at the Fckeditor website, it's an HTML editor, not one of the simple text editors I'm used to.

Its feature list includes "Paste from Word cleanup with autodetection".
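
For what it's worth, here is a rough sketch of the paste-related settings as I remember them from FCKeditor 2.x's `fckconfig.js`. Treat the exact names and defaults as assumptions and check them against the `fckconfig.js` that ships with your version. The first line is not part of any real configuration; it is only there so the snippet runs outside the editor.

```javascript
// Stand-in so the snippet runs on its own; inside the editor, FCKConfig already exists.
var FCKConfig = (typeof FCKConfig !== "undefined") ? FCKConfig : {};

// Offer the "Paste from Word" cleanup when Word markup is detected on paste
// (as far as I recall, the detection only worked in IE).
FCKConfig.AutoDetectPasteFromWord = true;

// Controls how much of Word's document structure (for example headings)
// the cleanup keeps; see the comments in your fckconfig.js for the details.
FCKConfig.CleanWordKeepsStructure = false;

// true would turn every paste into plain text, which also loses bold and italics.
FCKConfig.ForcePasteAsPlainText = false;
```

Leaving `ForcePasteAsPlainText` off matters here, since plain-text pasting is exactly the "strips everything" behavior the question wants to avoid.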

pavium
pavium, FCKeditor is a RICH TEXT editor: it abstracts away all the nastiness of using editable DIVs and adds pretty toolbars. Under the hood, the content is stored as HTML, which means that when someone pastes from Word, Word passes along all sorts of HTML evilness.
richardtallent
+2  A: 

I understand the problem very well. When you copy out of MS Word (or any word processor or rich-text-aware text area) and then paste into FCKeditor (the same problem happens with TinyMCE), the original markup is included in the clipboard contents and gets processed. That markup is not always compatible with the markup it ends up embedded in at the target of the paste operation.

I don't know of a solution other than becoming a contributor to FCKeditor, studying the code, and making the modification yourself. What I normally do is instruct users to perform a two-phase clipboard operation (note that this drops all formatting, including bold and italics):

  • Copy from MS-Word
  • Paste into notepad
  • Select all
  • Copy from notepad
  • Paste into FCKeditor
Glenn
+4  A: 

Here's a solution I use to scrub incoming HTML from rich text editors... it's written in VB.NET and I don't have time to convert to C#, but it's pretty straightforward:

 Public Shared Function CleanHtml(ByVal html As String) As String
     '' Cleans all manner of evils from the rich text editors in IE, Firefox, Word, and Excel
     '' Only returns acceptable HTML, and converts line breaks to <br />
     '' Acceptable HTML includes HTML-encoded entities.
     html = html.Replace("&" & "nbsp;", " ").Trim() ' concat here due to SO formatting
     '' Does this have HTML tags?
     If html.IndexOf("<") >= 0 Then
         '' Make all tags lowercase
         html = RegEx.Replace(html, "<[^>]+>", AddressOf LowerTag)
         '' Filter out anything except allowed tags
         '' Problem: this strips attributes, including href from a
         '' http://stackoverflow.com/questions/307013/how-do-i-filter-all-html-tags-except-a-certain-whitelist
         Dim AcceptableTags      As String   = "i|b|u|sup|sub|ol|ul|li|br|h2|h3|h4|h5|span|div|p|a|img|blockquote"
         '' The alternation is grouped and ended with \b so that "b" does not also whitelist <body>, "i" <iframe>, and so on
         Dim WhiteListPattern    As String   = "</?(?(?=(?:" & AcceptableTags & ")\b)notag|[a-zA-Z0-9]+)(?:\s[a-zA-Z0-9\-]+=?(?:([""']?).*?\1?)?)*\s*/?>"
         html = Regex.Replace(html, WhiteListPattern, "", RegexOptions.Compiled)
         '' Make all BR/br tags look the same, and trim them of whitespace before/after
         html = RegEx.Replace(html, "\s*<br[^>]*>\s*", "<br />", RegExOptions.Compiled)
     End If
     '' No CRs
     html = html.Replace(controlChars.CR, "")
     '' Convert remaining LFs to line breaks
     html = html.Replace(controlChars.LF, "<br />")
     '' Trim BRs at the end of any string, and spaces on either side
     Return RegEx.Replace(html, "(<br />)+$", "", RegExOptions.Compiled).Trim()
 End Function

 Public Shared Function LowerTag(m As Match) As String
   Return m.ToString().ToLower()
 End Function

In your case, you'll want to modify the list of "approved" HTML tags in "AcceptableTags". The code will still strip all the useless attributes (and, unfortunately, the useful ones like HREF and SRC; hopefully those aren't important to you).
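
If links do matter, one option is a filter that keeps a short list of attributes per tag instead of dropping them all. The following is a minimal JavaScript sketch of that idea, not a port of the VB.NET function above. The names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, and `filterTags` are made up for illustration, and the attribute policy is only an assumption about what you might want to keep. It is regex-based, so it is a formatting cleaner and not a security sanitizer.

```javascript
// Tags to keep; any other tag is removed, but the text inside it stays.
const ALLOWED_TAGS = new Set(["i", "b", "em", "strong", "u", "sup", "sub",
                              "ol", "ul", "li", "br", "p", "a", "img"]);
// Attributes to keep, per tag (hypothetical policy; adjust to taste).
const ALLOWED_ATTRS = { a: ["href"], img: ["src", "alt"] };

function filterTags(html) {
  // Match one tag at a time: optional slash, tag name, raw attribute text.
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9:]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;
  return html.replace(tagPattern, (match, slash, name, rawAttrs) => {
    const tag = name.toLowerCase();
    if (!ALLOWED_TAGS.has(tag)) return "";   // not whitelisted: drop the tag
    if (slash) return `</${tag}>`;           // closing tags carry no attributes
    const keep = ALLOWED_ATTRS[tag] || [];
    let attrs = "";
    // Only quoted name="value" pairs are considered; everything else is dropped.
    const attrPattern = /([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
    let m;
    while ((m = attrPattern.exec(rawAttrs)) !== null) {
      const attrName = m[1].toLowerCase();
      const value = m[2] !== undefined ? m[2] : m[3];
      if (!keep.includes(attrName)) continue;
      if (/^\s*javascript:/i.test(value)) continue;   // refuse script URLs
      attrs += ` ${attrName}="${value.replace(/"/g, "&quot;")}"`;
    }
    const selfClosing = /\/\s*$/.test(rawAttrs) ? " /" : "";
    return `<${tag}${attrs}${selfClosing}>`;
  });
}
```

With this, a Word-style fragment such as `<p class="MsoNormal"><a href="http://example.com" target="_blank">link</a></p>` should come out as `<p><a href="http://example.com">link</a></p>`: the paragraph and the link survive, while `class` and `target` are dropped. HTML comments are left alone by this sketch and would need a separate pass.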

Of course, this requires a trip to the server. If you don't want that, you'll need to add some sort of "clean up" button to the toolbar that calls JavaScript to mess with the editor's current text. Unfortunately, "pasting" is not an event that can be trapped to clean up the markup automatically, and cleaning after every OnChange would make for an unusable editor (since changing the markup changes the text cursor position).
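
For the "clean up" button route, the JavaScript could look roughly like the sketch below. `cleanWordHtml` is a made-up name, and the list of things it strips is only my guess at the usual Word debris (comments, Office namespace tags, `class`/`style`/`lang` attributes, `span` and `font` wrappers); it leaves bold, italics, lists, and links alone. Being regex-based, it will not cope with every malformed fragment.

```javascript
function cleanWordHtml(html) {
  return html
    // Comments, including Word's conditional comments (<!--[if gte mso 9]> ... <![endif]-->)
    .replace(/<!--[\s\S]*?-->/g, "")
    // Whole <style> and <xml> blocks, contents included
    .replace(/<(style|xml)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    // Office namespace tags such as <o:p>, <w:WordDocument>, <st1:place>
    .replace(/<\/?[a-z0-9]+:[^>]*>/gi, "")
    // Stray <meta> and <link> tags from Word's document head
    .replace(/<\/?(?:meta|link)\b[^>]*>/gi, "")
    // class, style, and lang attributes, but only inside opening tags
    .replace(/<[a-z][^>]*>/gi, (tag) =>
      tag.replace(/\s+(?:class|style|lang)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, ""))
    // <span> and <font> wrappers (their contents are kept)
    .replace(/<\/?(?:span|font)\b[^>]*>/gi, "")
    // Paragraphs that contain nothing but whitespace or &nbsp;
    .replace(/<p>(?:\s|&nbsp;)*<\/p>/gi, "");
}
```

The button handler would then read the editor's HTML, pass it through `cleanWordHtml`, and write it back. In FCKeditor 2.x that is, as far as I remember, done with `FCKeditorAPI.GetInstance(...)` plus the instance's `GetXHTML()` and `SetHTML()` methods, but verify those names against the API documentation for your version.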

richardtallent
Whoa... this is awesome. But I do need links and basic HTML tags.