Here's my wild and wacky pseudo-code. Does anyone know how to make this real?

Background:

This dynamic content comes from CKEditor, and a lot of folks paste Microsoft Word content into it. No worries: if I just output the attribute untouched, it renders nicely. The catch is that I want it abbreviated to just 125 characters. When I add truncation, all of the Microsoft Word markup starts showing up. I then added simple_format, sanitize, and truncate, and even had my controller pick out specific strings that MS Word generates and gsub them out. But there are too many of them, and it seems like an awfully messy way to accomplish this. So, realizing that the content is clean when left alone, I thought, why not just slice it? However, the Microsoft Word text renders as blank but still holds its character positions in the string. So I came up with this (probably awful) solution below.

It's in three steps.

  1. When the text renders, it doesn't display any of the MS Word junk, but that junk still occupies character positions when I slice the string. So I want to use a regexp to find the first actual character.
  2. Take that character and find its position in the full string.
  3. Use a slice to cut the string starting from that position.

    def about_us_truncated
      x = self.about_us.find.first(regExp representing first actual character)
      x.charCount = y
      self.about_us[y..125]
    end
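
The three steps above map fairly directly onto core Ruby. What follows is only a minimal sketch, assuming the text is a plain String and that "first actual character" means the first ASCII letter or digit; the helper name `first_real_char_excerpt` is made up for illustration. It only skips leading characters that are not letters or digits, so markup that itself contains letters (as MS Word junk does) would still count as text.

```ruby
# Hypothetical helper: return `limit` characters starting at the first
# letter or digit found in `text`.
def first_real_char_excerpt(text, limit = 125)
  source = text.to_s

  # Steps 1 and 2: String#index with a regexp returns the position of the
  # first match, or nil when nothing matches.
  start = source.index(/[a-zA-Z0-9]/)
  return "" if start.nil?

  # Step 3: slice `limit` characters beginning at that position.
  source[start, limit]
end
```

Inside the model, this could be called as `first_real_char_excerpt(about_us)`.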
    

The only other idea I've got is a regex that explicitly slices only actual characters, like so:

about_us([a-zA-Z][0..125]), but that is definitely not how it is written.
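
One way to read that idea in real Ruby is to drop everything that is not a letter and then slice what remains. This is a rough sketch under that assumption; the name `letters_only_excerpt` is hypothetical, and spaces are kept as well so that words stay separated. Letters inside tag names would survive too, so on its own this does not remove markup.

```ruby
# Hypothetical helper: keep only ASCII letters and spaces, then take the
# first `limit` characters of the result.
def letters_only_excerpt(text, limit = 125)
  # gsub with a negated character class deletes every other character.
  text.to_s.gsub(/[^a-zA-Z ]/, "")[0, limit]
end
```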

Here is some sample MS Word junk:

 ≪! [If Gte Mso 9]>≪Xml>≪Br /> ≪O:Office Document Settings>≪Br /> ≪O:Allow Png/>≪Br /> ≪/O:Off...
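
If the stored string contains this junk in its usual raw form (an HTML conditional comment such as `<!--[if gte mso 9]> ... <![endif]-->`, which is an assumption here since the sample above is shown after formatting), one option is to strip comments and tags first and truncate afterwards. The sketch below uses plain regular expressions and hypothetical helper names; regex-based stripping is fragile, and a real HTML parser or sanitizer would be more robust.

```ruby
# Hypothetical helper: remove HTML comments and tags, leaving plain text.
def strip_word_markup(html)
  html.to_s
      .gsub(/<!--.*?-->/m, "") # HTML comments, including Word's conditional blocks
      .gsub(/<[^>]*>/, "")     # any remaining tags
      .gsub(/\s+/, " ")        # collapse runs of whitespace
      .strip
end

# Hypothetical helper: truncate only after the markup is gone, so the cut
# can never land inside a tag or comment.
def plain_excerpt(html, limit = 125)
  strip_word_markup(html)[0, limit]
end
```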
+1  A: 

You haven't provided much information to go off of, but don't be too leery of trying to build this regex on your own before you seek help...

Take your sample text, paste it into the test string area in Rubular, and start building your regex. It has a great quick reference at the bottom.

Awgy
+1  A: 

Stumbled across this

http://gist.github.com/139987

It looks like it requires the sanitize gem.

Geoff Lanotte
Awesome find! I'll let you know how it goes. I can't believe I didn't find this yesterday.
Trip
Hmm, this didn't work, mostly because you have to enter the elements manually, and MS Word has a million of them.
Trip
A: 

This is technically not a direct answer, but it seems like the best one you can find.

To keep MS Word markup out of your content, you should use CKEditor's built-in MS Word sanitizer. Writing a regex for it can be very complicated, and you can very easily break tags in half and destroy your site with it.
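
To illustrate the "break tags in half" risk, here is a small sketch in plain Ruby showing how a fixed-length slice of raw HTML can end in the middle of a tag. The sample markup and the helper name `ends_inside_tag?` are made up for this example.

```ruby
# Hypothetical check: true when the fragment's last "<" has no ">" after it,
# meaning the slice stopped inside a tag.
def ends_inside_tag?(fragment)
  last_open = fragment.rindex("<")
  last_close = fragment.rindex(">")
  return false if last_open.nil?

  last_close.nil? || last_open > last_close
end

html = '<p>Read <a href="/about">about us</a> here</p>'

# Slicing by character count ignores tag boundaries: the first 20
# characters end partway through the opening <a ...> tag.
cut = html[0, 20]
```

Rendering `cut` as HTML would leave an unclosed tag in the page, which is why stripping or sanitizing before truncating is the safer order.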

As a workaround, I forced pasting as plain text in CKEditor.

Trip