Hey all,

In the past I have built a number of solutions in which people enter data via a web form, validation checks are applied (regex in some cases), and everything gets stored in a database. This data is then used to drive output on other pages.

I have a special case here where a user wants to copy/paste HUGE amounts of text (multiple paragraphs with various headers, links, etc. throughout). What is the best way to handle this before it goes into a database, so that it produces the best output when it needs to come back out?

So far the best I have come up with is wrapping all the output from these fields in PRE tags and using regex to add links where appropriate. I have a database table with a list of special keywords that need to be bold or have other styles applied to them, which works fine. So I can make this work using these approaches, but it seems to me that there is probably a much more graceful way of doing it.
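For illustration, the keyword step could look roughly like the sketch below. It is written in Python only to keep it self-contained, with `html.escape` standing in for an HTML-escaping function; the function name `highlight_keywords` and the keyword list are hypothetical, and in the setup described above the keywords would come from the database.

```python
import html
import re


def highlight_keywords(raw_text, keywords):
    """Escape raw_text for HTML and wrap whole-word keyword matches in <strong>.

    Matching is done on the raw text and each piece is escaped separately,
    so a keyword can never match inside an entity such as &amp;.
    Assumption: keywords start and end with word characters.
    """
    if not keywords:
        return html.escape(raw_text)

    # Longest keywords first, so a longer keyword wins over a shorter prefix.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )

    parts = []
    last = 0
    for match in pattern.finditer(raw_text):
        # Plain text between matches is escaped as-is.
        parts.append(html.escape(raw_text[last:match.start()]))
        # The matched keyword is escaped, then wrapped in the styling tag.
        parts.append("<strong>" + html.escape(match.group(0)) + "</strong>")
        last = match.end()
    parts.append(html.escape(raw_text[last:]))
    return "".join(parts)
```

Other styles could be handled the same way by mapping each keyword to its own tag or CSS class instead of a fixed `<strong>`.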

  • Nicholas
+2  A: 

There are a lot of ways you could format the text for output. You could simply use pre tags as you mentioned. If you are worried about wrapping, the CSS white-space property also supports the pre-wrap value, but browser support for this is currently sketchy at best.

There are also a large number of markup languages you could use for more advanced formatting options (some of which are listed here). Stack Overflow itself uses Markdown, which I personally enjoy using very much.

However, as the data is being pasted from another source, a markup language may interfere with the formatting of the text - in which case you could roll your own solution, perhaps using regular expressions and functions like htmlentities and nl2br.
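A rough sketch of such a roll-your-own formatter follows. It is in Python rather than PHP, so `html.escape` stands in for `htmlentities` and a small regex substitution mimics `nl2br`; the function name `format_for_output` and the URL pattern are assumptions for illustration, not a complete URL matcher (trailing punctuation, for instance, would be swallowed into the link).

```python
import html
import re

# Deliberately simple pattern: http(s) URLs up to the next whitespace,
# angle bracket, or quote character.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def format_for_output(raw_text):
    """Turn raw pasted text into HTML: escape it, link URLs, keep line breaks."""
    parts = []
    last = 0
    for match in URL_RE.finditer(raw_text):
        # Escape the plain text that precedes the URL.
        parts.append(html.escape(raw_text[last:match.start()]))
        # Escape the URL itself before using it as attribute value and label.
        url = html.escape(match.group(0), quote=True)
        parts.append('<a href="{0}">{0}</a>'.format(url))
        last = match.end()
    parts.append(html.escape(raw_text[last:]))
    escaped = "".join(parts)

    # Like nl2br: insert <br /> before every line break and keep the break.
    return re.sub(r"(\r\n|\n|\r)", r"<br />\1", escaped)
```

Because the function only reads the raw text and returns new HTML, it can be applied at output time to whatever was stored unchanged in the database.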

Whatever you decide, I would recommend storing the input in its original form in the database so you can retroactively amend your formatting routines at any time.

Alex Barrett
nl2br(), what an excellent function! I was not familiar with this one. Going to try the combination of htmlentities() and nl2br() now.
Nicholas Kreidberg
Just using the nl2br() function in conjunction with some database meta tables (used to control keyword highlighting and other stylistic elements) I am good to go. Thanks for the help Alex!
Nicholas Kreidberg
A: 

If you're expecting a good deal of formatting, you should probably go with a WYSIWYG editor. These editors provide Word-like toolbars which produce (hopefully) valid (X)HTML markup that can be stored directly in a text field in your database. Here are a couple of examples:

FCKeditor - Massive amount of options/tools

TinyMCE - A nice alternative.

Markdown - What stackoverflow.com uses

Both FCKeditor and TinyMCE have been thoroughly tested and have proven to be reliable. I don't have any experience with Markdown but it seems solid.

I've always hated 'forum' formatting tags like [code], [link], etc. Stack Overflow and others have shown that providing an open WYSIWYG editor is safe, reliable, and very easy to use. Just take the output it gives you, run it through some sort of escape function to guard against any kind of injection, XSS, etc., and store it in a text field.
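One way to do that check for editor-generated HTML is an allowlist: keep only known-safe tags and attributes and drop everything else. The sketch below uses Python's standard `html.parser` module purely as an illustration; the class name, the tag list, and the rule of keeping only http(s) links are all assumptions, and a maintained sanitizer library would be a better choice for real use.

```python
import html
from html.parser import HTMLParser

# Hypothetical allowlist; adjust to whatever the editor is allowed to emit.
ALLOWED_TAGS = {"p", "br", "strong", "em", "ul", "ol", "li", "a"}
VOID_TAGS = {"br"}            # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # tags whose text content is discarded


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # unknown tag is dropped, its text content is kept
        kept = ""
        if tag == "a":
            for name, value in attrs:
                # Keep only plain http(s) links; event handlers and
                # javascript: URLs never reach the output.
                if (name == "href" and value
                        and value.lower().startswith(("http://", "https://"))):
                    kept = ' href="' + html.escape(value, quote=True) + '"'
        self.out.append("<" + tag + kept + ">")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(html.escape(data))


def sanitize_html(dirty):
    """Return dirty HTML reduced to the allowlisted tags and attributes."""
    parser = AllowlistSanitizer()
    parser.feed(dirty)
    parser.close()
    return "".join(parser.out)
```

This sketch does not rebalance unclosed tags, so the result is filtered rather than guaranteed to be well-formed.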

Mike B