Hi all,

I've got a varchar() field in SQL Server whose paragraphs are separated by carriage return/line feed pairs.

I'd like to turn it into properly formatted HTML.

For instance:

---------- before ----------

The quick brown fox jumped over the lazy dog. Then he got bored and went to bed. After that, he played with his friends.

The next day, he and his friends had a big party.


---------- after -----------

<p>The quick brown fox jumped over the lazy dog. Then he got bored and went to bed. After that, he played with his friends.</p>

<p>The next day, he and his friends had a big party.</p>


What's the right way to do this? Regular expressions seem like the obvious way to go, but I can't figure out how to match the beginning of the field along with the CRLF (carriage return/line feed) combination in a sane way.

Any regex geniuses out there? Would love some help. Thanks if so!

+6  A: 

A regular expression is not required for something like this. Plain string operations can do it. (Example in C#):

text = "<p>" + text.Replace("\r\n", "</p><p>") + "</p>";

(Depending on whether the line breaks are system-dependent, use either a fixed string such as "\r\n" or the Environment.NewLine property.)

If the string originally comes from user input, so that you don't have full control over it, you have to HTML-encode it properly before adding the paragraph tags, to prevent cross-site scripting attacks.
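As a rough illustration of combining both points (encode first, then add the tags), here is a minimal sketch. The class and method names are hypothetical, and it assumes that every non-blank line is one paragraph, so the blank line between paragraphs in the question's sample does not turn into an empty <p></p> pair. It uses System.Net.WebUtility.HtmlEncode for the encoding step.

```csharp
using System;
using System.Linq;
using System.Net;

// Hypothetical helper, not part of any library.
public static class ParagraphFormatter
{
    // Wraps each non-blank line of plain text in <p>...</p>,
    // HTML-encoding the line content before the tags are added.
    public static string ToHtmlParagraphs(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return string.Empty;
        }

        // Accept CRLF, LF, or CR so the result does not depend on where the text came from.
        string[] lines = text.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);

        var paragraphs = lines
            .Select(line => line.Trim())
            .Where(line => line.Length > 0) // drop blank separator lines
            .Select(line => "<p>" + WebUtility.HtmlEncode(line) + "</p>");

        // Assumption: one paragraph per output line, joined with "\n".
        return string.Join("\n", paragraphs);
    }
}
```

Calling ParagraphFormatter.ToHtmlParagraphs(text) on the sample from the question should yield the two <p> elements shown in the "after" block, with any markup characters in the text escaped rather than interpreted.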

Guffa
This was a real forehead-slapper. Of course! Thanks very much. This is easily translatable to a SQL function as well. (And yes, as others have noted, HTML encoding and cross-site scripting prevention are a must.) Thanks for your insight; I needed more coffee to see this one!
+4  A: 

And do not forget that adding <p> tags is not enough: you also have to escape characters that have special meaning in HTML (< becomes &lt; and so on). Otherwise you can end up with a broken page or even script injection.

Thilo
A: 

If the text is already broken up into paragraphs with newlines, it could be as simple as

text = Regex.Replace(text, ".+", "<p>$0</p>");

This assumes there are no HTML special characters (as Thilo mentioned) or extra whitespace characters between paragraphs, like this: "text\n \nmore text". You would want to deal with anything like that before you add the tags.
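One possible way to tolerate whitespace-only lines without a separate cleanup pass is to require at least one non-whitespace character in each match. The sketch below is only an illustration under that assumption; the class and method names are hypothetical, and the input is assumed to be HTML-encoded already.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class RegexParagraphs
{
    // Assumes the text has already been HTML-encoded.
    public static string Wrap(string text)
    {
        // [^\r\n]* stays within one line; \S demands at least one
        // non-whitespace character, so blank or whitespace-only lines
        // (and stray \r characters) are never wrapped.
        return Regex.Replace(text, @"[^\r\n]*\S[^\r\n]*", "<p>$0</p>");
    }
}
```

With this pattern the original line breaks and any whitespace-only lines are left in place between the paragraphs, which is harmless in HTML, and a trailing \r from a CRLF pair stays outside the closing tag.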

Alan Moore