I have an HTML page that I generate from the data contained in a database. The database sometimes contains long strings that the browser can't break because the strings don't contain breakable characters (space, period, comma, etc.).

Is there any way to fix this using html, css or even javascript?

See this link for an example of the problem.

+2  A: 

As it has been pointed out numerous times, no, there is nothing you can do about it, without preprocessing the strings programmatically before displaying them.

I know there is a strategy of inserting the soft hyphen character (&shy;) where needed, but it does not seem to be a popular option.

Check out this question: Soft hyphen in HTML (<wbr> vs. &shy;)

Developer Art
A: 

You can use jQuery to achieve that. Let me explain a little: first you need to add the jQuery reference, and there is a plug-in which may help you: Read More Plugin - JQuery. But you need to hook into your code during the fetch phase. You could handle this in an HttpHandler or in the Page_PreInit phase, but without any server-side code it will be hard, or perhaps there isn't any way at all. I don't know, but you should be able to add something to your database-fetched HTML page.

Braveyard
+1  A: 

There is a special character, &shy; (or &#173;), that can do it. For example:

Dzie&shy;le&shy;nie wy&shy;ra&shy;zów

could be displayed like:

 1. dzie
 2. le
 3. nie wy
 4. ra
 5. zów
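
One way to see the break opportunities in an example like this is to split the string on U+00AD in JavaScript. This is only an illustrative sketch: the variable names are made up here, and `\u00AD` is the JavaScript escape for the soft hyphen character.

```javascript
// The example phrase with soft hyphens written as \u00AD escapes.
const hyphenated = "Dzie\u00ADle\u00ADnie wy\u00ADra\u00ADzów";

// Each element is a fragment between two possible hyphenation points.
const fragments = hyphenated.split("\u00AD");

// Removing the soft hyphens gives back the plain text.
const plain = fragments.join("");

console.log(fragments);
console.log(plain);
```

The browser only breaks at these points when the line is too narrow; otherwise the soft hyphens stay invisible.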
Rin
+5  A: 

Yes you can, just set the css property of the box to:

.some_selector {
    word-wrap: break-word;
}

Edit: Some testing shows that it does work with a div or a p - a block level element - but it does not work with a table cell, nor when the div is put inside a table cell.

Tested and works in IE6, IE7, IE8, Firefox 3.5.3 and Chrome.

Works:

<div style="word-wrap: break-word">aaaaaaaaaaaaaaaaaaaaaaddddddddddddddddddddddddddddddddddddddddddaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa </div>
jeroen
Unless I'm doing something wrong, adding word-wrap has no effect (Firefox/IE). My HTML editor (Visual Studio) tells me this is not a valid CSS property name.
Sly
I've just tried it here on SO, adding a bunch of x's in Firebug, and it works in Firefox 3.5. It should work in IE as well; I think it used to be an IE-only property.
jeroen
It works in IE 7. But not in Firefox 3.0 or Safari 4.0.
Sly
I have just upgraded to Firefox 3.5.3. The DIV in TD does not work in 3.5.
Sly
This is very strange, when I test in Windows XP the results are as shown in my answer, when I check in Windows 7 RC, the table solution does not work in Firefox nor Chrome. I guess you'll have to avoid using tables...
jeroen
I'm going to use a javascript approach as described in my own answer to this question. It's safer than relying on CSS3 browser support.
Sly
I'm deleting my other comment because it breaks SO layout making this comment thread hard to read.
Sly
That's a good idea :-)
jeroen
By the way, I don't know if the word-wrap solution works in IE6, but if it does, I would avoid javascript and tables.
jeroen
It actually does work in IE6!!!
jeroen
"word-wrap: break-word" isn't supported on older non-MSIE browsers, but you can pair it with "overflow: hidden" to truncate the long word at the end of available space for browsers that don't recognize the "word-wrap" setting and provide a (relatively) complete CSS-only solution.
Dave Sherohman
+1  A: 

I'm answering my own question here...

Based on your answers, I came up with this solution (thanks to @CMS in this question for his help).

This script breaks any word that is more than 30 characters long by inserting a space at the 31st position.

Here is the fixed version: link

I have one problem left: I'd rather insert a &shy; than a space. But assigning to node.nodeValue or node.textContent inserts the literal text &shy;, not the soft hyphen character.

<script type="text/javascript">

    $(function() {
        replaceText(/\w{30}/g, "$& ", document.body);
    });

    function replaceText(oldText, newText, node) {
        node = node || document.body; // base node 

        var childs = node.childNodes, i = 0;

        while (node = childs[i]) {
            if (node.nodeType == 3) { // text node found, do the replacement
                if (node.textContent) {
                    node.textContent = node.textContent.replace(oldText, newText);
                } else { // support to IE
                    node.nodeValue = node.nodeValue.replace(oldText, newText);
                }
            } else { // not a text node, recurse into its children
                replaceText(oldText, newText, node);
            }
            i++;
        }
    }

</script>

I'll wait a few days before I accept this answer in case someone comes up with a simpler solution.

Thanks

Sly
gnarf
Also - Based on the way this works - I wrote a little jQuery plugin.
gnarf
Cool, the plug-in was my next step !! Thanks also for fixing the shy. Can you explain the `\xAD` syntax? Thanks!
Sly
My answer's updated to include the info you requested (I hope) - although we should move the comment thread to that answer ;)
gnarf
A: 

It's easier to break up the long words from a text string, before you add them to the document.

It would also be nice to avoid orphans, where you have only one or two characters on the last line.

This method will insert spaces in every unspaced run of characters longer than n, splitting it so that there are at least min characters on the last line.

function breakwords(text, n, min){
 var L= text.length;
 n= n || 20;
 min= min || 2;
 while(L%n && L%n<min)--n;
 var Rx= RegExp('(\\w{'+n+',}?)','g');
 text= text.replace(Rx,'$1 ');
 return text;
}

//test

var n=30, min=5;

var txt= 'abcdefghijklmnopqrstuvwxyz0123456789 abcdefghijklmnopqrstuvwxyz012345678 abcdefghijklmnopqrstuvwxyz01234567 abcdefghijklmnopqrstuvwxyz0123456';

txt=txt.replace(/(\w{30,})/g,function(w){return breakwords(w,n,min)});

alert(txt.replace(/ +/g,'\n'))

/*  returned value: (String)
abcdefghijklmnopqrstuvwxyz0123
456789
abcdefghijklmnopqrstuvwxyz0123
45678
abcdefghijklmnopqrstuvwxyz012
34567
abcdefghijklmnopqrstuvwxyz01
23456
*/
kennebec
+2  A: 

Based on this article and this one as well: the "Shy Hyphen" or "Soft Hyphen" can be written in HTML as: &shy; / &#173; / &#xAD; (173 dec = AD hex). They all convert to the U+00AD character.

The JavaScript textContent and nodeValue properties of DOM text nodes are not 'entity encoded' - they contain the actual characters, so an entity such as &shy; assigned to them shows up as literal text. To write these characters you must therefore encode them yourself: \xAD is a simple way to write the same character in a JavaScript string. String.fromCharCode(173) would also work.
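
As a quick check of the point above, the following sketch compares the different spellings of U+00AD in plain JavaScript. The variable names are only illustrative.

```javascript
// Three ways to write U+00AD (soft hyphen) in a JavaScript string.
const viaHexEscape = "\xAD";
const viaUnicodeEscape = "\u00AD";
const viaCharCode = String.fromCharCode(173);

// Entity references are only decoded by the HTML parser,
// so in a JavaScript string this is five ordinary characters.
const entityText = "&shy;";

console.log(viaHexEscape === viaUnicodeEscape); // true
console.log(viaHexEscape === viaCharCode);      // true
console.log(viaHexEscape.length);               // 1
console.log(entityText.length);                 // 5
```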

Based on your own VERY good answer - a jQuery Plugin version:

$.fn.replaceInText = function(oldText, newText) {
  // contents() gets all child dom nodes -- each lets us operate on them
  this.contents().each(function() {
    if (this.nodeType == 3) { // text node found, do the replacement
        if (this.textContent) {
            this.textContent = this.textContent.replace(oldText, newText);
        } else { // support to IE
            this.nodeValue = this.nodeValue.replace(oldText, newText);
        }
    } else {
      // other types of nodes - scan them for same replace
      $(this).replaceInText(oldText, newText);
    }
  });
  return this;
};

$(function() {
    $('div').replaceInText(/\w{10}/g, "$&\xAD");
});

A side note:

I think that the place this should happen is NOT in JavaScript - it should be in the server-side code. If this is only a page used to display data, you could easily do a similar regexp replace on the text before it is sent to the browser. However, the JavaScript solution offers one advantage (or disadvantage, depending on how you want to look at it): it doesn't add any extraneous characters to the data until the script executes, which means any robots crawling your HTML output for data won't see the shy hyphens. Although the HTML spec interprets it as a "hyphenation hint" and an invisible character, it's not guaranteed across the rest of the Unicode world: (quote from the Unicode standard via the second article I linked)

U+00AD soft hyphen indicates a hyphenation point, where a line-break is preferred when a word is to be hyphenated. Depending on the script, the visible rendering of this character when a line break occurs may differ (for example, in some scripts it is rendered as a hyphen -, while in others it may be invisible).

Another note: found in this other SO question - it seems that the "Zero Width Space" character &#8203; / &#x200b; / U+200B is another option you might want to explore. It would be \u200B as a JavaScript string (a \x escape takes only two hex digits, so \x20\x0b would be a space followed by a vertical tab).
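
A minimal sketch of the zero-width-space variant, written as a plain string function so it can be tried outside the browser. The helper name `addBreakOpportunities` and the chunk size of 10 are assumptions made for this example, not part of the plugin above.

```javascript
// U+200B is outside the two-digit range of \x escapes, so use \u200B.
const ZERO_WIDTH_SPACE = "\u200B";

// Hypothetical helper: add a zero-width space after every `size`
// consecutive word characters, giving the browser a place to wrap.
function addBreakOpportunities(text, size) {
  const run = new RegExp("\\w{" + size + "}", "g");
  return text.replace(run, "$&" + ZERO_WIDTH_SPACE);
}

const result = addBreakOpportunities("abcdefghijklmnopqrstuvwxyz", 10);
console.log(result.length); // 28: the 26 letters plus two zero-width spaces
```

Unlike the soft hyphen, a zero-width space does not draw a hyphen at the break, which may suit strings such as identifiers or URLs.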

gnarf
@gnarf: Thanks a lot!! The data will come in JSON format, from an Ajax call. My service is UI agnostic; it simply returns data. That's why I want to address this on the client side. And after all, this is just a patch until browsers offer better support for the CSS3 word-wrap property.
Sly
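
For data that arrives as JSON, the same replacement could be applied to the parsed object before it is rendered. The sketch below is only one possible shape: `softenStrings`, the payload and the chunk size of 10 are all made up for illustration.

```javascript
const SOFT_HYPHEN = "\u00AD";

// Hypothetical helper: returns a copy of `value` in which every string has a
// soft hyphen after each run of `size` consecutive word characters.
function softenStrings(value, size) {
  if (typeof value === "string") {
    return value.replace(new RegExp("\\w{" + size + "}", "g"), "$&" + SOFT_HYPHEN);
  }
  if (Array.isArray(value)) {
    return value.map((item) => softenStrings(item, size));
  }
  if (value !== null && typeof value === "object") {
    const copy = {};
    for (const key of Object.keys(value)) {
      copy[key] = softenStrings(value[key], size);
    }
    return copy;
  }
  return value; // numbers, booleans and null pass through unchanged
}

// Made-up payload standing in for the Ajax response.
const payload = JSON.parse('{"id": 7, "title": "aaaaaaaaaaaaaaaaaaaaaaaaa", "tags": ["short"]}');
const softened = softenStrings(payload, 10);

console.log(softened.title.length); // 27: 25 letters plus two soft hyphens
```

Because the function returns a copy, the original data stays free of the extra characters and only the displayed text carries them.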
A: 
$tableQuery = .....
while ($query = mysql_fetch_array($tableQuery)) {
    $rows = $tablerow; // the long string to split
    $varlength = strlen($rows); // count the number of characters
    $limit = 200; // set the character limit
    if ($varlength > $limit) { // if the string is longer than the character limit
        $row1 = substr($rows, 0, $limit);   // string up to the character limit
        $row2 = substr($rows, 200, $limit); // string from the 200th character to the 400th character
        $row3 = substr($rows, 400, $limit); // string from the 400th character to the 600th character
......
and so on
.....
        $rowN = substr($rows, x (start string from character X), $limit);

        echo $row1."<br />(or " ", or what you want)".$row2."<br />(or " ", or what you want)"........$rowN;
    }
}

string substr ( string $string , int $start [, int $length ] )
Returns the portion of string specified by the start and length parameters.

This works very well for me. Whether you use it or not is up to you :) It's my solution to your problem :)

DanTdr