views:

5540

answers:

10

Word wrap is one of must-have features in modern text editor.

Do you know how to handle word wrap? What is the best algorithm for word-wrap?

updated: If text is several million lines, how can I make word-wrap very fast?

updated: Why I need the solution? Because my projects must draw text with various zoom level and simultaneously beautiful appearance.

updated: Running environment is Windows Mobile devices. Maximum 600MHz speed with very small memory size.

updated: How should I handle line information? Let's assume original data has three lines.

THIS IS LINE 1.
THIS IS LINE 2.
THIS IS LINE 3.

After word break text will be shown like this:

THIS IS
LINE 1.
THIS IS
LINE 2.
THIS IS
LINE 3.

Should I allocate 3 lines more? Or any other suggestions?

A: 

with or without hyphenation?

without its easy. Just encapsule your text as wordobjects per word and give them a method getWidth() then start at the first word adding up the rowlength until it is greater than the available space. if so wrap the last word and start counting again for the next row starting with this one ecetera.

With hyphenation you need hyphenation rules in a common format like: hy-phen-a-tion

Then its the same as above except you need to split the last word which has caused the overflow.

An good example and tutorial of how to structure your code for an excellent texteditor is given in the Gang of Four Design Patterns book. Its one of the main sample on which they show the patterns.

Sven Hecht
Why was this voted -1? Granted the greedy algorithm isn't optimal, but...
ShreevatsaR
beats me. I was suprised too.
Sven Hecht
+1  A: 

I don't know of any specific algorithms, but wouldn't the following be a rough outline of how it should work:

  1. For current text size, font, display size, window size, margins, etc, determine how many characters can fit on a line (if fixed-type), or how many pixels can fit on a line (if not fixed-type).
  2. Go through line character by character, calculating how many characters or pixels have been recorded since the beginning of the line.
  3. When you go over the max chars/pixels for the line, move back to the last space/punctuation mark, move all text to next line.
  4. Repeat until you go through all text in document.

Question: In .net, word wrapping functionality is built in to controls like TextBox. I am sure that similar built in functionality exists for other languages as well. Is there a reason why you don't want to use a pre-built solution? This seems along the lines of reinventing the wheel.

Yaakov Ellis
+3  A: 

Regarding your update and speed question, remember to optimise later. First, write your word wrapping algorithm. Run it on a million lines if text. If and only if it is too slow for your requirements, then optimise.

Greg Hewgill
+4  A: 

you may find this useful

http://www.catch22.net/tuts/editor04.asp

JimmyJ
+6  A: 

Here is a word-wrap algorithm I've written in C#. It should be fairly easy to translate into other languages (except perhaps for IndexOfAny).

static char[] splitChars = new char[] { ' ', '-', '\t' };

private static string WordWrap(string str, int width)
{
    string[] words = Explode(str, splitChars);

    int curLineLength = 0;
    StringBuilder strBuilder = new StringBuilder();
    for(int i = 0; i < words.Length; i += 1)
    {
        string word = words[i];
        // If adding the new word to the current line would be too long,
        // then put it on a new line (and split it up if it's too long).
        if (curLineLength + word.Length > width)
        {
            // Only move down to a new line if we have text on the current line.
            // Avoids situation where wrapped whitespace causes emptylines in text.
            if (curLineLength > 0)
            {
                strBuilder.Append(Environment.NewLine);
                curLineLength = 0;
            }

            // If the current word is too long to fit on a line even on it's own then
            // split the word up.
            while (word.Length > width)
            {
                strBuilder.Append(word.Substring(0, width - 1) + "-");
                word = word.Substring(width - 1);

                strBuilder.Append(Environment.NewLine);
            }

            // Remove leading whitespace from the word so the new line starts flush to the left.
            word = word.TrimStart();
        }
        strBuilder.Append(word);
        curLineLength += word.Length;
    }

    return strBuilder.ToString();
}

private static string[] Explode(string str, char[] splitChars)
{
    List<string> parts = new List<string>();
    int startIndex = 0;
    while (true)
    {
        int index = str.IndexOfAny(splitChars, startIndex);

        if (index == -1)
        {
            parts.Add(str.Substring(startIndex));
            return parts.ToArray();
        }

        string word = str.Substring(startIndex, index - startIndex);
        char nextChar = str.Substring(index, 1)[0];
        // Dashes and the likes should stick to the word occuring before it. Whitespace doesn't have to.
        if (char.IsWhiteSpace(nextChar))
        {
            parts.Add(word);
            parts.Add(nextChar.ToString());
        }
        else
        {
            parts.Add(word + nextChar);
        }

        startIndex = index + 1;
    }
}

It's fairly primitive - it splits on spaces, tabs and dashes. It does make sure that dashes stick to the word before it (so you don't end up with stack\n-overflow) though it doesn't favour moving small hyphenated words to a newline rather than splitting them. It does split up words if they are too long for a line.

It's also fairly culturally specific, as I don't know much about the word-wrapping rules of other cultures.

ICR
Very nice and concise. Minor bug: if the string contains a line break, curLineLength should be set to zero (easiest is to add '\n' to breaking chars, and then test if word equals '\n').
dbkk
Also, better not to try to put a hyphen when splitting long words, just break them. Proper end-of-line hyphens are a difficult problem, even for Eng-lish (not Engli-sh or Engl-ish).
dbkk
A: 

@[ICR]:

Thanks for good solution but I have more problems.

I have 2 millions lines text - you can find such text data from Gutenberg project site. When user presses left or right direction key on mobile phone, text will be zoomed in or out immediately.

Should I keep line information as result of word break? Is this best method?

popopome
+8  A: 

Donald E. Knuth did a lot of work on the line breaking algorithm in his TeX typesetting system. This is arguably one of the best algorithms for line breaking - "best" in terms of visual appearance of result.

His algorithm avoids the problems of greedy line filling where you can end up with a very dense line followed by a very loose line.

An efficient algorithm can be implemented using dynamic programming.

Bjarke Ebert
+7  A: 

I don't know if anyone will ever read this seeing how old this question is, but I had occasion to write a word wrap function recently, and I want to share what I came up with. I used a TDD approach almost as strict as the one from the Go example. I started with the test that wrapping the string "Hello, world!" at 80 width should return "Hello, World!" Clearly, the simplest thing that works is to return the input string untouched. Starting from that, I made more and more complex tests and ended up with a recursive solution that (at least for my purposes) quite efficiently handles the task.

Pseudocode for the recursive solution:

Function WordWrap (inputString, width)
    Trim the input string of leading and trailing spaces.

    If the trimmed string's length is <= the width,
        Return the trimmed string.
    Else,
        Find the index of the last space in the trimmed string, starting at width

        If there are no spaces, use the width as the index.

        Split the trimmed string into two pieces at the index.

        Trim trailing spaces from the portion before the index,
        and leading spaces from the portion after the index.

        Concatenate and return:
          the trimmed portion before the index,
          a line break,
          and the result of calling WordWrap on the trimmed portion after
            the index (with the same width as the original call).

This only wraps at spaces, and if you want to wrap a string that already contains line breaks, you need to split it at the line breaks, send each piece to this function and then reassemble the string. Even so, in VB.NET running on a fast machine, this can handle about 20 mb/sec.

Daniel Straight
+2  A: 

I wondered the same thing for my own editor project. My solution was a two step process:

  1. Find the line ends and store them in an array.
  2. For very long lines, find suitable break points at roughly 1K intervals and save them in the line array, too. This is to catch the "4MB text without a single line break".

When you need to display the text, find the lines in question and wrap them on the fly. Remember this information in a cache for quick redraw. When the user scrolls a whole page, flush the cache and repeat.

If you can, do loading/analyzing of the whole text in a background thread. This way, you can already display the first page of text while the rest of the document is still being examined. The most simple solution here is to cut the first 16KB of text away and run the algorithm on the substring. This is very fast and allows you to render the first page instantly, even if your editor is still loading the text.

You can use a similar approach when the cursor is initially at the end of the text; just read the last 16KB of text and analyze that. In this case, use two edit buffers and load all but the last 16KB into the first while the user is locked into the second buffer. And you'll probably want to remember how many lines the text has when you close the editor, so the scroll bar doesn't look weird.

It gets hairy when the user can start the editor with the cursor somewhere in the middle but ultimately, it's only an extension of the end-problem. Only you need to remember the byte position, the current line number and the total number of lines from the last session plus you need three edit buffers or you need an edit buffer where you can cut away 16KB in the middle.

Alternatively, lock the scrollbar and other interface elements while the text is loading; that allows the user to look at the text while it loads completely.

Aaron Digulla