Basically, I just need the effect of copying that HTML from a browser window and pasting it into a textarea element.

For example, I want this:

<p>Some</p>
<div>text<br />Some</div>
<div>text</div>

to become this:

Some
text
Some
text
+3  A: 

This might answer your question: http://stackoverflow.com/questions/822452/strip-html-from-text-javascript

clifgriffin
The proposed solution doesn’t preserve line breaks.
Danylo Mysak
Other proposed solutions on that question do deal with line breaks though :)
clifgriffin
@clifgriffin: do they? I couldn't see one that did.
Tim Down
A: 

Three steps.

First, get the HTML as a string.
Second, replace all <BR /> and <BR> with \r\n.
Third, use the regular expression "<(.|\n)*?>" to replace all markup with "".
Serapth
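A minimal JavaScript sketch of the three steps above. The function name `stripTagsKeepBr` is made up for illustration, and `[\s\S]*?` stands in for the `(.|\n)*?` pattern from the answer; this is a regex heuristic, not an HTML parser.

```javascript
// Hypothetical helper: steps 2 and 3 applied to an HTML string (step 1).
function stripTagsKeepBr(html) {
    return html
        // Step 2: turn <br>, <br/> and <br /> (any case) into \r\n.
        .replace(/<br\s*\/?>/gi, "\r\n")
        // Step 3: remove every remaining tag; [\s\S] also matches newlines.
        .replace(/<[\s\S]*?>/g, "");
}

var text = stripTagsKeepBr("<p>Some</p><div>text<br />Some</div><div>text</div>");
// text is "Sometext\r\nSometext": only the <br /> produced a line break.
```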
Unfortunately, this approach ignores line breaks that emerge between two paragraphs or divs.
Danylo Mysak
Is that not as easily solved by inserting a hard break after each close P and DIV tag before doing the regex replace?
Serapth
Well, the problem is a bit deeper. I need to get text that resembles what the user sees on screen. For example, if there are two paragraphs ('p' elements) and they both have the standard margin, I want two line breaks between the corresponding text fragments. But when the margin is 0, it needs to be a single line break. That’s how the clipboard works, at least in some browsers.
Danylo Mysak
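The rule described in the comment above (two line breaks between blocks separated by a margin, one when the margin is 0) could be sketched as a pure function. The name `joinBlocks` and the `{ text, marginTop, marginBottom }` shape are assumptions for illustration; in a page the margins might be read with `window.getComputedStyle`. This is only a rule of thumb, not what any browser's clipboard code actually does.

```javascript
// Hypothetical data shape: one entry per block element, margins in CSS pixels.
function joinBlocks(blocks) {
    var out = "";
    for (var i = 0; i < blocks.length; i++) {
        if (i > 0) {
            // Adjacent vertical margins collapse, so the visible gap is the larger one.
            var gap = Math.max(blocks[i - 1].marginBottom, blocks[i].marginTop);
            // A visible gap becomes an empty line; no gap becomes a plain line break.
            out += gap > 0 ? "\n\n" : "\n";
        }
        out += blocks[i].text;
    }
    return out;
}

var spaced = joinBlocks([
    { text: "Some", marginTop: 16, marginBottom: 16 },
    { text: "text", marginTop: 16, marginBottom: 16 }
]); // "Some\n\ntext"

var tight = joinBlocks([
    { text: "Some", marginTop: 0, marginBottom: 0 },
    { text: "text", marginTop: 0, marginBottom: 0 }
]); // "Some\ntext"
```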
+1  A: 

I tried to find some code I wrote and used for this a while back. It worked nicely. Let me outline what it did, and hopefully you can duplicate its behavior.

  • Replace images with their alt or title text.
  • Replace links with "text [link]".
  • Replace elements that generally produce vertical white space (h1-h6, div, p, br, hr, etc.) with line breaks. (I know, I know. These could actually be inline elements, but it works out well.)
  • Strip out the rest of the tags and replace them with an empty string.

You could even expand this more to format things like ordered and unordered lists. It really just depends on how far you'll want to go.

EDIT

Found the code!

// Requires: using System.Text.RegularExpressions;
public static string Convert(string template)
{
    // Use the image alt text in place of the image.
    template = Regex.Replace(template, "<img .*?alt=[\"']?([^\"']*)[\"']?.*?/?>", "$1");
    // Rewrite links as "text [href]"; the lazy (.*?) keeps two links on one line separate.
    template = Regex.Replace(template, "<a .*?href=[\"']?([^\"']*)[\"']?.*?>(.*?)</a>", "$2 [$1]");
    // Closing block tags and <br>, <br/>, <br /> become line breaks (\s* allows the space in "<br />").
    template = Regex.Replace(template, "<(/p|/div|/h\\d|br)\\s*/?>", "\n");
    // Remove the rest of the tags.
    template = Regex.Replace(template, "<[A-Za-z/][^<>]*>", "");

    return template;
}
Kevin Wiskia
Erm... that's not JavaScript, is it? It also doesn't directly answer the question, given that the question really concerns copy and paste.
Yi Jiang
The language really doesn't matter; what matters is how it goes about it. This could easily be ported to JS. I'm just showing something I had done in the past.
Kevin Wiskia
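A rough JavaScript port of the `Convert` outline above, as a sketch only. The name `convert` is hypothetical, the patterns are slightly tightened rather than copied verbatim (for instance `\s*` so that `<br />` is matched), and regular expressions remain a heuristic here, not an HTML parser.

```javascript
// Hypothetical port; handles only the simple cases the outline describes.
function convert(html) {
    return html
        // Use the image alt text in place of the image.
        .replace(/<img\s[^>]*?alt=["']?([^"'>]*)["']?[^>]*>/gi, "$1")
        // Rewrite links as "text [href]".
        .replace(/<a\s[^>]*?href=["']?([^"'\s>]*)["']?[^>]*>([\s\S]*?)<\/a>/gi, "$2 [$1]")
        // Closing block tags and <br> become line breaks.
        .replace(/<(\/p|\/div|\/h\d|br)\s*\/?>/gi, "\n")
        // Drop whatever tags remain.
        .replace(/<[A-Za-z\/][^<>]*>/g, "");
}

var plain = convert("<p>Some</p><div>text<br />Some</div><div>text</div>");
// plain is "Some\ntext\nSome\ntext\n" (note the trailing line break).
```

Images without an alt attribute fall through to the last replacement and are simply removed.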
Thank you. That’s quite like it. Unfortunately, though, the result is not exactly what the user sees. For example, Convert('<p>Some</p><p>text</p>') and Convert('<p>Some<br /></p><p>text</p>') give different results, while the browser renders them the same way.
Danylo Mysak
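One way to narrow the gap noted in the comment above, assuming that a `<br>` directly before a closing block tag adds no visible line: drop it before converting. `dropTrailingBr` is a hypothetical helper, not part of the answer's code, and it covers only this one case.

```javascript
// Hypothetical pre-processing step: remove a <br> that sits right before
// a closing </p>, </div> or </hN> tag, since it adds no visible line there.
function dropTrailingBr(html) {
    return html.replace(/<br\s*\/?>\s*(?=<\/(p|div|h\d)\s*>)/gi, "");
}

var normalized = dropTrailingBr("<p>Some<br /></p><p>text</p>");
// normalized is "<p>Some</p><p>text</p>", the same markup as the first example.
```

A `<br>` in the middle of a block, as in `<div>text<br />Some</div>`, is left untouched.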
+4  A: 

If that HTML is visible within your web page, you could do it with the user selection (or just a TextRange in IE). This does preserve line breaks, if not necessarily leading and trailing white space:

<div id="container">
    <p>Some</p>
    <div>text<br />Some</div>
    <div>text</div>
</div>

<script type="text/javascript">

function getInnerText(el) {
    var sel, range, innerText = "";
    if (typeof window.getSelection != "undefined" && typeof document.createRange != "undefined") {
        sel = window.getSelection();
        sel.selectAllChildren(el);
        innerText = "" + sel;
        sel.removeAllRanges();
    } else if (typeof document.selection != "undefined" && typeof document.body.createTextRange != "undefined") {
        range = document.body.createTextRange();
        range.moveToElementText(el);
        innerText = range.text;
    }
    return innerText;
}

var el = document.getElementById("container");
alert(getInnerText(el));

</script>
Tim Down
Up vote for a clever solution. Why do all the heavy lifting?
Kevin Wiskia
Thanks. Interestingly, in the non-IE case (first block) it gets what would be copied to the clipboard, but in the IE case (second block) it’s not the same string.
Danylo Mysak
What's the difference between the IE and non-IE strings? The first block uses Selection's `toString()` method to extract just the text of the selection (rather than the rich text that gets copied to the clipboard), so they should be more or less identical.
Tim Down
Sorry, I meant that the string you get by copying a fragment of the page to the clipboard differs from the one your function returns. That is the case in IE; in non-IE browsers the two strings are identical. The function itself is perfect for the problem I described in my question (except for the IE stuff, which is not so important).
Danylo Mysak
Unfortunately, it turned out that my real problem is quite different and probably can’t be solved this way. I need two paragraphs of text, both with margin: 0, to be recognized as two consecutive lines without an empty line between them. It seems that WebKit browsers are the only ones that take the 'margin' property into account.
Danylo Mysak
Ah. I don't have an easy answer for that.
Tim Down