ansaurus

Question

Javascript: Whitespace Characters being Removed in Chrome (but not Firefox)

Answer 1

A:

I'd like to help more, but it's hard to guess without being able to test it. I suppose you could work around it by adding space-like characters around your links, e.g. &nbsp;.

By the way, this feature of yours that adds helpful links on copying is really interesting.

mqchen 2010-06-02 00:48:24

Thanks chen. You should be able to test it at the link I provided. Please let me know what problems you're running into.

Matrym 2010-06-02 01:03:10

Answer 2

+2 A:

It's not anything to do with the linking functionality; it happens to copied links that are already on the page too, and the credit content, even if the processSel() call is commented out.

It seems to be a weird bug in Chrome's rich text copy function. The content in the holder is fine; if you cloneContents the selected range and alert its innerHTML at the end, the whitespaces are clearly there. But whitespaces just before, just after, and at the inner edges of any inline element (not just links!) don't show up in rich text.

Even if you add new text nodes to the DOM containing spaces next to a link, Chrome swallows them. I was able to make it look right by inserting non-breaking spaces:

var links= lbp.vrs.holder.getElementsByTagName('a');
for (var i= links.length; i-->0;) {
    links[i].parentNode.insertBefore(document.createTextNode('\xA0 '), links[i]);
    links[i].parentNode.insertBefore(document.createTextNode(' \xA0'), links[i].nextSibling);
}

but that's pretty ugly, should be unnecessary, and doesn't fix up other inline elements. Bad Chrome!

var keyword = links[i].innerHTML.toLowerCase();

It's unwise to rely on innerHTML to get text from an element, as the browser may escape or not-escape characters in it. Most notably &, but there's no guarantee over what characters the browser's innerHTML property will output.

As you seem to be using jQuery already, grab the content with text() instead.

var isDomain = new RegExp(document.domain, 'g');
if (isDomain.test(linkUrl)) { ...

That'll fail every second time, because global regexps remember their previous state (lastIndex): when used with methods like test, you're supposed to keep calling repeatedly until they return no match.
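The alternating result is easy to reproduce in isolation. A minimal sketch, using a placeholder domain and URL (example.com is an assumption standing in for document.domain, not the asker's site):

```javascript
// Placeholder values: example.com stands in for document.domain.
var isDomain = new RegExp('example\\.com', 'g');
var linkUrl = 'http://example.com/page';

var results = [];
for (var i = 0; i < 4; i++) {
    // With the g flag, test() starts searching at isDomain.lastIndex
    // and resets lastIndex to 0 only after a failed match.
    results.push(isDomain.test(linkUrl));
}
console.log(results); // alternates: true, false, true, false

// Without g, lastIndex is ignored and every call gives the same answer.
var isDomainOnce = new RegExp('example\\.com');
var stable = [isDomainOnce.test(linkUrl), isDomainOnce.test(linkUrl)];
console.log(stable);
```

The same regexp object and the same input give different answers on consecutive calls, which is exactly the "every second time" failure.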

You don't seem to need g (multiple matches) here... but then you don't seem to need regexp here either as a simple String indexOf would be more reliable. (In a regexp, each . in the domain would match any character in the link.)
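To see why the unescaped domain is a problem, here is a small sketch with made-up hostnames (both values are assumptions for illustration):

```javascript
var domain = 'example.com';                      // stand-in for document.domain
var lookalike = 'http://exampleXcom.test/page';  // not on example.com at all

// As a regexp, the '.' matches any character, so the lookalike passes.
var regexSaysSame = new RegExp(domain).test(lookalike);

// indexOf treats the domain as literal text, so it does not.
var indexOfSaysSame = lookalike.indexOf(domain) !== -1;

console.log(regexSaysSame, indexOfSaysSame);
```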

Better still, use the URL decomposition properties on Location to do a direct comparison of hostnames, rather than crude string-matching over the whole URL:

if (location.hostname===links[i].hostname) { ...
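For URLs that are plain strings rather than anchor elements, the same comparison can be sketched with the standard URL constructor available in current browsers and Node. The helper name isSameHost and the sample URLs are made up for illustration:

```javascript
// Hypothetical helper: compares hostnames the way
// location.hostname === link.hostname does for anchor elements.
function isSameHost(linkUrl, pageUrl) {
    // Relative link URLs are resolved against the page URL first.
    return new URL(linkUrl, pageUrl).hostname === new URL(pageUrl).hostname;
}

var page = 'http://example.com/articles/1';
var sameAbsolute = isSameHost('http://example.com/about', page);
var sameRelative = isSameHost('/relative/path', page);
var different = isSameHost('http://other.test/?ref=example.com', page);

console.log(sameAbsolute, sameRelative, different);
```

The last URL contains the domain in its query string, so a substring search over the whole URL would wrongly treat it as local; comparing hostnames does not.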

// don't match an alphanumeric char
var dontMatch =/\w/;
if(child.nodeValue.charAt(index - 1).match(dontMatch) || child.nodeValue.charAt(index+keyword.length).match(dontMatch))
    break;

If you want to match words on word boundaries, and case insensitively, I think you'd be better off using a regex rather than plain substring matching. That'd also save doing four calls to findText for each keyword as it is at the moment. You can grab the inner bit (in if (child.nodeType==3) { ...) of the function in this answer and use that instead of the current string matching.

The annoying thing about making regexps from strings is having to add a load of backslashes to the punctuation, so you'll want a function for that:

// Backslash-escape string for literal use in a RegExp
//
function RegExp_escape(s) {
    return s.replace(/([/\\^$*+?.()|[\]{}])/g, '\\$1');
}

var keywordre= new RegExp('\\b'+RegExp_escape(keyword)+'\\b', 'gi');
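A quick sketch of what the escaping buys you, using a made-up keyword that contains punctuation. The helper is repeated so the snippet runs on its own:

```javascript
// Backslash-escape string for literal use in a RegExp
function RegExp_escape(s) {
    return s.replace(/([\/\\^$*+?.()|[\]{}])/g, '\\$1');
}

var keyword = 'node.js';  // illustrative keyword, not from the original page
var keywordre = new RegExp('\\b' + RegExp_escape(keyword) + '\\b', 'gi');

// The dot is now literal: the escaped string is node\.js
var escaped = RegExp_escape(keyword);

// Case-insensitive, but 'nodexjs' is rejected because '.' no longer means "any char".
var hits = 'Node.js and nodexjs'.match(keywordre);

// No word boundary before 'n' or after 's', so nothing matches here.
var embedded = 'mynode.jsx'.match(keywordre);

console.log(escaped, hits, embedded);
```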

You could even do all the keyword replacements in one go for efficiency:

var keywords= [];
var hrefs= [];
for (var i=0; i<links.length; i++) {
    ...
    var text= $(links[i]).text();
    keywords.push('(\\b'+RegExp_escape(text)+'\\b)');
    hrefs.push(links[i].href);
}
var keywordre= new RegExp(keywords.join('|'), 'gi');

and then for each match in linkup, check which match group has non-zero length and link with the hrefs entry of the same number (match groups count from 1 and arrays from 0, so group n corresponds to hrefs[n-1]).
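To make the group lookup concrete, here is a minimal sketch that works on plain strings instead of DOM nodes. The sample keywords, the URLs, and the helper name hrefForMatch are all made up for illustration:

```javascript
function RegExp_escape(s) {
    return s.replace(/([\/\\^$*+?.()|[\]{}])/g, '\\$1');
}

// Illustrative data standing in for the text and href of each link.
var linkData = [
    { text: 'chrome',  href: 'http://example.com/chrome' },
    { text: 'firefox', href: 'http://example.com/firefox' }
];

var keywords = [];
var hrefs = [];
for (var i = 0; i < linkData.length; i++) {
    keywords.push('(\\b' + RegExp_escape(linkData[i].text) + '\\b)');
    hrefs.push(linkData[i].href);
}
var keywordre = new RegExp(keywords.join('|'), 'gi');

// Hypothetical helper: a group that did not take part in the match is
// undefined, so a truthiness check covers the "non-zero length" test.
// Group n (1-based) belongs to hrefs[n - 1].
function hrefForMatch(match) {
    for (var n = 1; n < match.length; n++) {
        if (match[n]) return hrefs[n - 1];
    }
    return null;
}

var found = [];
var match;
while ((match = keywordre.exec('Works in Firefox, breaks in Chrome.')) !== null) {
    found.push(match[0] + ' -> ' + hrefForMatch(match));
}
console.log(found);
```

In the real page the loop body would wrap the matched text in a link rather than push a string, but the group-to-href lookup is the same.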

bobince 2010-06-02 09:58:31

Bobince, you're my hero :). Did you notice the doxdesk kudos? You'll be showered with appreciation on my project page!

Matrym 2010-06-02 21:48:17

Heh! Just noticed I forgot to link the other answer containing the regex-based `findText`... fixed.

bobince 2010-06-03 08:24:29

Matrym 2010-06-03 20:44:35

I meant library independent. Typo

Matrym 2010-06-03 20:58:57

"and then for each match in linkup, check which match group has non-zero length and link with the hrefs[ of the same number." <-- Sorry, but I'm unable to follow you. Could you show me? http://jsbin.com/oroxo3/edit

Matrym 2010-06-03 21:15:12

The canonical way to get text (and what `text()` uses) is to do a depth-first traversal of the DOM tree from the element's childNodes collecting text (ie. recurse on `child.nodeType===1` and add to string on `child.nodeType===3`). There is also the DOM Level 3 Core property `element.textContent`, but it isn't supported in IE or some other older browsers. On IE you can branch and use `element.innerText` instead, but this isn't quite exactly the same (it is especially sloppy about whitespaces).
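A minimal sketch of that traversal, written against only the `nodeType`, `childNodes` and `nodeValue` properties so it can be tried on plain objects as well as real DOM nodes. The function name `getText` and the sample tree are made up for illustration; this is not jQuery's actual implementation:

```javascript
function getText(node) {
    var text = '';
    for (var i = 0; i < node.childNodes.length; i++) {
        var child = node.childNodes[i];
        if (child.nodeType === 1) {
            text += getText(child);      // element: recurse into it
        } else if (child.nodeType === 3) {
            text += child.nodeValue;     // text node: collect its content
        }
    }
    return text;
}

// Plain-object stand-in for <a>Tom &amp; <b>Jerry</b></a>
var fakeLink = {
    nodeType: 1,
    childNodes: [
        { nodeType: 3, nodeValue: 'Tom & ' },
        { nodeType: 1, childNodes: [{ nodeType: 3, nodeValue: 'Jerry' }] }
    ]
};
var linkText = getText(fakeLink);
console.log(linkText); // Tom & Jerry
```

Because it reads `nodeValue` rather than serialized markup, the ampersand comes back as a plain `&`, whatever `innerHTML` would have output.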

bobince 2010-06-03 21:24:23

Well, if you have a regexp like `(term1)|(term2)|(term3)`, you can use a replacement function that takes a `match` object and looks at `match[1]`. If it's undefined we know that `term1` was not the expression that caused the match; then look at `match[2]`, and so on until you find which term it was that matched.

bobince 2010-06-05 15:09:07
