If you use jQuery's .html() method on untrusted input, your web application becomes vulnerable to cross-site scripting (XSS): an attacker could exploit it simply by posting a malicious tweet. The best way to avoid this problem is to append each part of the tweet individually, using jQuery functions such as .text() and .attr(), which go through the web browser's DOM functions so that strings are treated as plain text rather than parsed as HTML.
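To make the risk concrete, here is a minimal DOM-free sketch of what "treating input as text" amounts to. The escapeHtml helper is hypothetical (it is not a jQuery function) and only illustrates the effect; in the browser, the DOM text functions achieve the same result without any string replacement, and the approach described here still relies on them.

```javascript
// Hypothetical helper for illustration only; not part of jQuery.
// It replaces the characters that are significant in HTML markup
// with the corresponding character references.
function escapeHtml(str) {
    return String(str)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

// The tag is split up so this snippet can also live inside an inline script.
var malicious = '<scr' + 'ipt>alert(document.cookie)</scr' + 'ipt>';
var escaped = escapeHtml(malicious);

console.log(escaped);
// &lt;script&gt;alert(document.cookie)&lt;/script&gt;
```

Rendered as HTML, the escaped string shows up as visible text instead of running as a script.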
1. First, combine the two regexes into one using regex alternation (the | symbol). For the purposes of my example code, the Twitter username regex is /@\w+/gi and the URL regex is /(?:https?|ftp):\/\/.*?\..*?(?=\W?\s)/gi. These regexes are not the same as those in the original question; the original URL regex did not seem to work correctly, and we need not use capturing groups. The combined regex is therefore /@\w+|(?:https?|ftp):\/\/.*?\..*?(?=\W?\s)/gi.
2. Each time the regex matches, securely add the text that comes before the match to the container. To do this in jQuery, create an empty "span" element and use the .text() method to insert the text. Using $('text here') would leave an XSS hole wide open. What if the contents of a tweet are <script>alert(document.cookie)</script>?
3. Check the first character of the match to determine how it is to be formatted. Matched Twitter usernames always begin with "@"; matched URLs never do.
4. Format the match and add it to the container. Again, do not pass untrusted input to the $ or jQuery function; use the .attr() method to add attributes such as href and the .text() method to add link text.
5. After all matches have been processed, add the last plain-text part of the tweet, which was not added in step 2 or 4.
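The matching logic in the steps above can be tried without a DOM. The following is a minimal sketch, assuming a hypothetical helper named splitTweet (not part of jQuery or of the code in this answer) that returns the pieces of a tweet as plain objects; the rendering with .text() and .attr() is deliberately left out.

```javascript
// Same combined regex as given in the text.
var combinedRegex = /@\w+|(?:https?|ftp):\/\/.*?\..*?(?=\W?\s)/gi;

// Hypothetical helper: splits a tweet into { type, value } segments,
// where type is 'text', 'mention' or 'url'.
function splitTweet(tweet) {
    var segments = [], result, prevLastIndex = 0;
    combinedRegex.lastIndex = 0;
    while ((result = combinedRegex.exec(tweet))) {
        // Plain text that comes before the match
        if (result.index > prevLastIndex) {
            segments.push({ type: 'text', value: tweet.slice(prevLastIndex, result.index) });
        }
        // The first character tells usernames and URLs apart
        segments.push({
            type: result[0].charAt(0) === '@' ? 'mention' : 'url',
            value: result[0]
        });
        prevLastIndex = combinedRegex.lastIndex;
    }
    // Last plain-text part after the final match
    if (prevLastIndex < tweet.length) {
        segments.push({ type: 'text', value: tweet.slice(prevLastIndex) });
    }
    return segments;
}

var segments = splitTweet('hi @bob see http://example.com now');
console.log(segments);
```

For this input the sketch yields five segments: the text "hi ", the mention "@bob", the text " see ", the URL "http://example.com" and the text " now". Keeping the splitting separate from the rendering makes it easier to test the regex on its own.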
Example code (also at http://jsfiddle.net/6X6xD/3/):
var tweet = 'joined @BundleHunt for a chance to win the 2010 Mega Bundle! http://bundlehunt.com * Only 10 Days Left! URL containing an at sign: http://www.last.fm/event/1196311+Live+@+Public+Assembly. This should not work: <scr'+'ipt>alert(document.cookie)</scr'+'ipt>';
var combinedRegex = /@\w+|(?:https?|ftp):\/\/.*?\..*?(?=\W?\s)/gi,
    container = $('#tweet-container');
var result, prevLastIndex = 0;
combinedRegex.lastIndex = 0;
while((result = combinedRegex.exec(tweet))) {
    // Append the text coming before the matched entity
    container.append($('<span/>').text(tweet.slice(prevLastIndex, result.index)));
    if(result[0].slice(0, 1) == "@") {
        // Twitter username was matched
        container.append($('<a/>')
            // .slice(1) cuts off the first character (i.e. "@")
            .attr('href', 'http://twitter.com/' + encodeURIComponent(result[0].slice(1)))
            .text(result[0])
        );
    } else {
        // URL was matched
        container.append($('<a/>')
            .attr('href', result[0])
            .text(result[0])
        );
    }
    // prevLastIndex will point to the next plain text character to be added
    prevLastIndex = combinedRegex.lastIndex;
}
// Append last plain text part of tweet
container.append($('<span/>').text(tweet.slice(prevLastIndex)));
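One caveat worth checking: the lookahead at the end of the URL regex requires whitespace somewhere after the URL, so a URL at the very end of a tweet does not appear to be matched. The sketch below compares the regex used above with a possible variant that also accepts the end of the input. The variant is an assumption for illustration, not part of the code above, and has not been tested against real tweets.

```javascript
// Regex from the example code above.
var originalRegex = /@\w+|(?:https?|ftp):\/\/.*?\..*?(?=\W?\s)/gi;

// Assumed variant: the lookahead accepts whitespace or the end of the input.
var variantRegex = /@\w+|(?:https?|ftp):\/\/.*?\..*?(?=\W?(?:\s|$))/gi;

var tweetEndingInUrl = 'Only 10 days left! http://bundlehunt.com';

var originalMatches = tweetEndingInUrl.match(originalRegex);
var variantMatches = tweetEndingInUrl.match(variantRegex);

console.log(originalMatches); // null
console.log(variantMatches);  // one match: http://bundlehunt.com
```

If tweets can end with a URL, a change along these lines may be needed; the rest of the loop would stay the same.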
Note: older versions of this answer recommended using the .html() method. Because that is a serious security problem, as explained above, I have edited the answer to post this new version, removing the old one from view.