ansaurus

Question

Truncate words function in javascript (studying dojo's code)

Answer 1

A:

The code you're looking at is from the dtl library, which supports the Django templating language (http://www.dojotoolkit.org/book/dojo-book-0-9/part-5-dojox/dojox-dtl). I'm sure the code in there is not meant for a straight string split, but rather for parsing the templates they're using.

Also, looking at that regex, they're handling a lot more scenarios than just spaces. For example, the <.*?> will cause anything enclosed in angle brackets (a whole tag, attributes included) to be considered a single "word".

jvenema 2009-06-09 19:56:57

Yeah, I'm also working on a port of Django templates for JavaScript, and I figured that dojo's dtl is a good place to get some ideas and perhaps some code. I'm surprised (puzzled?) that html/xml tags would be considered words. Usually when I truncate a string, it's because I want to show a summary with a "more..." link, no?

snz3 2009-06-10 12:55:44

I can't speak to how they were using the code in there... for your purposes, sure, that makes sense. But since the regex is including them, I guess it's valid. Maybe it's just to show the first X words of a template in some sort of template preview? Without spending more time in there, I'm not sure. If you post to the dojo mailing list, I'm sure they could help you out.

jvenema 2009-06-10 16:46:23

Answer 2

+2 A:

Your split should take into account that any sequence of whitespace characters is a word separator. You should split on a regexp like \s+.

But other than that, it seems dojo's code treats entities and xml tags as words as well. If you know you don't have such things in your string, your implementation might do the trick. Be careful, though, that your slice does not go beyond the number of words found; this might need a little check.
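
As a rough illustration of the two points above (split on \s+ and check the word count before slicing), here is a minimal sketch. The function name truncateWords and the " ..." suffix are assumptions made for this example; this is not dojo's implementation.

```javascript
// Hypothetical helper for illustration only; not dojo's code.
function truncateWords(value, count) {
  var limit = parseInt(count, 10);
  // Leave the input alone if the count is not a usable positive integer.
  if (isNaN(limit) || limit <= 0) {
    return value;
  }
  // Trim first so leading/trailing whitespace does not produce empty "words".
  var trimmed = value.replace(/^\s+|\s+$/g, "");
  if (trimmed === "") {
    return value;
  }
  // Any run of whitespace (spaces, tabs, newlines) counts as one separator.
  var words = trimmed.split(/\s+/);
  // The "little check": nothing to truncate if there are too few words.
  if (words.length <= limit) {
    return value;
  }
  return words.slice(0, limit).join(" ") + " ...";
}

console.log(truncateWords("The quick   brown fox", 2)); // "The quick ..."
```

Array.prototype.slice clamps an end index that is past the end of the array, so the length check here mostly decides whether a truncation marker should be appended at all.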

subtenante 2009-06-09 19:57:37

Answer 3

A:

Function declaration: this is probably a JavaScript object, and using function_name: function(params) {... helps keep functions out of the global scope.
By checking the arg variable, they're ensuring that an integer was passed. Using parseInt() allows both 10 and "10" to be accepted.
Because of the regex being used, this method can handle more delimiters than just spaces.
This code is safe against running past the end of the array. You can't count to 10 if there are only 8 words in value. Without the check, you'd read past the end of the array and end up with undefined entries.

Jarrett Meyer 2009-06-09 20:00:30

Of course, they should use parseInt(arg, 10) ...
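
To make the radix point concrete, here is a small sketch. The helper name parseCount is an assumption for this example and does not come from dojo.

```javascript
// Hypothetical helper for illustration only.
// Passing the radix explicitly makes the parse base-10 regardless of prefixes.
function parseCount(arg) {
  var n = parseInt(arg, 10);
  return isNaN(n) ? null : n;
}

console.log(parseCount(10));     // 10
console.log(parseCount("10"));   // 10
console.log(parseCount("010"));  // 10
console.log(parseCount("0x1A")); // 0 (base-10 parsing stops at the "x")
console.log(parseCount("abc"));  // null
```

Without the second argument, a "0x" prefix is read as hexadecimal, and some older engines treated a leading zero as octal, so "010" could come back as 8.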

Greg 2009-06-09 20:05:28

Answer 4

A:

The regex has three parts:

&.*?; will match character entities (like &amp;)
<.*?> will match things in angle brackets
(\w[\w-]) will match strings starting with a word character ([a-zA-Z0-9_]) and followed by more of the same or a dash

It's not just splitting on spaces. It's looking for things it thinks could be part of a word, and once it finds something that is not, it ups the word count.

It should take a comma- or pipe-separated list and work as well as with a space-separated list.
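
A quick way to see this behaviour is to run a pattern built from the same three alternatives through String.prototype.match. This is a sketch under assumptions: the * quantifier on the last alternative and the names TOKEN and tokenize are mine, not taken from dojo's source.

```javascript
// Assumed pattern for illustration: entity | tag | word (with dashes).
// The trailing * on the word alternative is an assumption.
var TOKEN = /&.*?;|<.*?>|\w[\w-]*/g;

// Hypothetical helper: returns every "word" the pattern finds.
function tokenize(value) {
  return value.match(TOKEN) || [];
}

console.log(tokenize("<b>well-known</b> &amp; a,b|c"));
// [ '<b>', 'well-known', '</b>', '&amp;', 'a', 'b', 'c' ]
```

Commas, pipes and spaces are all skipped because none of the alternatives match them. One caveat of the lazy &.*?; alternative: a stray ampersand swallows everything up to the next semicolon, so "AT&T rocks; ok" yields "AT", "&T rocks;" and "ok".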

Charlie 2009-06-09 20:10:59

Having read your comment and the comments above, I tried using dojo's regexp for a better solution. The problem is that you can't truncate with dojo if the string is written in non-Latin characters (as you said, \w will only match a-zA-Z characters). So my new method would be: ...var value_arr = value.match(/(.+?([^\-](?=\s|,)))/g); if(value_arr }return value;
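
For the non-Latin case, one possible approach in current engines is Unicode property escapes. This is a sketch assuming an ES2018-capable runtime; it is not the method from the truncated snippet above, and the names WORD and firstWords are made up for this example.

```javascript
// Assumes ES2018 Unicode property escapes (the "u" flag with \p{...}).
// A word starts with a letter or digit and may continue with letters, digits or dashes.
var WORD = /[\p{L}\p{N}][\p{L}\p{N}-]*/gu;

// Hypothetical helper: returns at most `count` words found in `value`.
function firstWords(value, count) {
  var words = value.match(WORD) || [];
  return words.slice(0, count);
}

console.log(firstWords("привет, мир-труд май", 2)); // [ 'привет', 'мир-труд' ]
```

Unlike \w, which only covers ASCII letters, digits and the underscore, \p{L} and \p{N} match letters and digits from any script.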

snz3 2009-06-10 13:02:44
