The situation:
Let's say we are implementing a blog engine based on JCR with support for localization.
The content structure looks something like this: /blogname/content/[node name]

The problem: What is the best way to name the content nodes (/blogname/content/[node name]) so that the following requirements are satisfied?

  1. The node name must be usable in HTML to support REST-like URLs, e.g. blogname.com/content/nodename should point to a single content item.
  2. The above requirement must not produce ugly URLs, e.g. /content/node_name is good, /content/node%20name is bad.
  3. Programmatic retrieval should be easy given the node name, e.g. //content[@node_name=some-name]
  4. The naming scheme must guarantee node name uniqueness.

PS: The JCR implementation used is Jackrabbit.

A: 

Regarding item 3: I recently learned that XPath queries do not allow a name to start with a digit. If your node name starts with a digit, it can still be queried by escaping the first character of the name, but your queries will be more straightforward if all node names start with a letter.

(I'm not sure about property names. I haven't ever seen one that didn't start with a letter.)
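
To illustrate the escaping mentioned above, here is a minimal sketch. It assumes the ISO 9075 style escape used by JCR XPath, where a character is written as _xHHHH_ (its code point in four hex digits). The class and method names are made up for this example, and only a leading ASCII digit is handled. A complete encoder also has to deal with other illegal characters; Jackrabbit's jcr-commons module ships an org.apache.jackrabbit.util.ISO9075 helper for that.

```java
final class XPathNameEscape {
    /**
     * Escapes a leading ASCII digit as _xHHHH_ so the name can be used
     * as a step in an XPath query. Other names are returned unchanged.
     * Hypothetical helper, not part of the JCR API.
     */
    static String escapeLeadingDigit(String name) {
        if (name.isEmpty()) {
            return name;
        }
        char first = name.charAt(0);
        if (first < '0' || first > '9') {
            return name;
        }
        // Four lowercase hex digits of the character code, wrapped in _x ... _
        return String.format("_x%04x_", (int) first) + name.substring(1);
    }
}
```

With this sketch, a node named 2009_post would be addressed in a query as _x0032_009_post, while my_post stays as it is.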

A: 

For 1. to 3. the answer is simple: use only the characters you want to see in the node name, i.e. map whatever input string you have (e.g. the blog post title) onto a restricted character set, such as the characters that need no encoding in URIs.

For example, do not allow spaces (which are legal in JCR node names, but would produce the ugly %20 in URLs) or other characters that must be encoded in URLs. You can remove those characters or simply replace them with an underscore, which looks good in most cases.
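
A rough sketch of that replacement, using only the Java standard library. The class and method names are invented for this example, and the character set (ASCII letters and digits, everything else becomes an underscore) plus the lowercasing are just one possible choice, not something JCR requires.

```java
import java.text.Normalizer;
import java.util.Locale;

final class NodeNameSlug {
    /** Turns a free-form title into a URL-friendly node name (hypothetical helper). */
    static String toNodeName(String title) {
        // Decompose accented characters and drop the combining marks ("é" becomes "e").
        String ascii = Normalizer.normalize(title, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        // Replace each run of characters outside [A-Za-z0-9] with a single underscore.
        String slug = ascii.replaceAll("[^A-Za-z0-9]+", "_");
        // Remove underscores left over at the start or end.
        slug = slug.replaceAll("^_+|_+$", "");
        // Lowercasing is optional; it only keeps URLs consistent.
        return slug.toLowerCase(Locale.ROOT);
    }
}
```

For the title "JCR 170 Data modeling: Node names" this gives jcr_170_data_modeling_node_names. Note that a title made up entirely of characters outside the allowed set ends up as an empty string, so the caller needs a fallback for that case.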

Regarding unique names (4.), you can either include the current time (including milliseconds) in the name or explicitly check for collisions. The first might look a bit ugly, but should practically never fail in a blog scenario. The second can be done by catching the exception thrown when a node with that name already exists, appending e.g. an incrementing counter and trying again (my_great_post1, my_great_post2, etc.). You can also lock the parent node so that only one session can add a node at a time, which avoids the retry loop but comes at the cost of blocking.

Note: //content[@node_name=some-name] is not a valid JCR XPath query. You probably want to use /jcr:root/content//some-name for that.
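
The counter variant could look roughly like this. To keep the sketch runnable without a repository, the existence check is passed in as a Predicate; that parameter and the class and method names are assumptions for illustration. In real code the check would be something like a hasNode call on the parent node, and since a check followed by an add is not atomic, you would still need to handle the exception when saving.

```java
import java.util.function.Predicate;

final class UniqueNodeName {
    /**
     * Returns base if it is free, otherwise base1, base2, ... until a free name is found.
     * exists is a stand-in for the repository lookup (hypothetical parameter).
     */
    static String withCounter(String base, Predicate<String> exists) {
        if (!exists.test(base)) {
            return base;
        }
        for (int i = 1; ; i++) {
            String candidate = base + i;
            if (!exists.test(candidate)) {
                return candidate;
            }
        }
    }
}
```

If my_great_post and my_great_post1 are already taken, this returns my_great_post2. With many colliding titles the loop probes every suffix in turn, which is the price for getting predictable names.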

Alexander Klimetschek

A: 

Unique names: To quickly generate a unique name from the first characters of a title plus a random number (to resolve conflicts), you could use the following algorithm:

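import java.util.Random;

// Assumption: parent is the javax.jcr.Node for /blogname/content;
// parent.hasNode(name) is used as the uniqueness check.
String title = "JCR 170 Data modeling: Node names";
// Take at most the first 10 characters and replace spaces with underscores.
String name = title.substring(0, Math.min(title.length(), 10)).trim().replace(' ', '_');
if (parent.hasNode(name)) {
    name += "_";
    Random r = new Random();
    // Append random digits until the name no longer collides.
    while (parent.hasNode(name)) {
        name += Integer.toString(r.nextInt(10));
    }
}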

The advantage of using a random number is that even if you have many similar names, conflicts are resolved very quickly.

Thomas Mueller