I need to bookmark parts of a document using the names of its paragraphs, but a paragraph's name is not always a valid bookmark name. I have not found an exhaustive list of restrictions on bookmark names on Google or MSDN.

What special characters are forbidden?

The only thing I found is that the length must not exceed 40 characters.

A:

If you are familiar with regular expressions, I would say the rule is:

^(?!\d)\w{1,40}$

Where \w refers to the range of Unicode word characters, which also include the underscore and the digits 0-9.

Expressed differently: The name must start with a word character (but not a digit), then any Unicode word character may follow up to an overall length of 40 characters. Word characters explicitly exclude white space and punctuation of any kind.
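For anyone who wants to check names programmatically, here is a minimal Python sketch of the rule above. It assumes Python 3's `re` semantics: for `str` patterns both `\w` and `\d` are Unicode-aware, which matches the answer's intent but may not agree with Word's own character tables in every edge case. `re.fullmatch` stands in for the `^`/`$` anchors, because in Python `$` also matches just before a trailing newline. The function name `is_valid_bookmark_name` is hypothetical.

```python
import re

# The answer's rule without explicit anchors: fullmatch() requires the
# whole string to match, so ^ and $ are implicit.
BOOKMARK_NAME = re.compile(r"(?!\d)\w{1,40}")


def is_valid_bookmark_name(name: str) -> bool:
    """Return True if name satisfies ^(?!\\d)\\w{1,40}$ (hypothetical helper)."""
    return BOOKMARK_NAME.fullmatch(name) is not None


# Sample names are made up for illustration.
for candidate in ["Intro", "Chapter_2", "2ndChapter", "My Section", "書籤"]:
    print(candidate, is_valid_bookmark_name(candidate))
```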

As divo states in the comments, bookmarks whose names begin with an underscore are treated as "hidden". It is not possible to create such bookmarks through the user interface, but you can create them through "Bookmarks.Add".
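Since the original goal was to derive bookmark names from paragraph names, here is a rough Python sketch that coerces arbitrary text into a name matching the rule above. The helper `to_bookmark_name` and the `"bm"` prefix are hypothetical choices, not anything Word provides: runs of non-word characters become `_`, leading and trailing underscores are stripped so the bookmark is not hidden, and a letter prefix is added when the result would be empty or start with a digit. It assumes Python 3's Unicode-aware `\W` and `\d`, and it does not guarantee uniqueness, so two paragraphs can map to the same name (especially after truncation).

```python
import re

MAX_LEN = 40  # length limit mentioned in the question


def is_hidden(name: str) -> bool:
    # Per the answer, names starting with "_" are treated as hidden bookmarks.
    return name.startswith("_")


def to_bookmark_name(paragraph_name: str, prefix: str = "bm") -> str:
    """Hypothetical helper: turn arbitrary text into a visible bookmark name.

    Assumes `prefix` starts with a letter and contains only word characters.
    """
    # Collapse each run of non-word characters (spaces, punctuation) into "_".
    name = re.sub(r"\W+", "_", paragraph_name)
    # Drop leading/trailing underscores; a leading one would make it hidden.
    name = name.strip("_")
    # A name may not be empty or start with a digit, so fall back to the prefix.
    if not name:
        name = prefix
    elif re.match(r"\d", name):
        name = f"{prefix}_{name}"
    # Enforce the 40-character limit (uniqueness is NOT checked here).
    return name[:MAX_LEN]


print(to_bookmark_name("1. Introduction & Scope"))
print(to_bookmark_name("Über die Lösung!"))
```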

Tomalak
+1. Also, bookmarks whose names begin with '_' are treated as hidden.
0xA3
That would lead us to "^(_|\w)[\w\d]{0,39}$"
Maxime Vernier
@Maxime Vernier: "_" is traditionally part of "\w".
Tomalak
@divo: I actually failed to create a bookmark that begins with "_" in Word 2003...
Tomalak
@Tomalak: I have checked the meaning of \w on http://en.wikipedia.org/wiki/Regular_expressions and it seems \d is part of \w too. I would say "^[_A-Za-z][A-Za-z0-9]{0,39}$": starts with an underscore or a letter, then at most 39 letters or digits.
Maxime Vernier
@Tomalak: I cannot create a bookmark that begins with "_" in Word 2000 or 2003. It should be possible with the API though.
Maxime Vernier
You can have underscores within the name. Rules: 1. start with an underscore or a letter; 2. follow with at most 39 letters, digits or underscores: ^[_A-Za-z]\w{0,39}$
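To see how this ASCII rule differs from the Unicode rule in the answer, here is a quick Python comparison. One assumption to flag: Python 3 treats `\w` as Unicode by default, so `re.ASCII` is passed to reproduce the ASCII-only behaviour intended here; without that flag, `[_A-Za-z]\w{0,39}` would still accept non-ASCII letters after the first character. The sample names are made up.

```python
import re

# Tomalak's Unicode-aware rule.
UNICODE_RULE = re.compile(r"(?!\d)\w{1,40}")
# The ASCII-only rule from this comment; re.ASCII restricts \w to [A-Za-z0-9_].
ASCII_RULE = re.compile(r"[_A-Za-z]\w{0,39}", re.ASCII)

for name in ["Section_1", "Überblick", "书签", "_hidden", "9lives"]:
    print(f"{name!r}: unicode={bool(UNICODE_RULE.fullmatch(name))}, "
          f"ascii={bool(ASCII_RULE.fullmatch(name))}")
```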
Maxime Vernier
@Maxime Vernier: You are right, \d is part of \w. I'll reformulate the regular expression. :)
Tomalak
@Maxime Vernier: I refer to \w instead of "A-Z" because word characters from foreign languages are explicitly allowed. You may create Hebrew bookmarks, or Chinese ones, for example.
Tomalak
@Tomalak: I have trouble reading your last regexp ^(?!\d)\w{1,40}$. For me it means: starts with an optional character (any character: .) except \d. Since we want \w minus \d, it catches more things than needed and it can match strings of length 41. How about ^[_A-Za-z]\w{0,39}$ instead?
Maxime Vernier
The regular expression consists of a so-called "negative zero-width assertion", (?!\d), meaning "a position not followed by a digit", followed by 1-40 "word characters". The assertion consumes no characters, so the total length is still 1-40. This special construct is necessary because \w{1,40} alone would allow a string that *begins* with a digit. Your regex forces the first character into the A-Z range, but in fact many more characters than this are legal. My variant allows them all.
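A small Python check of the points above, assuming Python 3's `re` module (where `(?!...)` is a negative lookahead and `\w` is Unicode-aware for `str` patterns). It shows that `\w{1,40}` alone lets a leading digit through, that the lookahead does not change the 1-40 length bound, and that non-ASCII letters are accepted as the first character. The sample strings are made up.

```python
import re

WITHOUT_LOOKAHEAD = re.compile(r"\w{1,40}")
WITH_LOOKAHEAD = re.compile(r"(?!\d)\w{1,40}")

# (?!\d) only tests the current position and consumes nothing, so a
# 40-character name still fits and a 41-character name is still rejected.
print(bool(WITHOUT_LOOKAHEAD.fullmatch("1abc")))  # True: leading digit slips through
print(bool(WITH_LOOKAHEAD.fullmatch("1abc")))     # False
print(bool(WITH_LOOKAHEAD.fullmatch("a" * 40)))   # True
print(bool(WITH_LOOKAHEAD.fullmatch("a" * 41)))   # False
print(bool(WITH_LOOKAHEAD.fullmatch("שלום")))     # True: Hebrew letters are \w
```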
Tomalak
I understand the "negative zero-width assertion" now. My regexp is ASCII-only; yours handles Unicode and is therefore better. Thanks to both of you for your help.
Maxime Vernier