How do I match U1234, but not \U1234 in Javascript?

I can't figure out how to not match the single backslash. The closest I can get is:

\[\\]{0}U[0-9]{4}\b

But that doesn't work. Any suggestions?

+1  A: 

[^\\]U[0-9]{4} or something along these lines. It will not match the sequence at the very beginning of the subject string…

Michael Krelin - hacker
Escape that slash in the negated character set; it is currently escaping the closing bracket instead... Try [^\\]U[0-9]{4}
Chris Nielsen
Thanks, Chris. Edited.
Michael Krelin - hacker
A: 

Unfortunately JS doesn't seem to support the proper syntax for this, i.e. a negative lookbehind assertion: /(?<!\\)U[0-9]{4}/.

So you need to use:

/[^\\]U[0-9]{4}/

This is the syntax for a regexp literal. If you put the regexp in a string, you have to escape the backslashes again:

"[^\\\\]U[0-9]{4}"
porneL
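
The literal-versus-string escaping described in the answer above can be checked with a short sketch. The variable names fromLiteral and fromString are made up for illustration; the patterns are the ones quoted in the answer.

```javascript
// Regex literal: the backslash inside the character class is escaped once.
const fromLiteral = /[^\\]U[0-9]{4}/;

// String pattern: the string literal consumes one level of escaping,
// so every backslash has to be doubled again.
const fromString = new RegExp("[^\\\\]U[0-9]{4}");

// Both constructions compile to the same pattern text.
console.log(fromLiteral.source === fromString.source); // true

console.log(fromString.test("xU1234"));   // true: "x" precedes the U
console.log(fromString.test("\\U1234"));  // false: a backslash precedes the U
console.log(fromString.test("U1234"));    // false: nothing precedes the U
```

The last line shows the limitation already mentioned in the thread: the pattern needs some character before the U, so it misses a sequence at the very start of the string.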
A: 

I would suggest using lookbehind, but JavaScript doesn't seem to support it [[1]]. Maybe you can match on U[0-9]{4}, find where the match is, and check the character to its left to see if it's a \ or not?
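
A minimal sketch of that idea, assuming you want every unescaped occurrence returned as an array. The function name findByCheckingLeft is hypothetical; it only uses RegExp.prototype.exec and the match index.

```javascript
// Find every U followed by four digits, then inspect the character
// immediately to the left of each match.
function findByCheckingLeft(text) {
    const pattern = /U[0-9]{4}/g;
    const results = [];
    let m;
    while ((m = pattern.exec(text)) !== null) {
        // At index 0 there is no character to the left at all.
        const previous = m.index > 0 ? text[m.index - 1] : "";
        if (previous !== "\\") {
            results.push(m[0]);
        }
    }
    return results;
}

console.log(findByCheckingLeft("U1234 \\U5678 xU9012")); // [ 'U1234', 'U9012' ]
```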

Twisol
+8  A: 

JavaScript definitely does not support lookbehind assertions. The next best way to get what you want, in my opinion, would be

(?:^|[^\\])(U[0-9]{4})

Explanation:

(?:          # non-capturing group - if it matches, we don't want to keep it
   ^         # either match the beginning of the string
   |         # or
   [^\\]     # match any character except for a backslash
)            # end of non-capturing group
(U\d{4})     # capturing group number 1: match "U" followed by four digits
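
As a rough usage sketch, the pattern can be applied with the g flag and the result read from capturing group 1, since the full match may include the preceding character. The function name extractCodes is hypothetical, and String.prototype.matchAll assumes an ES2020-capable engine.

```javascript
// The full match may include one leading character, so only
// capturing group 1 is collected.
const codePattern = /(?:^|[^\\])(U[0-9]{4})/g;

function extractCodes(text) {
    return Array.from(text.matchAll(codePattern), (m) => m[1]);
}

console.log(extractCodes("U1234 \\U5678 xU9012")); // [ 'U1234', 'U9012' ]
```

One caveat of this approach: because the preceding character is consumed by the match, two sequences placed directly next to each other (for example "U1234U5678") yield only the first one.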
Tim Pietzcker
+1 what I'd put, before SO went all 503 on us...
bobince
+1 for explaining the regex
nickf
oh, on a nit-picky note, if you want to match 0, 1, 2... 9 then you should use `[0-9]`. `\d` actually matches a lot more than 0-9: there are a lot of Unicode characters which are numbers in different character sets.
nickf
@nickf: `\d` only matches `[0-9]` in JavaScript--or it's supposed to, according to ECMA. Source: http://blog.stevenlevithan.com/archives/javascript-regex-and-unicode Steve's `\d` test works correctly in every browser I have installed at the moment.
Alan Moore
A: 

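JavaScript's RegExp does not support negative look-behind assertions. Suggestions that match only /[^\\]U/ will also include the preceding character in the match, as in "_U", so that's not the answer. Your best bet is to use two regular expressions: the first to find all occurrences, the second to filter out those that fail the look-behind condition.

// Match each occurrence together with an optional leading backslash,
// then keep only the matches that do not start with one.
("\\U0000 U0000".match(/\\?U[0-9]{4}/g) || [])
    .filter(function (match) {
        return !/^\\/.test(match);
    });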
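
For comparison, ES2018 added lookbehind assertions to the language, so in an engine that implements them (an assumption about your runtime; the answers in this thread were written for engines that did not) the lookbehind pattern quoted earlier in the thread can be used directly. The function name findWithLookbehind is hypothetical.

```javascript
// Requires an engine with ES2018 lookbehind support.
// (?<!\\) asserts that the character before the U is not a backslash,
// without making that character part of the match.
const unescapedCode = /(?<!\\)U[0-9]{4}/g;

function findWithLookbehind(text) {
    // String.prototype.match returns null when nothing matches.
    return text.match(unescapedCode) || [];
}

console.log(findWithLookbehind("\\U0000 U0000")); // [ 'U0000' ]
console.log(findWithLookbehind("U1234U5678"));    // [ 'U1234', 'U5678' ]
```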
Kris Kowal
A: 

Ummm ... Does \^U[0-9]{4}\b work for you?

NawaMan