Here's what I'm using: ".+/@[^/]+$". Can you think of a reason why this might not work?

+2  A: 

This is actually a very subtle problem and I think a great question.

My understanding is that an (abbreviated) XPath points to an attribute if and only if its last @ is not within a predicate, that is, something of the form [...], and has no steps after it (something like /...). I think this has the relatively simple regular expression @[^]/]*$, that is, there must be an @ that has neither a ] nor a / after it. Also, if you want to cover unabbreviated XPaths, you can use (@|attribute::)[^]/]*$
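As a minimal sketch of how the two expressions above could be applied, here they are in Python. The helper name `looks_like_attribute` is hypothetical, and the `]` inside the character class is escaped for readability, which should be equivalent to the unescaped form written above.

```python
import re

# Abbreviated syntax only: an "@" with no "]" or "/" anywhere after it.
ABBREVIATED = re.compile(r"@[^\]/]*$")
# Same idea, but also accepting the unabbreviated "attribute::" axis.
EITHER = re.compile(r"(@|attribute::)[^\]/]*$")


def looks_like_attribute(xpath, pattern=EITHER):
    """Heuristic check: does the XPath string appear to end in an attribute step?"""
    return pattern.search(xpath) is not None


if __name__ == "__main__":
    for expr in ["a/@b", "a[@b]", "child::a/attribute::b"]:
        print(expr, looks_like_attribute(expr))
```

Note that `search` is used rather than `match`, since the pattern is only anchored at the end of the string.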

I've included a set of test cases that may prove useful in checking this or other regexes. Note also that there may be whitespace between tokens, which can complicate some regexes.

Positive (an attribute)

  • @* or @a or ../@a or a/@b
  • a[@b and @c]/@d
  • a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/@k

Negative (not an attribute)

  • a[@b] or a[@b and @c]
  • a[b[@c and @d]/@e]
  • a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/k[5][@l="m"]
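The positive and negative lists above can be turned into a small harness. This is only a sketch: the function name `misclassified` is made up for illustration, and the two patterns are the one from the question and the `(@|attribute::)` one from this answer.

```python
import re

# Regex from the question, and the abbreviated/unabbreviated regex from this answer.
OP_RE = re.compile(r".+/@[^/]+$")
ANSWER_RE = re.compile(r"(@|attribute::)[^\]/]*$")

# Expressions that should be recognised as pointing to an attribute.
POSITIVE = [
    "@*", "@a", "../@a", "a/@b",
    "a[@b and @c]/@d",
    'a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/@k',
]
# Expressions that should not be recognised as pointing to an attribute.
NEGATIVE = [
    "a[@b]", "a[@b and @c]",
    "a[b[@c and @d]/@e]",
    'a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/k[5][@l="m"]',
]


def misclassified(pattern):
    """Return the test expressions that the pattern gets wrong."""
    missed = [x for x in POSITIVE if not pattern.search(x)]
    false_hits = [x for x in NEGATIVE if pattern.search(x)]
    return missed + false_hits


if __name__ == "__main__":
    for name, pattern in [("question", OP_RE), ("answer", ANSWER_RE)]:
        print(name, misclassified(pattern))
```

On these ten expressions the answer's regex gets everything right, while the question's regex misses `@*` and `@a` and wrongly accepts `a[b[@c and @d]/@e]`. Passing this list says nothing about expressions that are not on it.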

I can't think of a legal example where there is a / but not a ] after the last @, but I think there might be one.

Hopefully these examples make it at least a little clear that there can be arbitrary nesting of [ and ] together with @s anywhere in between. Luckily, I think only the very last @ and its nesting level matters.

(For reference, the OP's regex fails on @a. My original regex failed on a[@b and @c].)

Edit: It turns out that there are more corner cases, which convinces me that there is no perfectly correct regular expression. For example, once you have an attribute node, there are many ways of keeping it, e.g. //@a// or //@a/. in the abbreviated syntax. There are also a variety of more creative ways, such as //@f//[node()]. All in all, it seems that if you want to cover these cases, you need to be able to match balanced, arbitrarily nested [ and ] pairs, which a basic regular expression cannot do. On the other hand, you could decide this is too contrived ...
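To illustrate what keeping track of nesting would involve, here is a rough sketch that drops every predicate with a depth counter and then inspects the last remaining step. It is a heuristic and not an XPath parser: the function names are hypothetical, and it assumes quoted strings contain no escapes other than a doubled quote.

```python
def strip_predicates(xpath):
    """Remove every [...] predicate (at any nesting depth) and every quoted string."""
    out = []
    depth = 0     # how many "[" are currently open
    quote = None  # the quote character we are inside, if any
    for ch in xpath:
        if quote:
            # Inside a string literal: brackets and slashes here are not syntax.
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def last_step_is_attribute(xpath):
    """Heuristic: is the final step, ignoring its predicates, an attribute step?"""
    last_step = strip_predicates(xpath).rsplit("/", 1)[-1].strip()
    return last_step.startswith(("@", "attribute::"))


if __name__ == "__main__":
    for expr in ["@a[ancestor::p]", "a[b[@c and @d]/@e]", "//@a/."]:
        print(expr, last_step_is_attribute(expr))
```

This accepts an attribute step that carries its own predicate, which the end-anchored regex rejects, but it still reports `//@a/.` as not an attribute because the last step is `.`. Covering that kind of case would need real knowledge of XPath semantics.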

A. Rex
Oh snap, nice work. These are two perfect edge cases for me to think about. Thanks!
JamesBrownIsDead
This is failing for simple cases: `@*[name() != 'foo']` or `@a[ancestor::p]`.
Tomalak
@Tomalak: You're right, of course. I stand by my statement that there is in fact no regular expression that works, because it would have to keep track of nesting ...
A. Rex