views:

498

answers:

9

Hi guys, I was wondering if you could help me formulate a regular expression to match the following pattern?

Any arbitrary length string of numbers, which may or may not be preceded by 0x.

+2  A: 

Could you specify the question more? How do you want to use the match? Which language/regexp implementation?

A simple one that will work with many languages' regexp implementations is:

(?:0x)?\d+
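
As a quick check, here is a minimal Python sketch of that pattern using the standard re module. The sample text and the helper name find_numbers are made up for illustration.

```python
import re

# Pattern from the answer: optional "0x" prefix, then one or more digits.
NUMBER = re.compile(r"(?:0x)?\d+")

def find_numbers(text):
    """Return every substring of text that the pattern matches."""
    return NUMBER.findall(text)

print(find_numbers("id 42, mask 0x10, abc123"))
# ['42', '0x10', '123']
```

Note that without any boundary the digits inside "abc123" are matched too.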
sris
+7  A: 

Something like this:

\b(?:0x)?\d+\b

or this, if you want to exclude the optional "0x" from the match:

(?:(?<=\b0x)|\b)\d+\b

The former is:

- a word boundary
- "0x", optional
- decimal digits, at least one
- a word boundary

the latter would be:

- choose
  - either a position preceded by
     - a word boundary
     - "0x"
  - or a word boundary
- decimal digits, at least one
- a word boundary

The latter matches:

- 123456
- 0x123456

but not:

- 0y123456

To match hex digits (as your "0x" implies), use [0-9A-Fa-f] in place of the "\d".
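
The difference between the two expressions can be tried out with a short Python sketch (assuming Python's re module, which accepts \b inside a fixed-width lookbehind; the helper name both is hypothetical):

```python
import re

# First form: the optional "0x" is part of the match.
WITH_PREFIX = re.compile(r"\b(?:0x)?\d+\b")
# Second form: "0x" may precede the digits but is excluded from the match.
DIGITS_ONLY = re.compile(r"(?:(?<=\b0x)|\b)\d+\b")

def both(sample):
    """Return what each pattern matches in sample (None if no match)."""
    first = WITH_PREFIX.search(sample)
    second = DIGITS_ONLY.search(sample)
    return (first.group() if first else None,
            second.group() if second else None)

for sample in ("123456", "0x123456", "0y123456"):
    print(sample, both(sample))
# 123456 ('123456', '123456')
# 0x123456 ('0x123456', '123456')
# 0y123456 (None, None)
```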

Tomalak
+2  A: 

If you want the whole string to match (nothing else but the numbers):

^(0x)?[0-9]+$

I am using the class [0-9] here to be as portable as possible. You might prefer to use \d wherever implemented.

It works like this:

  • match the beginning of the string: ^
  • match an optional "0x": (0x)?
  • match one or more digits: [0-9]+
  • match the end of the string: $
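
The steps above can be sketched in Python as follows (the function name is_number is an assumption for illustration):

```python
import re

# Anchored pattern from the answer: the whole string must be a number.
WHOLE = re.compile(r"^(0x)?[0-9]+$")

def is_number(text):
    """True if text consists only of digits with an optional "0x" prefix."""
    return WHOLE.search(text) is not None

for sample in ("123", "0x123", "12 34", "x123", ""):
    print(repr(sample), is_number(sample))
# '123' True
# '0x123' True
# '12 34' False
# 'x123' False
# '' False
```

One caveat specific to Python: $ also matches just before a trailing newline, so is_number("123\n") is True. If that matters, re.fullmatch(r"(0x)?[0-9]+", text) is the stricter choice.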

It gets harder if a preceding "0x" means hex number, and omitted means decimal number:

\b((0x[0-9a-fA-F]+)|([1-9][0-9]*))\b

This also guards against decimal numbers starting with 0...
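
A rough Python sketch of that hex/decimal distinction follows. It is a variation rather than the exact expression: the named groups hex and dec and the function classify are made up here, and the hex class is limited to the letters a-f.

```python
import re

# Either a "0x"-prefixed hex number or a decimal number without leading zero.
NUMBER = re.compile(r"\b(?:(?P<hex>0x[0-9a-fA-F]+)|(?P<dec>[1-9][0-9]*))\b")

def classify(text):
    """Return (kind, value) pairs for every number found in text."""
    results = []
    for m in NUMBER.finditer(text):
        if m.group("hex") is not None:
            # int() accepts the "0x" prefix when the base is 16.
            results.append(("hex", int(m.group("hex"), 16)))
        else:
            results.append(("dec", int(m.group("dec"))))
    return results

print(classify("0x1F 42 007 0"))
# [('hex', 31), ('dec', 42)]
```

As the output shows, "007" is rejected, but so is a lone "0", which may or may not be what you want.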

Daren Thomas
+1 for distinction between hex and dec.
mouviciel
+1  A: 

Formal regular expression:

(0x)?[0-9]+
soulmerge
A: 

['0x']?[0-9]+

Sandy
This will select a string of digits, optionally preceded by one of "0", "x" or "'".
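
A small Python sketch showing how a regex engine (as opposed to BNF) reads the square brackets; the helper name full is hypothetical:

```python
import re

# In regex syntax, ['0x'] is a character class: one of ', 0 or x.
BRACKETED = re.compile(r"['0x']?[0-9]+")
# A group is needed to make the two-character prefix "0x" optional as a unit.
GROUPED = re.compile(r"(?:0x)?[0-9]+")

def full(pattern, text):
    """True if pattern matches the whole of text."""
    return pattern.fullmatch(text) is not None

for sample in ("0x123", "x123", "'123"):
    print(sample, full(BRACKETED, sample), full(GROUPED, sample))
# 0x123 False True
# x123 True False
# '123 True False
```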
Nathan Fellman
The first part matches '0x' as a whole, or nothing (the expression is in BNF notation). The second part matches a string of digits. The question does not mention a delimiting character, so requiring one would be an assumption.
Sandy
+1  A: 

I always like to provide the very baseline REs so they will work on every RE engine, so:

(0x)?[0-9][0-9]*

With suitable boundary conditions (on old RE engines, that would be [ \t]), that should work everywhere.

However, it looks like you're wanting hex characters, if the 0x is correct, so maybe you're after:

(0x)?[0-9A-Fa-f][0-9A-Fa-f]*

or its equivalent in many of the other excellent suggestions for the more advanced engines.
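
For what it's worth, the baseline spelling X X* and the shorter X+ behave the same on engines that support both. A minimal Python check (the sample text and the helper name spans are made up):

```python
import re

# Baseline form: one hex digit followed by zero or more hex digits.
BASELINE = re.compile(r"(0x)?[0-9A-Fa-f][0-9A-Fa-f]*")
# Equivalent form on engines that support the + quantifier.
COMPACT = re.compile(r"(0x)?[0-9A-Fa-f]+")

def spans(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group() for m in pattern.finditer(text)]

sample = "0xBEEF 1234 cafe 0x"
print(spans(BASELINE, sample))
# ['0xBEEF', '1234', 'cafe', '0']
print(spans(BASELINE, sample) == spans(COMPACT, sample))
# True
```

Since no boundary conditions are applied here, the word "cafe" and the "0" of a dangling "0x" are matched as well.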

paxdiablo
A: 

The 0x you mention suggests you want to capture a hexadecimal number. In that case I suggest:

(?:0x)?[[:xdigit:]]+

where [:xdigit:] is the POSIX character class for hexadecimal digits (0-9, A-F, a-f).
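
Not every engine understands POSIX classes. Python's re module, for one, does not implement [:xdigit:], so there the class has to be spelled out. A minimal sketch that builds it from string.hexdigits (the names XDIGIT and find_hex are assumptions):

```python
import re
import string

# string.hexdigits is "0123456789abcdefABCDEF"; wrap it in a character class.
XDIGIT = "[" + string.hexdigits + "]"
HEX_NUMBER = re.compile(r"(?:0x)?" + XDIGIT + "+")

def find_hex(text):
    """Return every run of hex digits, with an optional "0x" prefix."""
    return HEX_NUMBER.findall(text)

print(find_hex("0xFF 7f zz"))
# ['0xFF', '7f']
```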

Nathan Fellman
A: 

\b((0x[[:xdigit:]]+)|(0|([1-9][0-9]*)))\b

Yossarian
The OP didn't specify that the string of digits couldn't begin with a bunch of 0s
Nathan Fellman
It didn't, but it looks obvious.
Yossarian
A: 

It all depends on what you mean by a number, and in what context the numbers are allowed. I assume that numbers preceded by 0x are hexadecimal numbers and thus can also contain A-F and a-f.

Given this test string: "a 012 0xa 4_56 num:8 42!"

This regular expression matches "012", "0xa", "4", "56", "8" and "42":

(0x[\dA-Fa-f]+|\d+)

This regular expression matches "012", "0xa", "8" and "42":

\b(0x[\dA-Fa-f]+|\d+)\b

This regular expression matches "0xa", "8" and "42":

\b(0x[\dA-Fa-f]+|[1-9]\d*)\b

This regular expression matches "012" and "0xa":

(?<=\s)(0x[\dA-Fa-f]+|\d+)(?=\s)

This regular expression matches "0xa":

(?<=\s)(0x[\dA-Fa-f]+|[1-9]\d*)(?=\s)
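
The five expressions can be run against the test string with a short Python sketch (assuming Python's re module; the helper name matches is made up):

```python
import re

TEST = "a 012 0xa 4_56 num:8 42!"

# The five expressions above, in the same order.
PATTERNS = [
    r"(0x[\dA-Fa-f]+|\d+)",
    r"\b(0x[\dA-Fa-f]+|\d+)\b",
    r"\b(0x[\dA-Fa-f]+|[1-9]\d*)\b",
    r"(?<=\s)(0x[\dA-Fa-f]+|\d+)(?=\s)",
    r"(?<=\s)(0x[\dA-Fa-f]+|[1-9]\d*)(?=\s)",
]

def matches(pattern, text=TEST):
    """Return the full text of every match of pattern in text."""
    return [m.group() for m in re.finditer(pattern, text)]

for pattern in PATTERNS:
    print(matches(pattern))
# ['012', '0xa', '4', '56', '8', '42']
# ['012', '0xa', '8', '42']
# ['0xa', '8', '42']
# ['012', '0xa']
# ['0xa']
```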
Guffa