ansaurus

Question

Answer 1

A:

The dot will match any character. Escape the dot:

£\d+\.\d\d

fig 2010-04-11 21:56:11

It is escaped.

Noam Smadja 2010-04-11 21:57:52

Don't escape the pound symbol.

fig 2010-04-11 21:58:59
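A quick way to see why the dot needs escaping is to test both versions against a string that has something other than a period between the pounds and the pence. This is a small JavaScript sketch (JavaScript being the language the asker mentions later in the thread). The sample strings are made up for illustration.

```javascript
// Unescaped ".": matches ANY single character between pounds and pence
const looseRE = /£\d+.\d\d/;
// Escaped "\.": matches only a literal period
const strictRE = /£\d+\.\d\d/;

console.log(looseRE.test("£12.34"));  // true
console.log(looseRE.test("£12x34"));  // true, the dot matched "x"
console.log(strictRE.test("£12.34")); // true
console.log(strictRE.test("£12x34")); // false
```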

Answer 2

+2 A:

You are not wrong, but there are a few things to watch out for:

The £ sign is not a standard ASCII character, so you may run into an encoding issue, or you may need to enable a Unicode option on your regular expression.
\d is not supported in all regular expression engines; [0-9] or [[:digit:]] are possible alternatives.

To get a better answer, say which language you are using, and preferably also post your source code.

Mark Byers 2010-04-11 21:59:48
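For the JavaScript case specifically, a short sketch of which digit syntaxes work: \d and [0-9] are interchangeable (JavaScript's \d only matches the ASCII digits 0-9), but POSIX classes such as [[:digit:]] are not supported and silently mean something else. The sample price is invented.

```javascript
const price = "£128.24";

// In JavaScript, \d and [0-9] are equivalent (ASCII digits only)
const withD = /£\d+\.\d\d/;
const withRange = /£[0-9]+\.[0-9][0-9]/;
console.log(withD.test(price), withRange.test(price)); // true true

// POSIX classes are NOT supported: [[:digit:]] is read as the set
// "[ : d i g t" followed by a literal "]", so it never matches a bare digit
const posix = /[[:digit:]]/;
console.log(posix.test("5"));  // false
console.log(posix.test("d]")); // true
```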

Answer 3

A:

It depends on what flavour of regex you are using - what is the programming language?

Some older regex flavours require the + to be escaped (\+) before it acts as a quantifier: sed and vi, for example.

Also, some older regex flavours do not recognise \d as matching a digit.

Most modern regex flavours follow Perl syntax, and £\d+\.\d\d should do the trick. It also depends on how the £ is encoded, though: if the string you are matching encodes it differently from the regex, it will not match.

Here is an example in Python 2: the £ character is represented differently in a regular (byte) string and a Unicode string (prefixed with u):

>>> "£"
'\xc2\xa3'
>>> u"£"
u'\xa3'
>>> import re
>>> print re.match("£", u"£")
None
>>> print re.match(u"£", "£")
None
>>> print re.match(u"£", u"£")
<_sre.SRE_Match object at 0x7ef34de8>
>>> print re.match("£", "£")
<_sre.SRE_Match object at 0x7ef34e90>
>>>

Dave Kirby 2010-04-11 22:05:50
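JavaScript does not have Python 2's split between byte strings and Unicode strings: a string is a sequence of UTF-16 code units, so "£" is a single character (U+00A3). The two-byte form from the Python session only appears once the text is encoded as UTF-8, which this sketch shows using TextEncoder (a global in modern browsers and Node.js). A mismatch can still arise if the script file is saved in one encoding and loaded as another, but not between two strings inside the language.

```javascript
const pound = "£";
console.log(pound.length);                     // 1
console.log(pound.charCodeAt(0).toString(16)); // "a3" (U+00A3)

// Encoding to UTF-8 gives the same two bytes as Python 2's '\xc2\xa3'
const utf8 = Array.from(new TextEncoder().encode(pound));
console.log(utf8.map(b => b.toString(16)));    // [ 'c2', 'a3' ]

// Pattern and subject are both UTF-16 strings here, so there is no byte-level mismatch
console.log(/£/.test("£5.00")); // true
```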

Answer 4

A:

£ isn't an ASCII character, so you need to work out the encodings. Depending on the language, you will either need to escape the byte(s) of £ in the regex, or convert all the strings to Unicode before applying the regex.

Douglas Leeder 2010-04-11 22:06:36
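In JavaScript, one way to keep the script file's encoding out of the picture is to write the pound sign as the escape \u00A3 (its code point) inside the pattern, so the regex source is pure ASCII. This is a sketch under that assumption; the sample text is invented.

```javascript
// \u00A3 is the code point of "£"; the pattern source is plain ASCII
const priceRE = /\u00A3\d+\.\d\d/g;

const text = "Was £5.00, now £3.99";
console.log(text.match(priceRE)); // [ '£5.00', '£3.99' ]

// Same idea when building the pattern from a string: double the backslashes
const fromString = new RegExp("\\u00A3\\d+\\.\\d\\d", "g");
console.log(text.match(fromString)); // [ '£5.00', '£3.99' ]
```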

Answer 5

A:

In Ruby you could write the following (note the escaped dot):

/£\d+\.\d{2}/

Using braces to specify the number of digits after the point makes it slightly clearer.

Oli 2010-04-11 22:06:42

I mentioned it is JavaScript.

Noam Smadja 2010-04-11 22:38:22

Sorry, I replied before you added in the JavaScript code.

Oli 2010-04-12 19:07:33
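The {2} quantifier works the same way in JavaScript, where the asker needs it. A small sketch comparing it with \d\d on a few invented strings:

```javascript
// \d{2} means "exactly two digits", the same as \d\d
const braces = /£\d+\.\d{2}/;
const doubled = /£\d+\.\d\d/;

for (const s of ["£5.00", "£128.24", "£5.0"]) {
  console.log(s, braces.test(s), doubled.test(s));
}
// £5.00 true true
// £128.24 true true
// £5.0 false false
```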

Answer 6

+1 A:

Had this written up for your last question just before it was deleted.

Here are the problems you're having with your GM script.

You're checking absolutely every text node on the page for some reason. This isn't causing it to break but it's unnecessary and slow. It would be better to look for text nodes inside .price nodes and .rrp .strike nodes instead.
When creating a regexp from a string with new RegExp, backslashes must be escaped, e.g.:

var searchRE = new RegExp('\\d\\d','gi');

not

var searchRE = new RegExp('\d\d','gi');

So you can add the backslashes, or create your regex like this:

var searchRE = /\d\d/gi;

Your actual regular expression is only checking for numbers like ##ANYCHARACTER##, and will ignore £5.00 and £128.24.
Your replacement needs to be either a string or a callback function, not a regular expression object.
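Both points can be checked without a page. Inside an ordinary string literal '\d' is just 'd', so comparing each RegExp's source property shows what the pattern really became. The last two lines show the two replacement forms String.prototype.replace accepts. The sample text and the "GBP " replacement are made up for illustration.

```javascript
const unescaped = new RegExp('\d\d', 'g'); // the string is "dd"
const escaped = new RegExp('\\d\\d', 'g'); // the string is "\d\d"
const literal = /\d\d/g;                   // no string involved

console.log(unescaped.source); // dd
console.log(escaped.source);   // \d\d
console.log(escaped.source === literal.source); // true

// A replacement can be a string...
console.log("£42".replace(/£/, "GBP "));       // GBP 42
// ...or a function that receives the match and returns the new text
console.log("£42".replace(/\d+/, m => m * 2)); // £84
```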

Putting it all together

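// Select only the text nodes inside p.price and p.rrp span.strike elements
var textNodes = document.evaluate(
                              "//p[contains(@class,'price')]/text() | //p[contains(@class,'rrp')]/span[contains(@class,'strike')]/text()",
                              document,
                              null,
                              XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
                              null);
// £ followed by the amount; the parentheses capture the amount as p1
var searchRE = /£(\d+\.\d\d)/gi;
// Swap the currency symbol for ₪ and convert the captured amount at 5.67
var replace = function(str, p1) { return "₪" + (p1 * 5.67).toFixed(2); };

// Read snapshotLength once instead of on every iteration
for (var i = 0, l = textNodes.snapshotLength; i < l; i++) {
    var node = textNodes.snapshotItem(i);
    node.data = node.data.replace(searchRE, replace);
}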

Changes:

XPath now selects only the text inside p.price and p.rrp span.strike nodes
Search regular expression created with /regex/ instead of new RegExp
Search regular expression now includes the £ symbol, so it is replaced along with the amount
Replace variable is now a function that swaps the £ for ₪ and multiplies the first captured group (the amount) by 5.67
The for loop stores textNodes.snapshotLength in a variable once at the start, instead of reading it on every iteration.
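The XPath part only runs in a browser, but the regex and the replace callback can be tried on a plain string in Node.js. A minimal sketch reusing the answer's searchRE and replace function; the 5.67 rate is the answer's own figure and the sample text is invented.

```javascript
const searchRE = /£(\d+\.\d\d)/gi;
// str is the whole match ("£5.00"); p1 is the first captured group ("5.00")
const replace = function (str, p1) {
  return "₪" + (p1 * 5.67).toFixed(2);
};

const before = "RRP £5.00, now £128.24";
const after = before.replace(searchRE, replace);
console.log(after); // RRP ₪28.35, now ₪727.12
```

The string "5.00" is converted to a number by the multiplication, and toFixed(2) rounds the result back to two decimal places.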

Hope that helps!

[edit] Some of these points don't apply, as the original question changed a few times, but the final script is relevant, and the points may still explain why your script was failing originally.

Billiam 2010-04-11 22:07:40

Sorry for deleting earlier :/ I guess it would have wasted your time unless you found this! Sorry again :) Yup, you are right. But could you explain the regex problem I had, please?

Noam Smadja 2010-04-11 22:25:29

Well, as I said, there were a few problems. In the current revision, you're using /\£[0-9]\+.[0-9][0-9]/ instead of /£[0-9]+\.[0-9][0-9]/ (don't escape the pound sign or the +, but do escape the period). \d and [0-9] are equivalent.

Billiam 2010-04-11 22:31:05
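To make the difference concrete, here is a sketch testing the asker's pattern against the corrected one in JavaScript: \+ is a literal plus sign rather than a quantifier, so the original only matches odd strings such as "£5+.00". It also shows that \£ is only tolerated without the u flag; with it, the escape is a SyntaxError. Sample strings are invented.

```javascript
// The asker's version: "\+" is a literal plus sign and "." is any character
const original = /\£[0-9]\+.[0-9][0-9]/;
// Corrected: "+" is a quantifier, "\." is a literal period
const fixed = /£[0-9]+\.[0-9][0-9]/;

console.log(original.test("£5.00"));  // false
console.log(original.test("£5+.00")); // true
console.log(fixed.test("£5.00"));     // true
console.log(fixed.test("£128.24"));   // true

// Escaping £ is a SyntaxError once the u (Unicode) flag is used
let threw = false;
try { new RegExp("\\£", "u"); } catch (e) { threw = e instanceof SyntaxError; }
console.log(threw); // true
```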

OK. When I was using /£\d+\.\d\d/gi (without the brackets) nothing was happening. Brackets are used for "keeping" what you found, aren't they?

Noam Smadja 2010-04-11 22:35:23
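On the bracket question: in JavaScript, parentheses create a capturing group. They do not change what the pattern matches, but they make the enclosed part available separately, as match[1] or as the extra argument (p1) of a replace callback. Without a group, the callback's second argument is the match offset instead. A sketch with made-up input:

```javascript
const noGroup = /£\d+\.\d\d/;
const withGroup = /£(\d+\.\d\d)/;
const text = "Now £128.24";

// Both patterns match the same text...
console.log(noGroup.exec(text)[0]);   // £128.24
console.log(withGroup.exec(text)[0]); // £128.24

// ...but only the grouped one also captures the amount on its own
console.log(noGroup.exec(text)[1]);   // undefined
console.log(withGroup.exec(text)[1]); // 128.24

// With no group, a replace callback's second argument is the offset of the match
let secondArg;
text.replace(noGroup, (m, arg) => { secondArg = arg; return m; });
console.log(secondArg); // 4, the index where "£" was found
```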

Answer 7

+1 A:

£[0-9]+(,[0-9]{3})*\.[0-9]{2}$

This will match amounts from £d.dd up to values with comma-separated thousands such as £1,234,567.89, so it can handle millions and hundreds as well.

The above regexp is not strict about the comma syntax: it also accepts, for example, £1123213123.23.

If you want a stricter regexp, and you're 100% sure that the prices will always use commas for thousands and a period for decimals, then use

£[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$

Try your regexps here to see what works for you and what doesn't: http://tools.netshiftmedia.com/regexlibrary/

Ben 2010-04-11 22:11:57

Great regex tool, thanks!

Noam Smadja 2010-04-11 22:37:54
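The two patterns from the last answer can be compared in JavaScript on a few invented amounts. Note the trailing $: it anchors the match to the end of the string, so these patterns suit checking a whole value better than finding prices inside longer text.

```javascript
const loose = /£[0-9]+(,[0-9]{3})*\.[0-9]{2}$/;
const strict = /£[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$/;

const samples = ["£5.00", "£1,234,567.89", "£1123213123.23", "£1234,567.89", "£5.00 each"];
for (const s of samples) {
  console.log(JSON.stringify(s), "loose:", loose.test(s), "strict:", strict.test(s));
}
```

The first two samples match both patterns, £1123213123.23 and £1234,567.89 match only the loose one, and "£5.00 each" matches neither because of the $.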


RegEx for a price in £
