i have: \£\d+\.\d\d

should find: £6.95 £16.95 etc
+ is one or more
\. is the dot
\d is for a digit

am i wrong? :(


JavaScript for Greasemonkey

// ==UserScript==
// @name           CurConvertor
// @namespace      CurConvertor
// @description    noam smadja
// @include        http://www.zavvi.com/*
// ==/UserScript==
textNodes = document.evaluate(
                              "//text()",
                              document,
                              null,
                              XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
                              null);
var searchRE = /\£[0-9]\+.[0-9][0-9];
var replace = 'pling';
for (var i=0;i<textNodes.snapshotLength;i++) {
    var node = textNodes.snapshotItem(i);
    node.data = node.data.replace(searchRE, replace);
}

when i change the regex to /Free for example it finds and changes. but i guess i am missing something!

A: 

The dot will match any character. Escape the dot:

£\d+\.\d\d
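To see what the unescaped dot lets through, here is a small sketch in plain JavaScript. The sample strings are made up for illustration and are not taken from the page in the question.

```javascript
// Unescaped dot: any single character may sit between the digit groups.
var loose = /£\d+.\d\d/;
// Escaped dot: only a literal "." is accepted.
var strict = /£\d+\.\d\d/;

// Hypothetical sample strings, for illustration only.
var samples = ['£6.95', '£16.95', '£6x95', '£1234'];
var results = samples.map(function (s) {
    return { text: s, loose: loose.test(s), strict: strict.test(s) };
});

// Both patterns accept the real prices; only the loose one accepts
// '£6x95' and '£1234'.
console.log(results);
```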
fig
it is escaped..
Noam Smadja
Don't escape the pound symbol.
fig
+2  A: 

You are not wrong, but there are a few things to watch out for:

  • The £ sign is not a standard ASCII character, so you may have an encoding issue, or you may need to enable a Unicode option on your regular expression.
  • The use of \d is not supported in all regular expression engines. [0-9] or [[:digit:]] are other possibilities.

To get a better answer, say which language you are using, and preferably also post your source code.

Mark Byers
A: 

It depends on what flavour of regex you are using - what is the programming language?

Some older flavours of regex require the + to be escaped - sed and vi for example.

Also, some older flavours of regex do not recognise \d as matching a digit.

Most modern regex flavours follow the Perl syntax, and £\d+\.\d\d should do the trick, but it also depends on how the £ is encoded - if the string you are matching encodes it differently from the regex then it will not match.

Here is an example in Python 2 - the £ character is represented differently in a regular (byte) string and a unicode string (prefixed with a u):

>>> "£"
'\xc2\xa3'
>>> u"£"
u'\xa3'
>>> import re
>>> print re.match("£", u"£")
None
>>> print re.match(u"£", "£")
None
>>> print re.match(u"£", u"£")
<_sre.SRE_Match object at 0x7ef34de8>
>>> print re.match("£", "£")
<_sre.SRE_Match object at 0x7ef34e90>
>>>
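On the escaped + mentioned above: in JavaScript the rule is the opposite of sed and vi, so \+ means a literal plus sign. A rough sketch of the difference, using the pattern from the question and a made-up string containing a plus:

```javascript
var escapedPlus = /[0-9]\+/;   // one digit followed by a literal "+"
var quantifier = /[0-9]+/;     // one or more digits

// The pattern as posted in the question (with a closing slash added).
var original = /\£[0-9]\+.[0-9][0-9]/;

var plusHits = {
    escapedOnPrice: escapedPlus.test('£16.95'),        // no "+" in the text
    escapedOnLiteral: escapedPlus.test('6+4'),         // contains "6+"
    quantifierMatch: '£16.95'.match(quantifier)[0],    // the digit run "16"
    originalOnPrice: original.test('£6.95'),
    originalOnPlus: original.test('£6+.95')            // hypothetical input
};
console.log(plusHits);
```

So the original pattern only matches text that really contains a plus sign after the first digit, which a normal price never does.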
Dave Kirby
A: 

£ isn't an ASCII character, so you need to work out encodings. Depending on the language, you will either need to escape the byte(s) of £ in the regex, or convert all the strings into Unicode before applying the regex.
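In JavaScript, one way to keep the script source free of non-ASCII characters is the \u00A3 escape, which is the Unicode code point of the pound sign. This is only a sketch of that idea; whether the script in the question actually had an encoding problem is not known.

```javascript
// \u00A3 is the pound sign, written with ASCII characters only.
var priceRE = /\u00A3\d+\.\d\d/;

// Build a test string without typing the symbol directly.
var pound = String.fromCharCode(0xA3);
var sample = pound + '16.95';

var found = priceRE.test(sample);
console.log(found);
```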

Douglas Leeder
A: 

In Ruby you could just write the following

/£\d+\.\d{2}/

Using braces to specify the number of digits after the point makes it slightly clearer.

Oli
i mentioned it is JavaScript
Noam Smadja
Sorry, I replied before you added in the Javascript code.
Oli
+1  A: 

Had this written up for your last question just before it was deleted.

Here are the problems you're having with your GM script.

  1. You're checking absolutely every text node on the page for some reason. This isn't causing it to break but it's unnecessary and slow. It would be better to look for text nodes inside .price nodes and .rrp .strike nodes instead.

  2. When creating new regexp objects in this way, backslashes must be escaped, ex:

    var searchRE = new RegExp('\\d\\d','gi');

    not

    var searchRE = new RegExp('\d\d','gi');

    So you can add the backslashes, or create your regex like this:

    var searchRE = /\d\d/gi;

  3. Your actual regular expression is only checking for numbers like ##ANYCHARACTER##, and will ignore £5.00 and £128.24

  4. Your replacement needs to be either a string or a callback function, not a regular expression object.
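Points 2 and 4 can be checked outside the browser. The snippet below is a minimal sketch; the variable names are made up for the illustration.

```javascript
// Point 2: inside a string literal, "\d" collapses to a plain "d".
var unescaped = new RegExp('\d\d', 'gi');   // pattern is really "dd"
var escaped = new RegExp('\\d\\d', 'gi');   // pattern is \d\d
var literal = /\d\d/gi;                     // same pattern, no double escaping

var sources = [unescaped.source, escaped.source, literal.source];

// Point 4: the second argument to replace() is a string or a function.
var withString = '£5.00'.replace(/£/g, '₪');
var withCallback = '£5.00'.replace(/\d+\.\d\d/g, function (m) {
    return (m * 2).toFixed(2);   // m is the matched text, e.g. "5.00"
});

console.log(sources, withString, withCallback);
```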


Putting it all together

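// Select only the text nodes inside p.price and p.rrp span.strike
var textNodes = document.evaluate(
                              "//p[contains(@class,'price')]/text() | //p[contains(@class,'rrp')]/span[contains(@class,'strike')]/text()",
                              document,
                              null,
                              XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
                              null);
// The parentheses capture the amount after the pound sign as group 1
var searchRE = /£(\d+\.\d\d)/gi;
// p1 is the captured amount; 5.67 is the hard-coded conversion rate
var replace = function(str,p1){return "₪" + ( (p1*5.67).toFixed(2) );};

// Read the snapshot length once instead of on every iteration
for (var i=0,l=textNodes.snapshotLength;i<l;i++) {
    var node = textNodes.snapshotItem(i);
    node.data = node.data.replace(searchRE, replace);
}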

Changes:

  • XPath now includes only p.price and p.rrp span.strike nodes
  • Search regular expression created with /regex/ instead of new RegExp
  • Search variable now includes target currency symbol
  • Replace variable is now a function that replaces the currency symbol with a new symbol, and multiplies the first captured group (the amount) by 5.67
  • for loop stores the snapshot length in a variable at the start of the loop, instead of reading textNodes.snapshotLength on every iteration.
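The replacement step can also be tried without a page. In this sketch, convertPrices and its rate parameter are hypothetical names introduced for the example; the regex is the one from the script above, minus the i flag.

```javascript
// Hypothetical helper: converts every £ price found in a string.
function convertPrices(text, rate) {
    // The parentheses capture the amount, which arrives as the
    // second argument of the callback.
    return text.replace(/£(\d+\.\d\d)/g, function (match, amount) {
        return '₪' + (parseFloat(amount) * rate).toFixed(2);
    });
}

var converted = convertPrices('Now £10.00, was £20.00', 5.67);
console.log(converted);
```

The brackets in the regex are what make the amount available on its own; without them the callback would only receive the whole match, including the £.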

Hope that helps!

[edit]Some of these points don't apply, as the original question changed a few times, but the final script is relevant, and the points may still be of interest to you for why your script was failing originally.

Billiam
Sorry for deleting earlier.. :/ i guess it would have wasted your time unless you found this! sorry again! :) yup you are right.. but could you explain the regex problem i had, please?
Noam Smadja
Well, as I said, there were a few problems. In the current revision, you're using: /\£[0-9]\+.[0-9][0-9]/ instead of /£[0-9]+\.[0-9][0-9]/ (Don't escape the pound sign or the +, but do escape the period). \d and [0-9] are equivalent.
Billiam
ok. when i was using /£\d+\.\d\d/gi (without the brackets) nothing was happening.. brackets are used for "keeping" what you found, aren't they?
Noam Smadja
+1  A: 
£[0-9]+(,[0-9]{3})*\.[0-9]{2}$

This will match anything from £d.dd to £d,ddd,ddd.dd, so it can fetch hundreds as well as millions. Because of the trailing $, the price has to sit at the end of the string being tested.

The above regexp is not strict about where the commas go. It also accepts, for example: £1123213123.23

Now, if you want a stricter regexp, and you're 100% sure that the prices will follow the comma and period syntax, then use

£[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$

Try your regexps here to see what works for you and what not http://tools.netshiftmedia.com/regexlibrary/
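If you would rather check the two patterns in JavaScript directly, something like this should do. The test strings are invented for the comparison.

```javascript
var loosePrice = /£[0-9]+(,[0-9]{3})*\.[0-9]{2}$/;
var strictPrice = /£[0-9]{1,3}(,[0-9]{3})*\.[0-9]{2}$/;

// Hypothetical inputs: two well-formed prices, one without commas,
// and one followed by extra text.
var cases = ['£6.95', '£1,234,567.89', '£1123213123.23', '£6.95 each'];

var looseResults = cases.map(function (s) { return loosePrice.test(s); });
var strictResults = cases.map(function (s) { return strictPrice.test(s); });

console.log(looseResults, strictResults);
```

Only the loose pattern accepts the long number without commas, and neither accepts the price followed by extra text, because of the $ anchor.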

Ben
great regex tool.. thx!
Noam Smadja