ansaurus

Question

Answer 1

A:

Daok 2008-11-29 15:32:00

Doesn't work, unfortunately :(

Phill Sacre 2008-11-29 15:33:31

I have edited the regex and removed a few other things. Check the screenshot. Are you sure the problem isn't in how you use the match after the regex?

Daok 2008-11-29 15:35:57

I just noticed your edit. If the regex works fine, the page that curl returns may be in a different encoding, which would cause problems with $ and £. You might want to output the curl data to check it.

Daok 2008-11-29 15:37:25

Yep, it turns out curl was returning the page as ISO-8859-1, which apparently PHP doesn't like. Converting to UTF-8 seems to work.

Phill Sacre 2008-11-29 16:19:04
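
For anyone landing here later, the conversion described in the comments above might look roughly like this. It is only a sketch: `to_utf8` is a made-up helper name, the ISO-8859-1 default is taken from this thread rather than detected, and it assumes the mbstring extension is available. The pattern uses `\x{A3}` for the pound sign so that it does not depend on how the PHP source file itself is encoded.

```php
<?php
// Hypothetical helper: normalise a fetched page to UTF-8 before matching.
// $sourceCharset is an assumption; use whatever the server actually reports.
function to_utf8(string $body, string $sourceCharset = 'ISO-8859-1'): string
{
    return mb_convert_encoding($body, 'UTF-8', $sourceCharset);
}

// "\xA3" is the pound sign as a single ISO-8859-1 byte (sample data).
$latin1 = "Price: \xA39.99";
$utf8 = to_utf8($latin1);

// The /u modifier makes PCRE treat both pattern and subject as UTF-8,
// so \x{A3} matches the pound sign as a code point.
preg_match('/(?:\$|\x{A3})\d+(?:\.\d{2})?/u', $utf8, $m);
echo $m[0], "\n";
```

Without the conversion, the same `/u` pattern would be run against a subject that is not valid UTF-8, which is one plausible reason the original match failed.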

Answer 2

+1 A:

Maybe the pound sign is being delivered as its HTML entity (&pound;)? I think you should try your regexp in some sort of regex testing tool (i.e. match it against fixed text locally).

I'd change the regexp like this: '/(?:\$|£)\d+(?:\.\d{2})?/'

Eimantas 2008-11-29 15:37:09
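
A quick local check along the lines suggested in this answer could look like the following. The sample string is invented, and decoding entities first is an assumption about what the page might contain, not something confirmed in the thread. The regex is the one from the answer, with `\x{A3}` standing in for the literal £ and the `/u` modifier added.

```php
<?php
// Fixed sample text standing in for the fetched page (made-up values).
$sample = 'Was &pound;12.50, now &pound;9.99 or $15';

// Turn entities such as &pound; into real UTF-8 characters first.
$decoded = html_entity_decode($sample, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Same pattern as in the answer; \x{A3} is the pound sign.
preg_match_all('/(?:\$|\x{A3})\d+(?:\.\d{2})?/u', $decoded, $matches);
print_r($matches[0]);
```

If this finds the prices in the fixed text but the live page still fails, the regex is probably fine and the input (entities or encoding) is the thing to look at.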

Thanks - I tried saving it locally and it came up with an error when opening the file. If I convert the string to UTF-8, it works! So I guess I just need to detect the charset.

Phill Sacre 2008-11-29 15:45:15
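
One possible way to detect the charset is to read it from the response's Content-Type header, which curl exposes via `curl_getinfo($ch, CURLINFO_CONTENT_TYPE)`. The helper below is a hypothetical sketch, not what was actually used here: the function name is invented, and falling back to ISO-8859-1 when no charset is declared is an assumption. It also ignores any charset declared in a `<meta>` tag inside the page.

```php
<?php
// Hypothetical helper: pull the charset out of a Content-Type header value,
// falling back to a default (an assumption) when none is declared.
function charset_from_content_type(?string $contentType, string $default = 'ISO-8859-1'): string
{
    if ($contentType !== null
        && preg_match('/charset=["\']?([\w.:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Example header values; in real use the string would come from curl_getinfo().
echo charset_from_content_type('text/html; charset=iso-8859-1'), "\n";
echo charset_from_content_type('text/html'), "\n";
```

The returned name can then be passed as the source encoding when converting the body to UTF-8.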

Answer 3

A:

This should work for simple values.

'#(?:\$|\£|\€)(\d+(?:\.\d+)?)#'

This will not work with thousands separators, as in 234,343 or 34,454.45.

OIS 2008-11-29 15:41:28
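
If comma-grouped amounts do need to be handled, one possible extension of the pattern above is sketched here. This is an illustration rather than part of the original answer: it assumes commas are thousands separators and a dot is the decimal point, and it uses `\x{A3}` and `\x{20AC}` with the `/u` modifier in place of the literal £ and € so the pattern does not depend on the source file's encoding.

```php
<?php
// Hypothetical variant that also accepts comma-grouped thousands.
// \x{A3} is the pound sign and \x{20AC} is the euro sign.
$pattern = '#(?:\$|\x{A3}|\x{20AC})((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?)#u';

// Made-up sample text; "\u{...}" escapes need PHP 7.0 or later.
$text = "\u{A3}234,343 and \$34,454.45 and \u{20AC}12.5";
preg_match_all($pattern, $text, $m);

// Drop the separators so each amount can be treated as a number.
$amounts = array_map(fn($s) => (float) str_replace(',', '', $s), $m[1]);
print_r($amounts);
```

Sites that use a dot for thousands and a comma for decimals would need a different pattern.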

Scrape a price off a website