ansaurus

Question

How to use regular expressions to pull a substring? (screen scraping)

Answer 1

+1 A:

http://www\.example\.com/online/store/TitleDetail\?detail&amp;sku=\d+

Use the \d character class with a greedy + quantifier to match any integer value in the sku field.
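
As a rough illustration of the pattern above, a minimal Python sketch might look like the following. The sample `html` string is an assumption modelled on the anchor tag from the question, and the capturing group around `\d+` is added here only to pull out the sku number; it is not part of the original pattern.

```python
import re

# Sample markup (assumed for illustration); "&amp;" appears literally in the HTML source.
html = (
    "<a href=\"javascript:if(handleDoubleClick(this.id)){window.location="
    "'http://www.example.com/online/store/TitleDetail?detail&amp;sku=123456789';}\">"
    "check store inventory</a>"
)

# The dots and the question mark are escaped so they match literally.
# (\d+) greedily captures one or more digits after "sku=".
url_pattern = re.compile(
    r"http://www\.example\.com/online/store/TitleDetail\?detail&amp;sku=(\d+)"
)

match = url_pattern.search(html)
url = match.group(0) if match else None  # the whole matched URL
sku = match.group(1) if match else None  # only the digits
```

Checking `match` before calling `group` avoids an AttributeError when the page does not contain the link.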

arthurprs 2010-05-20 01:20:45

This def worked. Thanks!

Diego 2010-05-28 02:40:19

Answer 2

A:

You don't need regular expressions for that; string methods are enough:

result = html[0].split("window.location='")[1].split("'")[0]
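
The one-liner above raises an IndexError when the marker is missing from the page. A slightly more defensive sketch of the same idea, using `str.partition`, could look like this. The helper name `extract_location` and the sample `html` list are hypothetical, chosen only for illustration.

```python
def extract_location(markup, marker="window.location='"):
    """Return the text between marker and the next single quote, or None."""
    # partition never raises; `found` is empty when the marker is absent.
    _, found, rest = markup.partition(marker)
    if not found:
        return None
    url, quote, _ = rest.partition("'")
    # Without a closing quote the value is incomplete, so report no match.
    return url if quote else None


# Sample data (assumed): a list holding one HTML fragment, as in html[0] above.
html = [
    "<a href=\"javascript:if(handleDoubleClick(this.id)){window.location="
    "'http://www.example.com/online/store/TitleDetail?detail&amp;sku=123456789';}\">"
    "check store inventory</a>"
]
result = extract_location(html[0])
```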

David Morrissey 2010-05-20 01:20:46

Answer 3

A:

import re

# The page escapes its quotes as \', so the pattern expects a literal backslash before the quote.
pattern = re.compile(r"window\.location=\\'([^\\]*)")
haystack = r"""<a href="javascript:if(handleDoubleClick(this.id)){window.location=\'http://www.example.com/online/store/TitleDetail?detail&amp;sku=123456789\';}" id="getTitleDetails_123456789">\r\n\t\t\t\tcheck store inventory\r\n\t\t\t</a>"""
url = re.search(pattern, haystack).group(1)  # the URL between the escaped quotes

Matthew Flaschen 2010-05-20 01:24:22

Answer 4

A:

If there are always 9 digits:

http://www\.example\.com/online/store/TitleDetail\?detail&amp;sku=[0-9]{9}

If there is an arbitrary number of digits (at least one):

http://www\.example\.com/online/store/TitleDetail\?detail&amp;sku=[0-9]+

More general:

http.*?sku=[0-9]+

(The ? in .*? makes the quantifier lazy: it tries shorter matches first, so it is less likely to find a match that spans multiple URLs.)
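
To see the effect of the lazy quantifier, here is a small Python sketch. It assumes the general pattern is written as `http.*?sku=[0-9]+` (any characters, matched lazily, between `http` and `sku=`), and the two-URL sample `text` is made up for the comparison.

```python
import re

# Two URLs on one line (sample data, assumed for illustration).
text = (
    "http://www.example.com/a?sku=111 and "
    "http://www.example.com/b?sku=222"
)

# Lazy: .*? stops at the first "sku=" it can reach, giving one match per URL.
lazy = re.findall(r"http.*?sku=[0-9]+", text)

# Greedy: .* runs to the last "sku=" on the line, swallowing both URLs.
greedy = re.findall(r"http.*sku=[0-9]+", text)
```

Here `lazy` holds the two URLs separately, while `greedy` holds a single string covering the whole line. A lazy match still starts at the first `http` it finds, so it can span more than one URL if an earlier link has no `sku=` parameter.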

Edit: [0-9], not [1-9].

themissinglint 2010-05-20 01:43:17

Answer 5

A:

http://txt2re.com/ might help you

Zach 2010-05-20 02:33:09
