I am writing a cURL script to collect information about sex offenders. So far the script picks up links like the one below:

http://criminaljustice.state.ny.us/cgi/internet/nsor/... (snipped URL)

When the script follows one of these links, I want to read the value of every field on the page (Offender Id, last name, etc.) into my own variables. I am very weak at regex, which is why I am asking here. Or is there another way?

Can anybody help me in doing that?

+4  A: 

phpQuery is very nice for screen-scraping in PHP. It lets you access the DOM using the same methods jQuery has.
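
As a rough illustration of the DOM-based idea, here is a minimal sketch in Python rather than PHP, using the standard library's html.parser in place of phpQuery. The table markup, the sample values, and the names FieldParser and extract_fields are all assumptions made up for this example; the real page's structure may well differ.

```python
from html.parser import HTMLParser


class FieldParser(HTMLParser):
    """Collect the text of every table cell, in document order."""

    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self.in_cell:
            self.cells[-1] += data


def extract_fields(html):
    """Pair each cell ending in ':' with the cell that follows it."""
    parser = FieldParser()
    parser.feed(html)
    parser.close()
    cells = [c.strip() for c in parser.cells]
    fields = {}
    i = 0
    while i < len(cells) - 1:
        if cells[i].endswith(":"):
            fields[cells[i][:-1].strip()] = cells[i + 1]
            i += 2
        else:
            i += 1
    return fields


# Hypothetical markup: label in one cell, value in the next.
SAMPLE_HTML = """
<table>
  <tr><td>Offender Id:</td><td>12345</td></tr>
  <tr><td>Last Name:</td><td>DOE</td></tr>
</table>
"""

fields = extract_fields(SAMPLE_HTML)
print(fields["Offender Id"], fields["Last Name"])
```

Because the parser works on cells rather than raw text, extra whitespace or attributes in the markup should not break it, as long as each label cell is still followed by its value cell.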

Chad Birch
Can you tell me more about phpQuery and how it works?
A: 

I tend to agree with the previous poster about RegEx not being the right tool for the job. If you just want a quick and dirty expression, here goes:

Offender Id:.*
.* [0-9]*

NOTE: You must include the newline in this expression. Also note that this is very fragile: it will break if the source that you are parsing changes much at all.
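
To make the expression above concrete, here is a small sketch in Python (the same pattern should carry over to PHP's preg_match with little change). The sample markup and the name offender_id are assumptions for illustration. A capture group has been added, and `[0-9]*` has been tightened to `[0-9]+` so that an empty match is not reported as an id.

```python
import re

# Hypothetical markup: label on one line, value on the next, after a space.
SAMPLE = "<td>Offender Id:</td>\n<td> 12345</td>\n"

# The explicit \n is required because '.' does not match a newline by default.
PATTERN = re.compile(r"Offender Id:.*\n.* ([0-9]+)")


def offender_id(html):
    """Return the digits following the 'Offender Id:' label, or None."""
    match = PATTERN.search(html)
    return match.group(1) if match else None


print(offender_id(SAMPLE))

# Fragility in practice: the same content on a single line no longer matches.
print(offender_id("<td>Offender Id:</td><td> 12345</td>"))
```

The second call returns None even though the id is still present, which shows how a small change in the page layout can break the pattern.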