I'm a newbie to regular expressions and I have the following string:

sequence = '["{\"First\":\"Belyuen,NT,0801\",\"Second\":\"Belyuen,NT,0801\"}","{\"First\":\"Larrakeyah,NT,0801\",\"Second\":\"Larrakeyah,NT,0801\"}"]'

I am trying to extract the text Belyuen,NT,0801 and Larrakeyah,NT,0801 in Python. I have the following code, which is not working:

re.search('\:\\"...\\', ''.join(sequence))

I.e. I want to get the string between characters :\ and \.

+4  A: 

Don't use regex for this. It appears to be a rather strangely split set of JSON strings. Join them back together and use the json module to decode it.

import json
sequence = '[%s]' % ','.join(sequence)
data = json.loads(sequence)
print data[0]['First'], data[0]['Second']

(Note that the json module is new in Python 2.6. If you have a lower version, download and install simplejson.)

Daniel Roseman
The sequence is actually a string (I updated the question). The interpreter keeps throwing an error on the line `data = json.loads(sequence)`, and the error is `raise ValueError(errmsg("Expecting object", s, end))`
Seth
if I scrap the second line of your code and print `data[0]` I get: `{"First":"Belyuen,NT,0801","Second":"Belyuen,NT,0801"}`
Seth
and if I print `data[0]['First']` I get the following error: `TypeError: string indices must be integers`
Seth
I ended up being able to extract what I wanted by decoding twice: first `data = json.loads(sequence)`, then `location = json.loads(data[0])`, then `print location['First']`
Seth
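The two-step decoding described in the comment above can be sketched roughly as follows. This is only an illustration under two assumptions: the string really contains literal backslashes before the inner quotes (hence the raw string literal), and Python 3 syntax is used. `extract_first` is a hypothetical helper name, not something from the thread.

```python
import json

# Assumption: the real input holds literal backslashes, so it is written
# here as a raw string to keep them intact.
sequence = r'["{\"First\":\"Belyuen,NT,0801\",\"Second\":\"Belyuen,NT,0801\"}","{\"First\":\"Larrakeyah,NT,0801\",\"Second\":\"Larrakeyah,NT,0801\"}"]'

def extract_first(raw):
    """Hypothetical helper: decode the outer list, then each inner object."""
    outer = json.loads(raw)  # a list of strings, each itself JSON text
    return [json.loads(item)["First"] for item in outer]

locations = extract_first(sequence)
print(locations)
```

The outer `json.loads` only yields a list of strings, which is why indexing `data[0]['First']` directly fails; each element has to be decoded a second time before it behaves like a dict.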
+4  A: 

It looks like a proper serialization of Python dicts, so you could just do:

>>> sequence = ["{\"First\":\"Belyuen,NT,0801\",\"Second\":\"Belyuen,NT,0801\"}","{\"First\":\"Larrakeyah,NT,0801\",\"Second\":\"Larrakeyah,NT,0801\"}"]
>>> import json
>>> for i in sequence:
    d = json.loads(i)
    print(d['First'])


Belyuen,NT,0801
Larrakeyah,NT,0801
SilentGhost
the sequence is actually a string not list ( I updated the question ). so how do I load it into the json module as a string?
Seth
@seth: unfortunately, it seems that the quotes in your input string are misused. It doesn't work with either `json` or `eval`. If you fix them by alternating single and double quotes, escaped where needed, then it works just fine with the method I showed. Again, quotes nested within the string should alternate, and any quotes that match the ones delimiting the original Python string should, of course, be escaped.
SilentGhost
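One possible reading of the quoting problem, sketched under the assumption that the string is typed into Python source as in the question: in an ordinary literal, Python consumes each `\"` escape itself, so the JSON decoder never sees the backslashes. The sample below is a shortened version of the question's string, and `parses` is a hypothetical helper name.

```python
import json

# Ordinary literal: Python turns each \" into a bare quote.
plain = '["{\"First\":\"Belyuen,NT,0801\"}"]'
# Raw literal: the backslashes survive, so the inner quotes stay escaped.
raw = r'["{\"First\":\"Belyuen,NT,0801\"}"]'

def parses(text):
    """Hypothetical helper: report whether text is valid JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False

print(plain)  # shows the string without any backslashes
print(parses(plain), parses(raw))
```

If the data arrives from a file or a network response rather than from a literal in source code, the backslashes are already part of the string and no raw prefix is involved.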
@SilentGhost: thanks for your response, check out my comments in Daniel Roseman's answer. I ended up extracting what I needed in a convoluted way, but got it nevertheless. +1 for your help and useful answer.
Seth
+2  A: 

you don't need regex

>>> sequence = ["{\"First\":\"Belyuen,NT,0801\",\"Second\":\"Belyuen,NT,0801\"}","{\"First\":\"Larrakeyah,NT,0801\",\"Second\":\"Larrakeyah,NT,0801\"}"]
>>> for item in sequence:
...  print eval(item).values()
...
['Belyuen,NT,0801', 'Belyuen,NT,0801']
['Larrakeyah,NT,0801', 'Larrakeyah,NT,0801']
ghostdog74
better use json in this case
hop
This solution works in versions below 2.6, and I don't want to download any other modules.
ghostdog74