I'm importing some data from a database. The data was stored by a CMS written in PHP over which I have no control. Here is the data (a dense report from a PayPal response):

a:56:{
s:8:"business";s:19:"[email protected]";
s:14:"receiver_email";s:19:"[email protected]";
s:11:"receiver_id";s:13:"KVBRSDFJKLWYE";
s:9:"item_name";s:4:"ABCD";
s:11:"item_number";s:1:"7";
s:8:"quantity";s:1:"1";
s:7:"invoice";s:0:"";
s:6:"custom";s:3:"800";
s:4:"memo";s:0:"";
s:3:"tax";s:4:"0.00";
s:12:"option_name1";s:0:"";
s:17:"option_selection1";s:0:"";
s:12:"option_name2";s:0:"";
s:17:"option_selection2";s:0:"";
s:14:"num_cart_items";s:1:"1";
s:8:"mc_gross";s:6:"255.00";
s:6:"mc_fee";s:5:"19.75";
s:11:"mc_currency";s:3:"USD";
s:13:"payment_gross";s:6:"255.00";
s:11:"payment_fee";s:5:"19.75";
s:14:"payment_status";s:9:"Completed";
s:14:"pending_reason";s:0:"";
s:11:"reason_code";s:0:"";
s:12:"payment_date";s:25:"02:11:51 Sep 15, 2006 PDT";
s:6:"txn_id";s:17:"1EG20446283704116";
s:8:"txn_type";s:4:"cart";
s:12:"payment_type";s:7:"instant";
s:10:"first_name";s:5:"abcde";
s:9:"last_name";s:6:"Abcdef";
s:19:"payer_business_name";s:0:"";
s:12:"address_name";s:12:"abcde Abcdef";
s:14:"address_street";s:24:"asdkjhgfs;lkefh sdfkj 21";
s:12:"address_city";s:15:"agflkjsgkjhsddg";
s:13:"address_state";s:3:"HDJ";
s:11:"address_zip";s:5:"64525";
s:20:"address_country_code";s:2:"DE";
s:15:"address_country";s:7:"Germany";
s:14:"address_status";s:11:"unconfirmed";
s:11:"payer_email";s:15:"[email protected]";
s:8:"payer_id";s:13:"U89LQDFJGKCJG";
s:12:"payer_status";s:8:"verified";
s:9:"member_id";s:3:"800";
s:11:"verify_sign";s:56:"A1JC72dfgkljhdghjwlQocysUrWOAXNp57t4TP6QkJgCt9.qk7A4UuEq";
s:8:"test_ipn";s:0:"";
s:12:"item_number1";s:1:"7";
s:7:"charset";s:12:"windows-1252";
s:11:"mc_shipping";s:4:"0.00";
s:11:"mc_handling";s:4:"0.00";
s:14:"notify_version";s:3:"2.1";
s:12:"mc_handling1";s:4:"0.00";
s:12:"mc_shipping1";s:4:"0.00";
s:10:"item_name1";s:50:"sdlkjgsdfghlsdkgdhlkjsdggkljdfhlkjsddflkhlkdldfkgj";
s:9:"quantity1";s:1:"1";
s:10:"mc_gross_1";s:6:"255.00";
s:17:"residence_country";s:2:"DE";
s:11:"screen_name";s:8:"dfglkjlf";
}

As you can see this is straightforward to read. In my code I would like to grab some of the fields (let's say the value of payment_fee). How can I do that? I guess the best would be to use a regular expression but I'm a true rookie with Regexps. Of course I don't want to count the number of colons and quotes to get to the field. I would prefer an automatic way.

Note: I don't care about the s:xx. As you can guess, it means a string of xx characters, and I don't need to validate that.

Thank you for your help.

A: 

This regex will group the payment fee.

'payment_fee\";s:\d*:"(\d*\.\d*)'

in Python:

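import re

s = 's:11:"payment_fee";s:5:"19.75";'
# Raw string, so the backslashes reach the regex engine unchanged
regex = r'payment_fee";s:\d*:"(\d*\.\d*)'

# group(1) is the first capture group; .groups is a method, so .groups[0] would raise a TypeError
payment_fee = re.search(regex, s).group(1)  # returns '19.75'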
Chris Lawlor
With some adjustments it works, but only for payment_fee.
Nicolas Cadilhac
You can substitute any of the other field names. Of course, this is only a good approach if you are looking for a small number of fields.
Chris Lawlor
I am not talking about field names but about the \d*\.\d*, which will only match floating-point numbers. I need this group to return the string enclosed in double quotes.
Nicolas Cadilhac
A: 

This appears to be serialized PHP data. There are probably some Python packages that you can use to unserialize it. I was able to find one package called phpserialize that might be of interest, but I've never used it, so I can't comment on how well it works. There might be others out there.
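
To illustrate what unserializing involves, here is a minimal hand-rolled sketch in Python. It is not the phpserialize API and it only handles the flat shape seen in the question: an array (a:N:{...}) whose keys and values are all strings (s:N:"...";). The function name parse_php_string_array and the sample input are made up for illustration, and decoding values as cp1252 is an assumption based on the charset field in the dump. Because it trusts the declared lengths (which PHP counts in bytes), it copes with quotes inside values, but it expects the original stored bytes, without the line breaks or masked addresses shown in the question.

```python
def parse_php_string_array(data: bytes) -> dict:
    """Parse a flat PHP-serialized array of string keys and string values."""
    pos = 0  # current read offset into data

    def expect(token: bytes) -> None:
        # Consume an exact token or fail with the offset where parsing stopped.
        nonlocal pos
        if not data.startswith(token, pos):
            raise ValueError(f"expected {token!r} at offset {pos}")
        pos += len(token)

    def read_int(terminator: bytes) -> int:
        # Read decimal digits up to (and consuming) the terminator.
        nonlocal pos
        end = data.index(terminator, pos)
        value = int(data[pos:end])
        pos = end + len(terminator)
        return value

    def read_string() -> bytes:
        # s:<byte length>:"<raw bytes>";
        nonlocal pos
        expect(b"s:")
        length = read_int(b":")
        expect(b'"')
        value = data[pos:pos + length]
        pos += length
        expect(b'";')
        return value

    expect(b"a:")
    count = read_int(b":")  # number of key/value pairs
    expect(b"{")
    result = {}
    for _ in range(count):
        key = read_string().decode("ascii")
        # Assumption: values are windows-1252 encoded, as the charset field suggests.
        result[key] = read_string().decode("cp1252")
    expect(b"}")
    return result


# Hypothetical sample; note the embedded quotes in the second value.
sample = b'a:2:{s:11:"payment_fee";s:5:"19.75";s:4:"memo";s:8:"say "hi"";}'
fields = parse_php_string_array(sample)
print(fields["payment_fee"])  # 19.75
print(fields["memo"])         # say "hi"
```

Once the data is in a dictionary, any field is a plain lookup, which scales better than one regex per field when many fields are needed.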

Thomas Owens
+1  A: 

This regex should allow you to find any field value. Adjust character escaping as needed.

var regex = fieldName + "\";s:\\d*:\"([^\"]*)\"";

(This is C#.)

Note that this will return incomplete values if the strings contain a " character...
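
For comparison, the same idea can be sketched in Python. The helper name get_field and the sample string are hypothetical. Two details differ from the pattern above and are my own additions: the field name is passed through re.escape, and the pattern starts with an opening quote so that a name such as fee does not match inside payment_fee. The limitation with values containing a " character still applies.

```python
import re


def get_field(data: str, field_name: str):
    """Return the quoted value that follows field_name, or None if absent."""
    # "<name>";s:<digits>:"<value>" with the value captured as group 1
    pattern = '"' + re.escape(field_name) + r'";s:\d+:"([^"]*)"'
    match = re.search(pattern, data)
    return match.group(1) if match else None


# Hypothetical excerpt in the same format as the dump in the question
data = (
    's:11:"payment_fee";s:5:"19.75";'
    's:14:"payment_status";s:9:"Completed";'
    's:4:"memo";s:0:"";'
)
print(get_field(data, "payment_fee"))     # 19.75
print(get_field(data, "payment_status"))  # Completed
```

Returning None for a missing field lets the caller distinguish "not present" from an empty value such as memo.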

David Thibault
[^\"] inside the group does not seem to work. match.Success returns false.
Nicolas Cadilhac
There was just a star missing after the [] block. Maybe you can update your answer. Anyway you were the closest to my need so you get the accepted answer. Other people helped too... thanks to them.
Nicolas Cadilhac
That's right, sorry for the mistake!
David Thibault
A: 

How about something like this:

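 string fieldName = "address_status";
 // In a verbatim (@"...") string a double quote is escaped by doubling it, not with a backslash.
 string pattern = String.Format(@".*""{0}"";s:[0-9]+:(""[^""]*"").*", fieldName);
 // The captured group includes the surrounding quotes; if nothing matches, line is returned unchanged.
 string value = Regex.Replace(line, pattern, "$1");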
Jeremy E
+1  A: 

Here's a C# unserialization library for PHP strings: http://sourceforge.net/projects/csphpserial/

I'm not a C# guy, so your mileage may vary, but it looks like it's been around for a while.

Paul Huff
Interesting. Maybe this would have worked too but I found my solution in another answer. Thanks anyway.
Nicolas Cadilhac