Hi,

I need to split a string of the form

2,9.1,The Godfather (1972), (it's a csv line)

to:

2
9.1
The Godfather
1972

Any ideas for a good regular expression?

BTW, if you know of a good tool that builds regular expressions from examples you provide, that would be great. I'm a bit new to this.

Thanks!

+4  A: 
(\d+),(\d+\.\d+),(.*?) \((\d{4})\)
^^^^^ ^^^^^^^^^^ ^^^^^   ^^^^^^^
2     9.1        Title   Year
cletus
+2  A: 

Maybe this can help? http://regexlib.com/

Sobe
A: 

A little time with Google gave me this: /,(?!(?:[^",]|[^"],[^"])+")/. Seems to split CSV just fine.

>>> '2,9.1,The Godfather (1972)'.split(/,(?!(?:[^",]|[^"],[^"])+")/)
["2", "9.1", "The Godfather (1972)"]
Reinis I.
He wanted "The Godfather" and "1972" to be separate.
Ryan Bigg
A: 

If you are sure that the format is static, you can use this:

(\d+),(\d+\.\d+),(.*?) \((\d+)\)
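
As a rough sketch of how that pattern could be applied, here it is in Python (chosen only because it comes up later in the thread). The names `LINE_RE` and `line` are illustrative, and the sample is the line from the question.

```python
import re

# The pattern from this answer; re.match anchors it at the start of the line.
LINE_RE = re.compile(r'(\d+),(\d+\.\d+),(.*?) \((\d+)\)')

line = '2,9.1,The Godfather (1972)'
match = LINE_RE.match(line)
if match:
    # Four capture groups: rank, rating, title, year (all as strings).
    rank, rating, title, year = match.groups()
    print(rank, rating, title, year)
else:
    print('line does not fit the expected format')
```

The lazy `(.*?)` stops at the first " (digits)" it finds, so this only holds up while every line really has this exact shape.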

But if it can contain more information, use a real CSV parser to read the line and then just split The Godfather (1972) using (.*?) \((\d+)\).
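
A minimal sketch of that two-step approach, assuming Python and its standard `csv` module. The helper name `parse_line` is hypothetical, and it assumes each line has at least three fields with the title in the third.

```python
import csv
import re

# Pattern from this answer for pulling "Title (Year)" apart.
TITLE_RE = re.compile(r'(.*?) \((\d+)\)')

def parse_line(line):
    """Parse one CSV line into (rank, rating, title, year) strings."""
    # csv.reader accepts any iterable of lines; take the single parsed row.
    rank, rating, title_field = next(csv.reader([line]))[:3]
    m = TITLE_RE.fullmatch(title_field)
    if m is None:
        # No "(year)" suffix: keep the field as the title, year unknown.
        return rank, rating, title_field, None
    return rank, rating, m.group(1), m.group(2)

print(parse_line('2,9.1,The Godfather (1972)'))
```

Letting the CSV parser deal with commas and quoting first means the regex only ever sees the title field.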

Lukáš Lalinský
Yes. For example, Python has a good CSV parser built in (the csv module).
gnibbler
A: 

CSV has a lot of corner cases, so a regexp approach might take you into a world of pain.

For example, if the title has a comma in it, the title would be double quoted, which would break all of the regexps given so far.
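
To make the corner case concrete, here is a small Python sketch. The sample line and its values are made up for illustration; only the quoting behaviour matters.

```python
import csv

# Hypothetical line whose title contains commas, so it is double quoted.
line = '3,8.9,"Good, the Bad and the Ugly, The (1966)"'

# A naive comma split cuts the quoted title into pieces.
naive = line.split(',')
print(len(naive), naive)

# csv.reader honours the quotes and returns three clean fields.
fields = next(csv.reader([line]))
print(len(fields), fields)
```

The naive split yields five fragments, with stray quote characters left in the title pieces, while the CSV parser returns the three intended fields.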

gnibbler
+1  A: 

I wouldn't recommend using regex to split CSV files, as it can't handle comma escaping well. That said, how about using the simplest available solution?

A simple regex like this should solve your problem:

'(.*?),(.*?),(.*?) \((\d+)\)'
Piotr Czapla