Hi, I have a string in the following format:
"Wilbur Smith (Billy, son of John), Eddie Murphy (John), Elvis Presley, Jane Doe (Jane Doe)"
So basically it's a list of actors' names, each optionally followed by their role in parentheses. The role itself can contain a comma (an actor's name cannot, or so I strongly hope).
My goal is to split this string into a list of pairs: (actor name, actor role).
One obvious solution would be to go through each character, check for occurrences of '(', ')' and ',', and split whenever a comma occurs outside the parentheses. But this seems a bit heavy...
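To make that first idea concrete, here is a rough sketch of the character scan I have in mind. It assumes the parentheses are balanced and that a name never contains '(' itself; the function name split_actors is just something I made up for illustration, and actors without a role get None.

```python
def split_actors(text):
    """Split on commas outside parentheses and return (name, role) pairs."""
    pairs = []
    depth = 0   # current parenthesis nesting level
    start = 0   # index where the current entry begins
    # A sentinel comma is appended so the final entry is flushed too.
    for i, ch in enumerate(text + ","):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            entry = text[start:i].strip()
            start = i + 1
            if not entry:
                continue
            # Everything before the first '(' is the name, the rest is the role.
            name, sep, rest = entry.partition("(")
            role = None
            if sep:
                # Drop only the single closing parenthesis of the role.
                role = (rest[:-1] if rest.endswith(")") else rest).strip()
            pairs.append((name.strip(), role))
    return pairs
```

It works, but it is a fair amount of bookkeeping for what feels like a simple task.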
I was thinking about splitting it using a regexp: first split the string by parentheses:
import re
x = "Wilbur Smith (Billy, son of John), Eddie Murphy (John), Elvis Presley, Jane Doe (Jane Doe)"
s = re.split(r'[()]', x)
# ['Wilbur Smith ', 'Billy, son of John', ', Eddie Murphy ', 'John', ', Elvis Presley, Jane Doe ', 'Jane Doe', '']
Counting from 1, the odd elements here hold the text outside the parentheses (one or more actor names) and the even elements are the roles. Then I could split the names by commas and somehow extract the name-role pairs. But this seems even worse than my first approach.
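For reference, this is roughly how I imagine finishing that second approach. It is only a sketch under the assumption that roles never contain nested parentheses; pairs_from_paren_split is a hypothetical name, and a missing role is represented as None.

```python
import re

def pairs_from_paren_split(text):
    """Pair names with roles after splitting the string on parentheses."""
    parts = re.split(r"[()]", text)
    pairs = []
    # Indices 0, 2, 4, ... hold text outside parentheses; 1, 3, 5, ... hold roles.
    for i in range(0, len(parts), 2):
        names = [n.strip() for n in parts[i].split(",") if n.strip()]
        if not names:
            continue
        # Every name except the last one in this chunk has no role.
        pairs.extend((n, None) for n in names[:-1])
        # The last name takes the role that follows, if there is one.
        role = parts[i + 1].strip() if i + 1 < len(parts) else None
        pairs.append((names[-1], role))
    return pairs
```

The "last name in the chunk gets the role" rule is the awkward part that makes me think there must be something cleaner.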
Are there any easier / nicer ways to do this, either with a single regexp or a nice piece of code?