I am trying to extract publisher information from a string. It comes in various formats such as:
John Wiley & Sons (1995), Paperback, 154 pages
New York, Crowell [1963] viii, 373 p. illus. 20 cm.
New York: Bantam Books, c1990. xx, 444 p. : ill. ; 27 cm.
Garden City, N.Y., Doubleday, 1963. 142 p. illus. 22 cm. [1st ed.]
All I want to extract is the publisher name, so everything from the ( or the [ onward can be ignored, and I need to capture whatever comes before it. The complication is that in example 3 I'd want to stop at the comma (before "c1990"), whereas in example 2 I'd want to stop only at the square bracket and keep the comma if possible.
I'm willing to settle for a regex that takes everything before the first (, [ or , and to accept imperfect results (such as getting only "New York" for example 2), since I wouldn't want to insert all of example 3 into the database. Most of the data has the date in brackets, as in examples 1 and 2.
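To make the two behaviours concrete, here is a minimal sketch of what I have in mind. It assumes Python's re module, which is only an illustration since any regex flavour would do, and the names before_delimiter and before_bracket are hypothetical placeholders.

```python
import re

# Fallback: capture everything before the first "(", "[" or ",".
BEFORE_DELIMITER = re.compile(r"^\s*([^(\[,]+)")

# Preferred for bracketed dates: stop only at "(" or "[", so commas are kept.
BEFORE_BRACKET = re.compile(r"^\s*([^(\[]+)")


def before_delimiter(record):
    """Return the text before the first '(', '[' or ',', or None if empty."""
    match = BEFORE_DELIMITER.match(record)
    return match.group(1).strip() if match else None


def before_bracket(record):
    """Return the text before the first '(' or '[', or None if empty."""
    match = BEFORE_BRACKET.match(record)
    return match.group(1).strip() if match else None


samples = [
    "John Wiley & Sons (1995), Paperback, 154 pages",
    "New York, Crowell [1963] viii, 373 p. illus. 20 cm.",
    "New York: Bantam Books, c1990. xx, 444 p. : ill. ; 27 cm.",
    "Garden City, N.Y., Doubleday, 1963. 142 p. illus. 22 cm. [1st ed.]",
]

for record in samples:
    print(repr(before_delimiter(record)), "|", repr(before_bracket(record)))
```

With these patterns, example 1 gives "John Wiley & Sons" either way, example 2 gives "New York" with the fallback and "New York, Crowell" with the bracket-only version, and example 3 gives "New York: Bantam Books" with the fallback. The bracket-only version is no use for example 3, because it swallows the whole line when there is no bracket.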
Thanks in advance for any suggestions!