I have a list of book titles:
- "The Hobbit: 70th Anniversary Edition"
- "The Hobbit"
- "The Hobbit (Illustrated/Collector Edition)[There and Back Again]"
- "The Hobbit: or, There and Back Again"
- "The Hobbit: Gift Pack"
and so on...
I thought that if I normalised the titles somehow, it would be easier to automatically work out which book each edition refers to. I tried:
    import string

    # Keep only ASCII letters and digits; everything else is dropped.
    normalised = ''.join(char for char in title
                         if char in string.ascii_letters + string.digits)
or
    def normalise(title):
        # Keep everything up to the first separator character.
        normalised = ''
        for char in title:
            if char in ':/()|':
                break
            normalised += char
        return normalised
But neither approach works as intended: titles can themselves contain special characters, and different editions can have very different title layouts.
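To make the problem concrete, here is a minimal sketch that runs both attempts over the sample titles. The function names `strip_non_alnum` and `cut_at_separator` are made up for this example; they just wrap the two snippets above.

```python
import string

# Hypothetical wrappers around the two attempts, for illustration only.
def strip_non_alnum(title):
    # Attempt 1: keep only ASCII letters and digits.
    allowed = string.ascii_letters + string.digits
    return ''.join(char for char in title if char in allowed)

def cut_at_separator(title):
    # Attempt 2: keep everything before the first separator character.
    normalised = ''
    for char in title:
        if char in ':/()|':
            break
        normalised += char
    return normalised

titles = [
    "The Hobbit: 70th Anniversary Edition",
    "The Hobbit",
    "The Hobbit (Illustrated/Collector Edition)[There and Back Again]",
    "The Hobbit: or, There and Back Again",
    "The Hobbit: Gift Pack",
]

# Print both normalised forms side by side for each title.
for title in titles:
    print(repr(strip_non_alnum(title)), repr(cut_at_separator(title)))
```

The first attempt keeps the edition text, so all five titles normalise to different strings. The second gets closer on this sample, but it leaves a trailing space on the "(Illustrated/..." title, and it would cut short any book whose real title contains one of the separator characters.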
Help would be very much appreciated! Thanks :)