I'm working with some log files that are very poorly formatted, the column delimiter is an item that (often) appears within the field and it isn't escaped. For example:
sam,male,september,brown,blue,i like cats, and i like dogs
Where:
name,gender,month,hair,eyes,about
So as you can see, the about contains the column delimiter which means a single parse by the delimiter won't work, because it'll separate the about me into two separate columns. Now imagine this with a chat system... you can visualize the issues I'm sure.
So, theoretically what's the best approach to solving this? I'm not looking for a language specific implementation but more of a general pointer to the correct direction, or some ideas on how others have solved it... without doing it manually.
Edit:
I should clarify, my actual logs are in a much worse state. There are these fields with delimiter characters everywhere, there is no pattern that I can locate.