I have to process a block of text, which might have some spurious newlines in the middle of some of the fields. I want to strip these newlines out (replacing them with spaces), without stripping out the 'valid' newlines, which are always preceded by a \t.
So, I want to replace every newline that is not preceded by a tab with a space. To make things a little more complicated, if there's already a space on either side of the newline, I want to keep it rather than add another. In other words, this
"one\ttwo\tbuckle my \nshoe\t\t\n"
would become
"one\ttwo\tbuckle my shoe\t\t\n"
i.e., with one space between 'my' and 'shoe', not two.
EDIT - some clarification: the unwanted newlines are in the middle of a piece of regular text. If there's already a space between the words where the newline occurs, I want to keep it; otherwise, I want to add one. Either way, the words should end up separated by exactly one space. E.g.
"one\ttwo\tbuckle my \nshoe\t\t\n"
=> "one\ttwo\tbuckle my shoe\t\t\n"
"one\ttwo\tbuckle my\nshoe\t\t\n"
=> "one\ttwo\tbuckle my shoe\t\t\n"
"one\ttwo\tbuckle my \n shoe\t\t\n"
=> "one\ttwo\tbuckle my shoe\t\t\n"
EDIT 2: a clumsy but working solution I came up with. I'm not very happy with it; the double gsub seems inelegant.
>> strings = ["one\ttwo\tbuckle my\nshoe\t\t\n", "one\ttwo\tbuckle my \nshoe\t\t\n", "one\ttwo\tbuckle my \n shoe\t\t\n"]
=> ["one\ttwo\tbuckle my\nshoe\t\t\n", "one\ttwo\tbuckle my \nshoe\t\t\n", "one\ttwo\tbuckle my \n shoe\t\t\n"]
>> strings.collect{|s| s.gsub(/[^\t]\n\s?/){|match| match.gsub(/\s*\n\s*/," ")} }
=> ["one\ttwo\tbuckle my shoe\t\t\n", "one\ttwo\tbuckle my shoe\t\t\n", "one\ttwo\tbuckle my shoe\t\t\n"]
This seems to work better than any of the suggestions below, given my now-extended requirements about adding/preserving spaces.
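For comparison, here is a single-pass sketch that avoids the nested gsub by using a negative lookbehind. It assumes Ruby 1.9 or later (older regex engines lack lookbehind), and the helper name `join_broken_lines` is made up for illustration. I have only checked it against the three sample strings above, so treat it as a starting point rather than a drop-in replacement.

```ruby
# Hypothetical helper, not part of the original solution.
# Assumes Ruby 1.9+ for lookbehind support.
def join_broken_lines(text)
  # (?<!\t)  the match must not begin immediately after a tab
  # " *\n *" the newline together with any plain spaces touching it
  # The whole match collapses to a single space.
  text.gsub(/(?<!\t) *\n */, " ")
end

samples = [
  "one\ttwo\tbuckle my\nshoe\t\t\n",
  "one\ttwo\tbuckle my \nshoe\t\t\n",
  "one\ttwo\tbuckle my \n shoe\t\t\n"
]

# Each line printed is "one\ttwo\tbuckle my shoe\t\t\n"
samples.each { |s| p join_broken_lines(s) }
```

The trailing "\t\n" is left alone because the newline there sits right after a tab, so the lookbehind rejects it. Note that this version only absorbs literal spaces around the newline, whereas the `\s` in the two-step version also matches other whitespace.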