ansaurus

Question

regex question - replace all newlines that are not preceded by a tab with a space

Answer 1

A:

str = str.gsub(/\s*(?<!\t)\n\s*/, " ")

reko_t 2010-08-09 11:34:48

Thanks, reko. That doesn't seem to make any difference to my strings; see my edit above.

Max Williams 2010-08-09 13:36:29

Sorry, there was a typo in the regexp: `(<?` should have been `(?<`. Please try again now.

reko_t 2010-08-09 13:51:14

Answer 2

+1 A:

No lookbehind option

You can match:

(\G|[^\t])\n

And replace with a backreference to what group 1 matched.

Here's a Ruby snippet (as seen on ideone.com):

from = "\none\ttwo\tbuckle my \nshoe\t\t\nx\n\n\t\n\n"
to   = "one\ttwo\tbuckle my shoe\t\t\nx\t\n"

mod  = from.gsub(/(\G|[^\t])\n/, '\1')

puts(mod == to) # true

Essentially, we either match "something" that's not a \t followed by a \n and replace it with only the "something" part (preserving that character but deleting the \n), or we continue from the previous match using \G. The \G alternative is what allows a \n at the beginning of the string, or one that directly follows another deleted \n, to be deleted as well.
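To see what the \G alternative contributes, here is a minimal sketch that runs the pattern with and without it. The input string and variable names are made up for illustration, and it assumes Ruby's gsub anchors \G at the point where each successive search begins.

```ruby
# Hypothetical input: a leading newline and a run of two newlines.
input = "\na\n\nb"

# Without \G, every deleted newline needs a non-tab character in front of it
# that the same match can consume.
without_g = input.gsub(/([^\t])\n/, '\1')

# With \G, a newline sitting exactly where the search starts (string start,
# or right after the previous match) can be deleted too.
with_g = input.gsub(/(\G|[^\t])\n/, '\1')

p without_g  # "\na\nb"
p with_g     # "ab"
```

Without \G, the leading newline and the second newline of the pair survive, because the character before them is either missing or was already consumed by the previous match.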

References

regular-expressions.info/Character Class […]
- Brackets for Grouping and Backreferences (…)
- Continuing Previous Match \G

Lookbehind option

If the flavor supports lookbehind, you can also match:

(?<!\t)\n

And simply replace with the empty string.
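As a rough sketch of the lookbehind version in Ruby (this assumes Ruby 1.9 or later, where lookbehind is supported; the sample string is made up for illustration):

```ruby
# Hypothetical input: one newline after "a", one newline after a tab.
sample = "a\nb\t\nc"

# Delete every newline that is not preceded by a tab.
deleted = sample.gsub(/(?<!\t)\n/, "")

# The same pattern, replacing with a space as the question title asks.
spaced = sample.gsub(/(?<!\t)\n/, " ")

p deleted  # "ab\t\nc"
p spaced   # "a b\t\nc"
```

Because the lookbehind consumes no characters, there is nothing to put back, so no backreference is needed in the replacement.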

References

regular-expressions.info/Lookarounds

polygenelubricants 2010-08-09 11:51:22

Answer 3

A:

With a double-negative ([^\S\t] means all whitespace except TAB characters)

def fix(str)
  return str.gsub(/([^\t]|^)[^\S\t]+/, '\1 ')
end

the following tests

#! /usr/bin/ruby

require "test/unit"
require "test/unit/ui/console/testrunner"

class MyTestCases < Test::Unit::TestCase
  def test_after_space
    assert_equal fix("one\ttwo\tbuckle my \nshoe\t\t\n"),
                     "one\ttwo\tbuckle my shoe\t\t\n"
  end

  def test_no_whitespace_neighbors
    assert_equal fix("one\ttwo\tbuckle my\nshoe\t\t\n"),
                     "one\ttwo\tbuckle my shoe\t\t\n"
  end

  def test_whitespace_surrounded
    assert_equal fix("one\ttwo\tbuckle my \n shoe\t\t\n"),
                     "one\ttwo\tbuckle my shoe\t\t\n"
  end

  def test_leading_newline
    assert_equal fix("\none\ttwo"),
                     " one\ttwo"
  end
end

Test::Unit::UI::Console::TestRunner.run(MyTestCases)

all pass:

Loaded suite MyTestCases
Started
....
Finished in 0.000412 seconds.

4 tests, 4 assertions, 0 failures, 0 errors
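If the double negative is hard to read, a quick check of which single characters the class matches may help. This is only an illustrative sketch; the variable names are not part of the code above.

```ruby
# [^\S\t] reads as "not a non-whitespace character, and not a tab",
# which is whitespace other than TAB.
ws_except_tab = /[^\S\t]/

space_hit   = !(" "  =~ ws_except_tab).nil?
newline_hit = !("\n" =~ ws_except_tab).nil?
tab_hit     = !("\t" =~ ws_except_tab).nil?
letter_hit  = !("a"  =~ ws_except_tab).nil?

p [space_hit, newline_hit, tab_hit, letter_hit]  # [true, true, false, false]
```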

Greg Bacon 2010-08-21 17:39:44
