I have a method which scans plain text (specifically in the QIF format) looking for dates which occur after a 'D' on a new line:
dates = "D2009-11-12\nPApple Store\nMSnow Leopard\nD2009-11-13\nPApple Store\nMiMac".scan(/^\s*D"?(.+?)[\r\n?|\n]/m)
# => [["2009-11-12"], ["2009-11-13"]]
"D2009-11-12\r\nPApple Store\r\nMSnow Leopard\r\nD2009-11-13\r\nPApple Store\r\nMiMac".scan(/^\s*D"?(.+?)[\r\n?|\n]/m)
# => [["2009-11-12"], ["2009-11-13"]]
This works well across a variety of formats, but I've just come across an issue with files generated by Quicken on the Mac, which saves them in Mac OS Classic format. That is to say, the lines are delimited using carriage returns, not newlines (i.e. '\r', not '\n' or '\r\n').
"D2009-11-12\rPApple Store\rMSnow Leopard\rD2009-11-13\rPApple Store\rMiMac".scan(/^\s*D"?(.+?)[\r\n?|\n]/m)
# => [["2009-11-12"]]
The problem appears to be that Ruby's regex engine doesn't consider '\r' to be a line delimiter (which of course it isn't), so the '^' anchor only matches at the very start of the string and after each '\n'.
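To illustrate the anchor behaviour, here is a minimal sketch using made-up three-line strings (the variable names are only for illustration). In Ruby, the /m flag only lets '.' match newlines; '^' matches at line starts with or without it.

```ruby
# ^ matches at the start of the string and after each "\n".
# A lone "\r" does not begin a new line as far as the anchor is concerned.
unix_text = "a\nb\nc"
mac_text  = "a\rb\rc"

unix_starts = unix_text.scan(/^\w/)  # one match per "\n"-delimited line
mac_starts  = mac_text.scan(/^\w/)   # only the first character of the string

p unix_starts  # => ["a", "b", "c"]
p mac_starts   # => ["a"]
```

This matches what I see with the QIF data: only the first 'D' line is found when the file uses '\r' alone.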
What is the best way to support the original parsing yet also handle these Mac OS Classic files?
Should I replace all occurrences of '\r' with '\n\r'? If so, how should I go about doing this, since a call to string.gsub(/\r/, '\n\r') will result in \n\r\r being replaced in some scenarios? I would like to call string.gsub(/[^\n]\r/, '$1\n\r'), but this isn't supported by the gsub method.
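One possible approach, sketched here under assumptions rather than offered as a definitive fix, is to normalise every line ending to '\n' before scanning instead of inserting extra characters. The helper names normalize_newlines and qif_dates are hypothetical. The scan pattern is also simplified to end at '$', which means a date on a final line with no trailing newline would match as well, unlike the original pattern.

```ruby
# Hypothetical helper: turn "\r\n" (Windows) and lone "\r" (Mac OS Classic)
# into "\n". The optional "\n" in the pattern stops "\r\n" becoming "\n\n".
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

# Hypothetical helper: same intent as the original scan, run on normalised
# text so that ^ sees every line start. Returns nested arrays like the original.
def qif_dates(text)
  normalize_newlines(text).scan(/^\s*D"?(.+?)$/)
end

unix = "D2009-11-12\nPApple Store\nMSnow Leopard\nD2009-11-13\nPApple Store\nMiMac"
dos  = unix.gsub("\n", "\r\n")  # Windows-style copy of the same data
mac  = unix.gsub("\n", "\r")    # Mac OS Classic-style copy of the same data

p qif_dates(unix)  # => [["2009-11-12"], ["2009-11-13"]]
p qif_dates(dos)   # => [["2009-11-12"], ["2009-11-13"]]
p qif_dates(mac)   # => [["2009-11-12"], ["2009-11-13"]]
```

Two details matter here. The replacement string is double-quoted, because in single quotes Ruby inserts a literal backslash followed by 'n' rather than a newline. And if a backreference is needed in a gsub replacement string, it is written '\1', not '$1'.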