Hi,

I have a text file with multiple lines. I'm trying to write a pattern that adds a new carriage return to some lines of the text. These lines look like this:

lorem ipsum.
dolor sit amet, consectetur adipiscing elit [FIS] Donec feugiat

The pattern is: a line followed by another line that contains some characters and also a '[' character. If '[' is not present, the pattern should fail and no carriage return should be added.

How can I do this using regular expressions?

I'm using C# as the programming language, so the regex engine is the .NET one.
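A minimal C# sketch of the rule as I read it, assuming the first line ends with a '.', that the check applies only to the line immediately after it, and that the extra CRLF goes right after that dot. The class and method names are hypothetical, chosen for illustration.

```csharp
using System.Text.RegularExpressions;

public static class BracketRule
{
    // Hypothetical pattern: a '.' at the end of a line (trailing spaces/tabs allowed),
    // but only when the NEXT line contains a '[' somewhere.
    //   \.              the dot to keep
    //   (?=             lookahead: checked but not consumed
    //     [ \t]*\r?\n   rest of the current line, then its line break (LF or CRLF)
    //     [^\r\n]*\[    any characters on the next line, then a '['
    //   )
    private static readonly Regex DotBeforeBracketLine =
        new Regex(@"\.(?=[ \t]*\r?\n[^\r\n]*\[)");

    // Inserts an extra CRLF after each matching dot; $0 re-inserts the dot itself.
    // Lines that do not satisfy the rule are left unchanged.
    public static string AddBreaks(string input) =>
        DotBeforeBracketLine.Replace(input, "$0\r\n");
}
```

With the sample text above, a CRLF is added after "lorem ipsum." because the following line contains "[FIS]"; if that line had no '[', the input would come back unchanged.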

+1  A: 

I believe you can use \r for a carriage return and \n for a new line.
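In C# these escapes belong to ordinary string literals, and the regex engine also understands them inside a pattern, so @"\r\n" works there. A replacement string is different: .NET treats only $ substitutions such as $0 as special, so the line break has to come from a regular C# literal rather than a verbatim one. A short illustration (the class name is hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class LineBreakLiterals
{
    // ".\r\n" in a regular C# literal is three characters: '.', CR and LF.
    public static string WithRealBreak(string input) =>
        Regex.Replace(input, @"\.", ".\r\n");

    // @".\r\n" is five characters: '.', '\', 'r', '\', 'n'. The replacement string
    // does not interpret escapes (only $-substitutions), so they are inserted literally.
    public static string WithLiteralBackslashes(string input) =>
        Regex.Replace(input, @"\.", @".\r\n");
}
```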

northpole
+1  A: 

What flavor? Here it's done for C#:

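using System.Text.RegularExpressions;

string yourString = @"el tiempo.
campo vectorial vector field. [FIS] Campo ";
// "\." matches a literal dot (an unescaped "." would match any character);
// $0 re-inserts the whole match, followed by the new line break
string newString = Regex.Replace(yourString, @"el tiempo\.", "$0\r\n");  // just \n may be sufficient though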

EDIT: the above is an answer to the original question. After the excellent answer by Peter Boughton, I don't need to add much. Perhaps just this: a small regex without look-around assertions that replaces every dot followed by one or more line-break characters with the dot plus two CRLF line breaks.

string newString = Regex.Replace(yourString, @"\.(\r|\n)+", ".\r\n\r\n");
Abel
Could you make it more generic? Imagine that "el tiempo." stands for any text whose last character is '.'.
jaloplo
jaloplo, if you want something more generic you should state this in the question. You also haven't specified whether a line break should be added in the "field. [FIS]" part.
Peter Boughton
I thought this was obvious, but I apologize. I'll try to be more specific in future questions.
jaloplo
I don't have the "rights" to edit questions, but for us and future visitors, can you update the question with: 1) the programming language, 2) the pattern in plain English? Even though there's a good answer meanwhile, it'll help others if questions are clear :)
Abel
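For the generic case raised in the comments (any line whose last character is '.'), one possible C# sketch anchors the dot to the end of a line with RegexOptions.Multiline. One .NET detail to keep in mind: in multiline mode $ matches only before '\n', so with CRLF line endings the pattern has to allow an optional '\r' in front of it. The helper name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LineEndDots
{
    // Appends a CRLF after every '.' that is the last character of a line.
    // The lookahead (?=\r?$) accepts both LF and CRLF endings, plus the end of the input.
    public static string AddBreakAfterFinalDot(string input) =>
        Regex.Replace(input, @"\.(?=\r?$)", ".\r\n", RegexOptions.Multiline);
}
```

A dot in the middle of a line, such as the one in "field. [FIS]", is followed by a space and is therefore left alone.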
+1  A: 

If you want to add a line break after a ., just replace it with itself plus a line break. To make sure it is the last character of a word, use a lookahead to check that it is followed by whitespace, i.e. (?=\s).


So, to add a newline character (recommended in most situations):

replace( input , '\.(?=\s)' , '.\n' )


If you must use a carriage return (very few places require it, even on Windows), you can simply add one:

replace( input , '\.(?=\s)' , '.\r\n' )


If you want to ensure that a . is always followed by two line breaks, without adding extra line breaks when they are already there, it gets a little more complex and requires a negative lookahead, but looks like this:

replace( input , '\.(?!\S)(?:\r?\n){0,2}' , '.\r\n\r\n' )

The negative lookahead (?!\S) is checked immediately after the dot and makes sure the dot ends a word, i.e. it is followed by whitespace or by the end of the input. Then, because quantifiers are greedy by default, {0,2} consumes two existing line breaks if it can, otherwise one, otherwise none, and whatever it consumed is replaced by exactly two.

(If you might have more than two newlines and want to reduce to two, you can just use {0,} instead, which has * as a shortcut notation.)
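The last pattern translated to C#, as a sketch (the class name is hypothetical). In a .NET replacement string the dot needs no escaping, and the line breaks come from the C# string literal:

```csharp
using System.Text.RegularExpressions;

public static class DotSpacing
{
    // A '.' not followed by a non-space character, plus up to two existing
    // line breaks; the whole match is replaced by the dot and exactly two CRLFs.
    public static string EnsureTwoBreaks(string input) =>
        Regex.Replace(input, @"\.(?!\S)(?:\r?\n){0,2}", ".\r\n\r\n");

    // Same idea with * instead of {0,2}: any run of line breaks collapses to two.
    public static string CollapseToTwoBreaks(string input) =>
        Regex.Replace(input, @"\.(?!\S)(?:\r?\n)*", ".\r\n\r\n");
}
```

With "a.\n\n\nb", EnsureTwoBreaks consumes only two of the three line feeds, so a third line break remains; CollapseToTwoBreaks reduces the run to exactly two.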


It's probably worth pointing out that none of the above will consume any spaces/tabs. If that is desired, you can change the lookahead (?=\s) to \s+, do a second replace of \n[ \t]+ with \n to remove any leading spaces/tabs, or something similar, depending on exactly what you're trying to do.
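A sketch of the two-pass approach in C#, assuming the first pass is the (?=\s) lookahead with CRLF and the second pass strips spaces/tabs after every line feed (the names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class DotBreaks
{
    public static string BreakAfterDotAndTrim(string input)
    {
        // First pass: insert a CRLF after every '.' that is followed by whitespace.
        string withBreaks = Regex.Replace(input, @"\.(?=\s)", ".\r\n");
        // Second pass: remove the spaces/tabs left at the start of each new line.
        return Regex.Replace(withBreaks, @"\n[ \t]+", "\n");
    }
}
```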

Peter Boughton
OK, good explanation. But I also need to check that the next line contains some characters and a '['. Only if that rule holds should '\r\n' be added. Can you tell me how to do this?
jaloplo
Do you mean you want to *avoid* adding the newline if a `[` is found? If so, the middle example can be updated to `\.(?=\s++[^\[])` - assuming your regex engine supports possessive quantifiers anyway - which flavour of regex are you using this with?
Peter Boughton
I'm using C#. I'll test it and let you know whether it works.
jaloplo
@jaloplo: can you do us a favor and update your question to include _all_ your requirements? Otherwise they get scattered through the comments little by little, which is rather hard to follow.
Abel
@Peter: apparently jaloplo uses C#, so I added the tag. Currently, .NET does not support possessive quantifiers (mainly PCRE and Java do, see http://www.regular-expressions.info/possessive.html).
Abel
Ah, but .NET *does* support atomic grouping, which is the long-winded equivalent, so that `\s++` becomes `(?>\s+)` and it should otherwise work the same.
Peter Boughton
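A C# sketch of that suggestion, assuming the goal is to skip the line break when the next non-whitespace character is '['. It also shows why the atomic group matters: with two spaces before '[', the plain \s+ version backtracks by one space so that [^\[] matches the other one, and the line break is added anyway. Class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SkipBracket
{
    // .NET has no possessive quantifiers, so \s++ is written as the atomic group (?>\s+),
    // which keeps all the whitespace it matched and never gives any of it back.
    private static readonly Regex Atomic = new Regex(@"\.(?=(?>\s+)[^\[])");

    // For comparison: without the atomic group the engine may backtrack so that
    // [^\[] matches the last whitespace character, which defeats the '[' check.
    private static readonly Regex Backtracking = new Regex(@"\.(?=\s+[^\[])");

    public static string AddBreaks(string input) => Atomic.Replace(input, ".\r\n");

    public static string AddBreaksBacktracking(string input) => Backtracking.Replace(input, ".\r\n");
}
```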