I have a text file that denotes remarks with a single quote ('). Some lines contain two quotes, but I need to get everything from the first ' up to the line feed.

I AL01 ' A-LINE '091398 GDK 33394178
402922 0831850 ' '091398 GDK 33394179
I AL02 ' A-LINE '091398 GDK 33394180
400722 0833118 ' '091398 GDK 33394181
I A10A ' A-LINE 102 ' 53198 DJ 33394182
395335 0832203 ' ' 53198 DJ 33394183
I A10B ' A-LINE 102 ' 53198 DJ 3339418

+6  A: 

The appropriate regex is the ' character followed by any number of any characters (including zero), ending with an end-of-string/end-of-line anchor:

'.*$
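A minimal Python sketch of this pattern, assuming Python's `re` module and that the file has been read into one string (the `text` literal below is just a few sample lines standing in for the file). Because `$` normally matches only at the very end of the string, `re.MULTILINE` is passed so that `$` matches at every line end.

```python
import re

# A few lines from the sample file above, standing in for the file contents
text = (
    "I AL01 ' A-LINE '091398 GDK 33394178\n"
    "402922 0831850 ' '091398 GDK 33394179\n"
    "I AL02 ' A-LINE '091398 GDK 33394180\n"
)

# ' followed by any characters up to the end of the line.
# re.MULTILINE lets $ match before every newline, not only at the end of the string.
remarks = re.findall(r"'.*$", text, re.MULTILINE)

for remark in remarks:
    print(remark)
```

Each match starts at the first ' on its line, so the second ' in lines like the first one stays inside the remark.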

If you want to capture everything after the ' character without including the ' itself in the match, use a lookbehind:

(?<=').*$

This basically says: give me all characters that follow the ' character, up to the end of the line.
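The lookbehind version can be sketched in Python as follows (assuming Python's `re` module, which supports fixed-width lookbehind such as `(?<=')`). For regex flavors without lookbehind, a capturing group `'(.*)$` yields the same text in group 1, shown here for comparison.

```python
import re

line = "I AL01 ' A-LINE '091398 GDK 33394178"

# (?<=') is a zero-width lookbehind: matching starts just after the first ',
# and the ' itself is not part of the matched text
after_quote = re.search(r"(?<=').*$", line).group()

# Same result without lookbehind: capture everything after the ' in group 1
after_quote_group = re.search(r"'(.*)$", line).group(1)

print(after_quote)
```

Note that the result keeps the space that follows the ', since only the quote character is excluded.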

Edit: It has been noted that the $ is not strictly required, because .* cannot run past the end of the line anyway. The pattern:

'.*

is therefore technically correct. However, it is clearer to be explicit and avoid confusion during later code maintenance, hence my use of $. I believe it is always better to declare explicit behaviour than to rely on implicit behaviour where clarity could be questioned.
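A small Python check of that equivalence, assuming Python's `re` module: by default `.` does not match `\n`, so `'.*` already stops at each line end, and the `$`-anchored form returns the same matches once `$` is allowed to match at every line end.

```python
import re

text = (
    "I A10A ' A-LINE 102 ' 53198 DJ 33394182\n"
    "395335 0832203 ' ' 53198 DJ 33394183\n"
)

# . does not match "\n" by default, so '.* stops at the end of each line on its own
implicit = re.findall(r"'.*", text)

# The explicit $ form, with $ allowed to match before every newline
explicit = re.findall(r"'.*$", text, re.MULTILINE)

print(implicit == explicit)  # True
```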

BenAlabaster
The $ is unnecessary. The dot will stop at the end of the line under normal circumstances.
Tomalak
Unnecessary, but proper for what he wants to do. It serves as a later reminder that the pattern expects everything from the ' to the end of the line.
gnarf
@balabaster: I did not say that it was wrong. ;-) It was just a footnote.
Tomalak
@Tomalak: I wasn't trying to imply you were wrong by any means; I was just explaining why I chose to use $. Thank you for pointing it out.
BenAlabaster
+3  A: 
'.*$

Starting with a single quote ('), match any character (.) zero or more times (*) until the end of the line ($).

CoverosGene
A: 
'.*

I believe you need the Multiline option.
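The option matters for the `$`-anchored form when the whole file is matched as a single string. A sketch in Python, where the corresponding flag is `re.MULTILINE` (in .NET it is `RegexOptions.Multiline`): without it, `$` matches only at the end of the whole string, so only the remark on the last line is found.

```python
import re

text = (
    "I AL02 ' A-LINE '091398 GDK 33394180\n"
    "400722 0833118 ' '091398 GDK 33394181"
)

# Without MULTILINE, $ matches only at the end of the whole string,
# so only the remark on the last line is found
last_only = re.findall(r"'.*$", text)

# With MULTILINE, $ also matches before each newline, so every line's remark is found
every_line = re.findall(r"'.*$", text, re.MULTILINE)

print(last_only)
print(every_line)
```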

Joshua Belden
+1  A: 

This captures everything before the ' in backreference 1 and everything after it in backreference 2. Depending on the language, you may need to escape the apostrophes (\').

/^([^']*)'?(.*)$/

Quick modification: if the line doesn't contain a ', backreference 1 should still capture the whole line.

^ - start of string
([^']*) - capture any number of not ' characters
'? - match the ' 0 or 1 time
(.*) - capture any number of characters
$ - end of string
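The pattern can be tried in Python roughly like this (a sketch assuming Python's `re` module; the `/.../` delimiters are dropped, and no escaping of ' is needed inside a double-quoted raw string). The second line is a hypothetical example without a ', used only to show the fallback behaviour.

```python
import re

# Group 1: everything before the first ' (or the whole line if there is none)
# Group 2: everything after that ' (empty if there is none)
pattern = re.compile(r"^([^']*)'?(.*)$")

lines = [
    "I AL01 ' A-LINE '091398 GDK 33394178",
    "NO REMARK ON THIS LINE",  # hypothetical line without a '
]

for line in lines:
    code, remark = pattern.match(line).groups()
    print(repr(code), repr(remark))
```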
gnarf