I need to check, using a regex, whether a string of many words / letters / etc. contains exactly one set of triple double-quotes (i.e. """). The string can also contain single double-quotes (") and double double-quotes (""). I haven't had much success so far.

A: 

Depends on your language, but you should only need to match three double quotes (e.g., /\"{3}/) and then count the matches to see whether there is exactly one.
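
A minimal sketch of the counting approach, assuming Java (the question doesn't name a language). The class and method names here are made up for illustration. It counts non-overlapping runs of three double quotes, so a run of six quotes counts as two matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TripleQuoteCounter {
    // Matches three consecutive double quotes anywhere in the input.
    private static final Pattern TRIPLE = Pattern.compile("\"{3}");

    // Counts non-overlapping occurrences of """ in the input.
    static int countTriples(String input) {
        Matcher m = TRIPLE.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True only when the input contains exactly one """ match.
    static boolean hasExactlyOneTriple(String input) {
        return countTriples(input) == 1;
    }

    public static void main(String[] args) {
        System.out.println(hasExactlyOneTriple("hello \"\"\" hello \"\" hello"));     // true
        System.out.println(hasExactlyOneTriple("hello \"\"\" hello \"\"\" hello")); // false
    }
}
```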

Platinum Azure
I thought about that; I'm just looking to do it more concisely, with the least amount of code possible.
JF
A: 

Try using a counted quantifier ({3}) to match exactly three double-quotes.

  • \"{3}
  • ["]{3}
  • [\"]{3}

I've quickly checked using http://www.regextester.com/, and it seems to work fine.

How you write and compile the regex may vary with your language of choice, though!

Brabster
Thanks, but there will be other words around it, and possibly double double-quotes, etc.
JF
ah - ok - that's a little trickier... Should be possible though I think...
Brabster
agreed. I just have no idea how to do it.
JF
Yep, thinking. This is closer... [^"][\"]{3}[^"] (not a d-quote, then 3 d-quotes, then a char that's not a d-quote)
Brabster
hm yeah. A good pair of test strings would be: match: (hello """ hello "" hello); no match: (hello """ hello """ hello)
JF
I don't seem to be able to express this in regex-speak, but I'm not sure why. It feels like I should be able to get a regex to match 'three d-quotes, next char not a d-quote (can do that bit) and NOT the same pattern again'. Curious and will keep looking, but if you need this now I'd say start looking at doing it algorithmically: maybe match and remove the first batch of 3 d-quotes, then check for another three.
Brabster
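For what it's worth, the "find the first batch, then check for another" idea could be sketched like this, assuming Java and plain String.indexOf rather than a regex. The class and method names are hypothetical. Note that a run of four or more quotes is treated as containing one triple here.

```java
class TripleQuoteScan {
    private static final String TRIPLE = "\"\"\"";

    // True when the input contains """ once and no second """ after it.
    static boolean hasExactlyOneTriple(String input) {
        int first = input.indexOf(TRIPLE);
        if (first < 0) {
            return false; // no triple quotes at all
        }
        // Resume the search just past the first batch of three quotes.
        return input.indexOf(TRIPLE, first + TRIPLE.length()) < 0;
    }

    public static void main(String[] args) {
        System.out.println(hasExactlyOneTriple("hello \"\"\" hello \"\" hello"));     // true
        System.out.println(hasExactlyOneTriple("hello \"\"\" hello \"\"\" hello")); // false
    }
}
```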
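btw if you're using a language that can do lookahead, this should match 'at least one set of 3 d-quotes, not followed by a d-quote': ["]{3}(?=[^"]) (note that it needs some character after the quotes, so it won't match at the very end of the string)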
Brabster
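A possible variation on the lookahead idea, sketched in Java: negative lookbehind and negative lookahead, (?<!")"{3}(?!"), match a run of exactly three quotes even at the very start or end of the string, and skip runs of four or more. This is an assumption about what "exactly three" should mean, not something settled in the thread, and the names below are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TripleQuoteLookaround {
    // Three quotes with no quote immediately before or after them.
    private static final Pattern ISOLATED_TRIPLE =
        Pattern.compile("(?<!\")\"{3}(?!\")");

    // Counts runs of exactly three double quotes.
    static int countIsolatedTriples(String input) {
        Matcher m = ISOLATED_TRIPLE.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countIsolatedTriples("hello \"\"\" hello \"\" hello")); // 1
        System.out.println(countIsolatedTriples("hello \"\"\"\" hello"));          // 0 (four quotes)
        System.out.println(countIsolatedTriples("\"\"\"abc"));                     // 1 (start of string)
    }
}
```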
A: 

There are probably plenty of ways to do this, but a simple one is to look for two or more occurrences of triple quotes and then invert the result of the match. Here's an example in Perl:

use strict;
use warnings;

my $match = 'hello """ hello "" hello';
my $no_match = 'hello """ hello """ hello';
# Two groups of triple quotes with as little as possible between them.
my $regex = '[\"]{3}.*?[\"]{3}';

if ($match !~ /$regex/) {
    print "Matched as it should!\n";
}
if ($no_match !~ /$regex/) {
    print "You shouldn't see this!\n";
}

Which outputs:

Matched as it should!

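Basically, you are telling it to find the thing you DON'T want, then inverting the truth. Note that a string with no triple quotes at all also passes this check, so test for at least one separately if that matters. Hope that makes sense. I can help you convert the example to another language if you need it.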
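
As a rough sketch of the same inversion in Java (assuming single-line input; the class and method names are invented for illustration), here it is with a presence check added so that a string with no triple quotes is rejected:

```java
import java.util.regex.Pattern;

class TripleQuoteInvert {
    // The thing we do NOT want: two triple-quote groups in the same string.
    private static final Pattern TWO_TRIPLES = Pattern.compile("\"{3}.*?\"{3}");

    // Inverted match only: true when the unwanted pattern is absent.
    static boolean noSecondTriple(String input) {
        return !TWO_TRIPLES.matcher(input).find();
    }

    // Inverted match plus a check that one triple is actually present.
    static boolean exactlyOneTriple(String input) {
        return input.contains("\"\"\"") && noSecondTriple(input);
    }

    public static void main(String[] args) {
        System.out.println(noSecondTriple("hello"));                                 // true
        System.out.println(exactlyOneTriple("hello"));                               // false
        System.out.println(exactlyOneTriple("hello \"\"\" hello \"\" hello"));     // true
        System.out.println(exactlyOneTriple("hello \"\"\" hello \"\"\" hello")); // false
    }
}
```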

Morinar
Also, to deal with the case of quadruple+ quotes, you'll probably want to modify your regular expression as suggested by Brabster, i.e. [^"][\"]{3}[^"], thus making the final regex (to invert): [^"][\"]{3}[^"].*?[^"][\"]{3}[^"]
Morinar
A: 

A regex with negative lookahead can do it:

(?!.*"{3}.*"{3}).*"{3}.*

I tried it with these lines of Java code:

String good = "hello \"\"\" hello \"\" hello ";
String bad = "hello \"\"\" hello \"\"\" hello ";
String regex = "(?!.*\"{3}.*\"{3}).*\"{3}.*";
System.out.println( good.matches( regex ) );
System.out.println( bad.matches( regex ) );

...with output:

true
false
tangens
this is perfect, thanks! I'm assuming you put the negative lookahead portion first so that it makes sure that there aren't 2 instances of triple double-quotes before matching?
JF
Yes, that's exactly the way it works.
tangens
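One thing to watch for with the regex above: by default . does not match line terminators in Java, so String.matches returns false for any input that contains a newline. A hedged sketch that compiles the same pattern with Pattern.DOTALL, assuming multi-line input is possible (the class and method names are made up for the example):

```java
import java.util.regex.Pattern;

class TripleQuoteLines {
    // Same regex as above; DOTALL lets '.' also match line terminators.
    private static final Pattern EXACTLY_ONE =
        Pattern.compile("(?!.*\"{3}.*\"{3}).*\"{3}.*", Pattern.DOTALL);

    // True when the whole input contains one """ and no second one.
    static boolean exactlyOne(String input) {
        return EXACTLY_ONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(exactlyOne("first line \"\"\"\nsecond line \"\""));     // true
        System.out.println(exactlyOne("first line \"\"\"\nsecond line \"\"\"")); // false
    }
}
```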