I came up with this regex to match strings that contain an even number of double quotes: ([^"]*["][^"]*["][^"]*)*

It works in all cases except the empty string. I thought it would work there because the last star matches the preceding group zero or more times.

Any ideas?

Also, if there's a much better way of doing this, please let me know and explain it in detail.

The solution must be a regex, because it will be used in a hook that requires one.

It also has to match a string without any quotes, since zero is an even number.

+2  A: 

Seems like regexp isn't the right tool for the job. Edit: However, you seem to be restricted to it. This doesn't answer your question given that constraint, but it will work great without it.

Just iterate over your string and count. C example:

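#include <stdbool.h>

/* Returns true if str contains an even number of '"' characters. */
bool hasEvenNumberOfQuotes(const char *str)
{
    bool even = true;   /* no quotes seen yet, and zero is even */

    while(*str != '\0')
    {
        if(*str == '"')
            even = !even;   /* each quote flips the parity */

        ++str;
    }

    return even;
}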
strager
Sadly, the place where I'll use this code exposes a hook that is specifically a regex that either matches or doesn't, so only a regex solution will help me.
David Reis
@Reis, I'm not sure why you downvoted when you said "Also if there's a much better way of doing this please let me know and explain it in detail." It answers your question. Please edit your question to say you're restricted to using regexp.
strager
Why is this detail not in the original question? You have explicitly asked if there is a better way of doing it.
Peter Boughton
I think he meant a better *regex*.
Alan Moore
+3  A: 

Try this expression:

^(?:[^"]+|"[^"]*")*$

It matches a sequence in which each element is either a run of characters other than quotes ([^"]+) or a pair of quotes with any non-quote characters between them ("[^"]*"). Because of the * quantifier, a sequence of zero elements is allowed too, so the empty string matches.
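A quick way to check this pattern is to run it against a few sample strings. The sketch below uses Python's `re` module, which is only an assumption about the engine (the hook in the question may use a different one); the helper name `is_even_quotes` and the samples are illustrative.

```python
import re

# Gumbo's pattern, copied verbatim; ^ and $ anchor it to the whole string.
EVEN_QUOTES = re.compile(r'^(?:[^"]+|"[^"]*")*$')

def is_even_quotes(s: str) -> bool:
    """Hypothetical helper: True if the pattern accepts s."""
    return EVEN_QUOTES.search(s) is not None

samples = ['', 'abc', '   ', '"a"', 'x "a" y "b" z', 'a"b', '"""']
for s in samples:
    # Expect True for 0, 2 or 4 quotes and False for 1 or 3 quotes.
    print(repr(s), is_even_quotes(s))
```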

Gumbo
You forgot to explain it in detail, and also why mine didn't work. More than quick-and-dirty fixes, I want to learn something... :-)
David Reis
Yours doesn't work because it either matches zero characters (the group repeated zero times) or a string that contains at least two quotes (the group repeated one or more times). So there is no way for it to match a non-empty string made up only of non-quote characters.
Gumbo
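The explanation above is easy to confirm. This Python sketch assumes the hook requires the whole string to match, so it uses `re.fullmatch` in place of the anchors the original pattern lacks; `accepts` is a hypothetical helper name.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = r'([^"]*["][^"]*["][^"]*)*'

def accepts(s: str) -> bool:
    # fullmatch: the pattern must cover the entire string
    # (assumed to be how the hook decides "matches or not").
    return re.fullmatch(ORIGINAL, s) is not None

print(accepts(''))        # True: group repeated zero times
print(accepts('a"b"c'))   # True: group repeated once, two quotes
print(accepts('abc'))     # False: every repetition needs two quotes
```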
David didn't say anything about matching a string with no quotes, but he also said his regex doesn't match the empty string, which it obviously does. @David, is this what you were really trying to do?
Alan Moore
A string with no quotes has an even number of quotes (0), so it should be matched.
David Reis
A: 
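import re

def hasPairedQuotes(s):
    # Delete every character that is not a double quote...
    stripped = re.sub('[^"]', "", s)
    # ...then check whether the number of quotes left is even.
    return len(stripped) % 2 == 0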

>>> hasPairedQuotes("")
True
>>> hasPairedQuotes('""')
True
>>> hasPairedQuotes('"""')
False
>>> hasPairedQuotes('"Hello world!""')
False
>>> hasPairedQuotes('"Hello world!"')
True


Fine, you want a regexp? Here's a regexp: ^[^"]*("[^"]*")*[^"]*$ ... but I think the difference in legibility and maintainability speaks for itself.

>>> re.match(r'^[^"]*("[^"]*"[^"]*)*$', 'Hello ""')
<_sre.SRE_Match object at 0xb7cc0ce0>
>>> re.match(r'^[^"]*("[^"]*"[^"]*)*$', 'Hello "" "')
>>>
Aaron Maenpaa
I don't even know what language that is... Phyton maybe? Anyhow, it doesn't help me either.
David Reis
It's Python, not Phyton.
Aaron Maenpaa
@Reis, PCRE is pretty consistent across languages. Different regexp engines may handle regexps differently, though, so some slight modification may be needed.
strager
You need to move the leading or trailing [^"]* (but not both of them) back inside the parentheses to catch the non-quotes between the quoted sections.
Alan Moore
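The problem described in this comment shows up with a string that has non-quote text between two quoted sections. This Python sketch compares the regex from the answer text with the two variants the comment suggests; the dictionary labels and the test string are illustrative.

```python
import re

patterns = {
    "answer text": r'^[^"]*("[^"]*")*[^"]*$',
    "trailing moved in": r'^[^"]*("[^"]*"[^"]*)*$',
    "leading moved in": r'^([^"]*"[^"]*")*[^"]*$',
}

# Four quotes with " b " between the quoted sections, so it should match.
s = '"a" b "c"'
for name, pattern in patterns.items():
    print(name, re.match(pattern, s) is not None)
```

Only the two variants accept the string; the answer-text version has no way to consume " b " between the two quoted groups.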
+1  A: 

Based on your regexp:

([^"]*["][^"]*["][^"]*)*

Add line anchors:

^([^"]*["][^"]*["][^"]*)*$

Add the possibility of matching a single non-" character:

^([^"]*["][^"]*["][^"]*|[^"]?)*$

This last step lets each repetition of the group match either nothing or a single non-" character, which permits strings that contain no " to be matched. Note that the line anchors are required; otherwise the regex would match substrings.

Bonus: use a non-capturing group (?:...) so no backreference is recorded (capturing and numbering groups may slow down the regexp engine a tiny bit):

^(?:[^"]*["][^"]*["][^"]*|[^"]?)*$
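A minimal Python check of the final pattern follows; Python's `re` is assumed here, and the sample strings are illustrative. The expected value for each case is simply whether the string contains an even number of quotes.

```python
import re

# strager's final pattern, with a non-capturing group.
STRAGER = re.compile(r'^(?:[^"]*["][^"]*["][^"]*|[^"]?)*$')

cases = {
    '': True,                      # zero quotes
    'no quotes here': True,        # zero quotes, consumed one char at a time
    'say "hi" and "bye"': True,    # four quotes
    'one " quote': False,          # one quote
    '"""': False,                  # three quotes
}
for text, expected in cases.items():
    got = STRAGER.match(text) is not None
    print(repr(text), got, "ok" if got == expected else "MISMATCH")
```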
strager
+4  A: 

Hi,

Your regex should match the completely empty string, but not, e.g., a string consisting of a single space, because it states that a non-empty string must contain at least two double quotes. This is because the ["] tokens inside the regex are not followed by *.

The proper way to think about the needed regular expression is as follows: you want to match (string without double quotes) followed by (double quote) followed by (string without double quotes) followed by (double quote) followed by (string without double quotes), and then repeat starting from the first 'followed by' any number of times. A string without double quotes is [^"]*, so you get (whitespace added for readability):

[^"]* (" [^"]* " [^"]*)*

If you compare this with your regular expression, the first [^"]* has been moved out of the repetition.
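One way to gain confidence in this pattern is to compare it against plain counting for every short string. The sketch below is Python; the three-character alphabet, the length limit of 8, and the names `ANTTI`, `regex_says_even` and `count_says_even` are arbitrary choices for illustration. The readability whitespace is removed, the group is made non-capturing, and `re.fullmatch` supplies the anchoring that ^ and $ would otherwise provide.

```python
import re
from itertools import product

# antti.huima's pattern without the readability spaces; fullmatch anchors it.
ANTTI = re.compile(r'[^"]*(?:"[^"]*"[^"]*)*')

def regex_says_even(s: str) -> bool:
    return ANTTI.fullmatch(s) is not None

def count_says_even(s: str) -> bool:
    return s.count('"') % 2 == 0

alphabet = ['a', ' ', '"']
checked = 0
for length in range(9):  # every string of length 0..8 over the alphabet
    for chars in product(alphabet, repeat=length):
        s = ''.join(chars)
        assert regex_says_even(s) == count_says_even(s), repr(s)
        checked += 1
print(f"regex and counting agree on all {checked} strings")
# prints: regex and counting agree on all 9841 strings
```

Because each [^"]* can only consume non-quote characters, every repetition of the group is forced to consume exactly two quotes, which is why the regex accepts precisely the strings with an even quote count.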

antti.huima
I do want it to match a string with only spaces. In fact, any string without quotes is fine, because it has an even number of quotes (0).
David Reis
Yes. What I meant by my response is that the regex in your original question matches the empty string but does not match a non-empty string without any double quotes, and that is the problem with your regex.
antti.huima