I have the following regular expression that works fine in Perl:

Classification:\s([^\n]+?)(?:\sRange:\s([^\n]+?))*(?:\sStructural Integrity:\s([^\n]+))*\n

The type of data format this string is supposed to match against is:

Classification: Class Name     Range: xxxx     Structural Integrity: value
Classification: Class Name    Structural Integrity: value
Classification: Class Name

That is: the "Range" and "Structural Integrity" fields are optional. So the desired result is:

{
$& [Classification: Class Name Range: xxxx Structural Integrity: value ]
$1 [Class Name ]
$2 [xxxx ]
$3 [value ]

$& [Classification: Class Name    Structural Integrity: value ]
$1 [Class Name ]
$2 [value ]

$& [Classification: Class Name ]
$1 [Class Name ]
}

The expression uses the lazy quantifier +? in two places. QRegExp does not support per-quantifier laziness. Instead, Qt provides a "minimal" property which, when set to true, makes all quantifiers in an expression non-greedy.

Armed with this information I wrote my code:
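To illustrate what the lazy quantifier changes in a Perl-style engine, here is a minimal sketch in standard C++. It uses std::regex (ECMAScript grammar) rather than QRegExp, so it says nothing about how setMinimal behaves. The helper name capture, the simplified patterns and the sample line are assumptions made for this illustration only.

```cpp
#include <regex>
#include <string>

// Returns capture group n of `pattern` searched in `text`, or an empty
// string if the pattern did not match or the group did not participate.
std::string capture(const std::string &pattern, const std::string &text, std::size_t n) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern)) || n >= m.size() || !m[n].matched)
        return std::string();
    return m[n].str();
}

// Simplified sample line (single spaces, one optional field).
const std::string sample = "Classification: Class Name Range: xxxx";

// Greedy first group: it swallows the rest of the line, so the optional
// Range group matches nothing.
const std::string greedy = "Classification:\\s([^\\n]+)(?:\\sRange:\\s([^\\n]+))?$";

// Lazy first group: it stops as soon as the optional Range group can match.
const std::string lazy = "Classification:\\s([^\\n]+?)(?:\\sRange:\\s([^\\n]+))?$";

// capture(greedy, sample, 1) -> "Class Name Range: xxxx"
// capture(greedy, sample, 2) -> ""
// capture(lazy, sample, 1)   -> "Class Name"
// capture(lazy, sample, 2)   -> "xxxx"
```

Only the first group is made lazy here; the last field can stay greedy because it is meant to run to the end of the line.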

QRegExp rx("Classification:\\s([^\\n]+)(?:\\sRange:\\s([^\\n]+))*(?:\\sStructural Integrity:\\s([^\\n]+))*\\n");
rx.setMinimal(true);

But the results are incorrect, and after much tweaking I haven't been able to get the correct captures. Is it possible to split this up into more code and less regex? Or to rewrite it without the lazy operator?

+2  A: 

Something like this:

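#include <QRegExp>
#include <QStringList>

// str is assumed to be the QString holding the input text.
QRegExp rx("(Classification|Range|Structural\\s+Integrity):|(\\S+)");
QStringList classification;
QStringList range;
QStringList integrity;

// Points at the list for the most recently seen key; null until a key is found.
QStringList *current = nullptr;

int pos = 0;
while ((pos = rx.indexIn(str, pos)) != -1) {
    // cap(1) is empty when the match was a plain word rather than a key.
    // simplified() collapses any run of whitespace inside the key to one space.
    const QString key = rx.cap(1).simplified();
    if (key.isEmpty()) {
        if (current) {
            *current << rx.cap(2);
        }
    }
    else if (key == "Classification") {
        current = &classification;
    }
    else if (key == "Range") {
        current = &range;
    }
    else if (key == "Structural Integrity") {
        current = &integrity;
    }
    pos += rx.matchedLength();
}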

The expression matches either a known key followed by a colon, or a single word. If the match is a key, the current list is switched to the corresponding one. Otherwise the word is appended to the current list.

In the end, you will have the lists classification, range and integrity, containing the words after the corresponding keys. You could join them after the full match is done:

QString classificationString = classification.join(" ");

It does not care about the order of the keys though.
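If you want one result per input line rather than three running lists, the same key-or-word idea can be sketched without any regex at all. The version below is only an illustration under stated assumptions: it uses std::string and std::istringstream instead of QString so that it compiles without Qt, and the names Record, appendWord and parseRecord are made up for this example. It also normalizes runs of whitespace inside a value to a single space.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one parsed line; absent fields stay empty.
struct Record {
    std::string classification;
    std::string range;
    std::string integrity;
};

// Append a word to a field, separating words with a single space.
static void appendWord(std::string &field, const std::string &word) {
    if (!field.empty()) field += ' ';
    field += word;
}

// Parse one line such as
// "Classification: Class Name     Range: xxxx     Structural Integrity: value".
Record parseRecord(const std::string &line) {
    // Split the line into whitespace-separated words.
    std::istringstream in(line);
    std::vector<std::string> words;
    for (std::string w; in >> w; ) words.push_back(w);

    Record rec;
    std::string *current = nullptr;  // field selected by the last key seen
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (words[i] == "Classification:") {
            current = &rec.classification;
        } else if (words[i] == "Range:") {
            current = &rec.range;
        } else if (words[i] == "Structural" && i + 1 < words.size()
                   && words[i + 1] == "Integrity:") {
            current = &rec.integrity;
            ++i;  // skip the second word of the two-word key
        } else if (current) {
            appendWord(*current, words[i]);
        }
    }
    return rec;
}
```

Calling parseRecord on each of the three sample lines from the question should give "Class Name" for the classification every time, with range and integrity left empty where the field is missing. In Qt, the same loop could presumably be written over the result of splitting a QString on whitespace.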

MizardX