views:

233

answers:

1

Hello,

I'm on to some SQL where clause parsing and designed a working RegEx to find a column outside string literals using "Rad Software Regular Expression Desginer" which is using the .NET API. To make sure the designed RegEx works with Java too, I tested it by using the API of course (1.5 and 1.6). But guess what, it won't work. I got the message

"Look-behind group does not have an obvious maximum length near index 28".

The string that I'm trying to get parsed is

Column_1='test''the''stuff''all''day''long' AND Column_2='000' AND  TheVeryColumnIWantToFind      =    'Column_1=''test''''the''''stuff''''all''''day''''long'' AND Column_2=''000'' AND  TheVeryColumnIWantToFind   =    ''   TheVeryColumnIWantToFind   =    '' AND (Column_3 is null or Column_3 = ''Not interesting'') AND ''1'' = ''1''' AND (Column_3 is null or Column_3 = 'Still not interesting') AND '1' = '1'

As you may have guessed, I tried to create some kind of worst case to ensure the RegEx won't fail on more complicated SQL where clauses.

The RegEx itself looks like this

(?i:(?<!=\s*'(?:[^']|(?:''))*)((?<=\s*)TheVeryColumnIWantToFind(?=(?:\s+|=))))

I'm not sure if there is a more elegant RegEx (there'll most likely be one), but that's not important right now as it does the trick.

To explain the RegEx in a few words: If it finds the column I'm after, it does a negative look-behind to figure out if the column name is used in a string literal. If so, it won't match. If not, it'll match.

Back to the question. As I mentioned before, it won't work with Java. What will work and result in what I want?
I found out, that Java does not seem to support unlimited look-behinds but still I couldn't get it to work.
Isn't it right that a look-behind is always putting a limit up on itself from the search offset to the current search position? So it would result in something like "position - offset"?

A: 

Hello again,

I finally found a solution and because I asked the question here I'll share it with you of course.

private static final String SQL_STRING_LITERALS_REGEX = "'(?:(?:[^']|(?:''))*)'";
private static final char DOT = '.';

private ArrayList<int[]> getNonStringLiteralRegions(String exclusion) {
    ArrayList<int[]> regions = new ArrayList<int[]>();

    int lastEnd = 0;
    Matcher m = Pattern.compile(SQL_STRING_LITERALS_REGEX).matcher(exclusion);
    while (m.find()) {
        regions.add(new int[] {lastEnd, m.start()});
        lastEnd = m.end();
    }
    if (lastEnd < exclusion.length())
        // We didn't cover the last part of the exclusion yet.
        regions.add(new int[] {lastEnd, exclusion.length()});

    return regions;
}

protected final String getFixedExclusion(String exclusion, String[] columns, String alias) {
    if (alias == null)
        throw new NullPointerException("Alias must not be null.");
    else if (alias.charAt(alias.length() - 1) != DOT)
        alias += DOT;

    StringBuilder b = new StringBuilder(exclusion);
    ArrayList<int[]> regions = getNonStringLiteralRegions(exclusion);
    for (int i = regions.size() - 1; i >= 0; --i) {
        // Reverse iteration to keep valid indices for the lower regions.
        int start = regions.get(i)[0], end = regions.get(i)[1];
        String s = exclusion.substring(start, end);
        for (String column : columns)
            s = s.replaceAll("(?<=^|[\\W&&\\D])(?i:" + column + ")(?=[\\W&&\\D]|$)", alias + column);
        b.replace(start, end, s);
    }

    return b.toString();
}

This time the trick is to simply find any SQL string literals and avoid them when replacing the columns with "Alias.ColumnName". It is important to ensure whole column names when replacing. So if we were to to replace the column "Column_1" in the where clause

WHERE Column_1 = Column_2 AND Column_11 = Column_22

"Column_11" is to be left untouched. (I think it is important to keep that in mind, that's why I mention it here for anyone who faces a similar problem.)
Still, I think this is only a workaround and if you can avoid the need for this logic, it is best to do so.

OK, thanks for the help anyway and I'd be glad to answer upcoming questions to you, if any.

Foo Inc