views:

298

answers:

2

Hello all,

This is my use case: Input is a string representing an Oracle PL/SQL statement of arbitray complexity. We may assume it's a single statement (not a script). Now, several bits of this input string have to be rewritten.

E.g. table names need to be prefixed, aggregate functions in the selection list that don't use a column alias should be assigned a default one:

SELECT SUM(ABS(x.value)), 
TO_CHAR(y.ID,'111,111'),
y.some_col
FROM
tableX x,
(SELECT DISTINCT ID
FROM tableZ z
WHERE ID > 10) y
WHERE
...

becomes

SELECT SUM(ABS(x.value)) COL1, 
TO_CHAR(y.ID,'111,111') COL2,
y.some_col
FROM
pref.tableX x,
(SELECT DISTINCT ID, some_col
FROM pref.tableZ z
WHERE ID > 10) y
WHERE
...

(Disclaimer: just to illustrate the issue, statement does not make sense)

Since aggregate functions might be nested and subSELECTs are a b_tch, I dare not use regular expressions. Well, actually I did and achieved 80% of success, but I do need the remaining 20%.

The right approach, I presume, is to use grammars and parsers. I fiddled around with c++ ANTLR2 (although I do not know much about grammars and parsing with the help of such). I do not see an easy way to get the SQL bits:

list<string> *ssel = theAST.getSubSelectList(); // fantasy land

Could anybody maybe provide some pointers on how "parsing professionals" would pursue this issue? EDIT: I am using Oracle 9i.

+1  A: 

Maybe you can use this, it changes an select statement into an xml block:

declare
    cl clob;
begin
    dbms_lob.createtemporary (
     cl,
     true
    );
    sys.utl_xml.parsequery (
     user,
     'select e.deptno from emp e where deptno = 10',
     cl
    );
    dbms_output.put_line (cl);
    dbms_lob.freetemporary (cl);
end;
/ 

<QUERY>
  <SELECT>
    <SELECT_LIST>
      <SELECT_LIST_ITEM>
        <COLUMN_REF>
          <SCHEMA>MICHAEL</SCHEMA>
          <TABLE>EMP</TABLE>
          <TABLE_ALIAS>E</TABLE_ALIAS>
          <COLUMN_ALIAS>DEPTNO</COLUMN_ALIAS>
          <COLUMN>DEPTNO</COLUMN>
        </COLUMN_REF>
        ....
        ....
        ....
</QUERY>

See here: http://forums.oracle.com/forums/thread.jspa?messageID=3693276&amp;#3693276

Now you 'only' need to parse this xml block.

Edit1:

Sadly I don't fully understand the needs of the OP but I hope this can help (It is another way of asking the 'names' of the columns of for example query select count(*),max(dummy) from dual):

set serveroutput on

DECLARE
 c       NUMBER;
 d       NUMBER;
 col_cnt PLS_INTEGER;
 f       BOOLEAN;
 rec_tab dbms_sql.desc_tab;
 col_num NUMBER;

PROCEDURE print_rec(rec in dbms_sql.desc_rec) IS
BEGIN
  dbms_output.new_line;
  dbms_output.put_line('col_type = ' || rec.col_type);
  dbms_output.put_line('col_maxlen = ' || rec.col_max_len);
  dbms_output.put_line('col_name = ' || rec.col_name);
  dbms_output.put_line('col_name_len = ' || rec.col_name_len);
  dbms_output.put_line('col_schema_name= ' || rec.col_schema_name);
  dbms_output.put_line('col_schema_name_len= ' || rec.col_schema_name_len);
  dbms_output.put_line('col_precision = ' || rec.col_precision);
  dbms_output.put_line('col_scale = ' || rec.col_scale);
  dbms_output.put('col_null_ok = ');

  IF (rec.col_null_ok) THEN
    dbms_output.put_line('True');
  ELSE
    dbms_output.put_line('False');
  END IF;
END;

BEGIN
  c := dbms_sql.open_cursor; 
  dbms_sql.parse(c,'select count(*),max(dummy) from dual ',dbms_sql.NATIVE); 
  dbms_sql.describe_columns(c, col_cnt, rec_tab);

  for i in rec_tab.first..rec_tab.last loop
    print_rec(rec_tab(i));
  end loop;

  dbms_sql.close_cursor(c);
END;
/

(See here for more info: http://www.psoug.org/reference/dbms%5Fsql.html)

The OP also want to be able to change the schema name of the table in a query. I think the easiest say to achieve that is to query the table names from user_tables and search in sql statement for those table names and prefix them or to do a 'alter session set current_schema = ....'.

tuinstoel
Looks nice, but apparently utl_xml.parsequery is only available with 11g. I only have access to 9g - migration to 11g is very unlikely at the moment.
msiemeri
I absoulutely don't understand! You can download 11g and use it to rewrite your queries. After you have rewritten all your queries you can use them in Oracle 9.
tuinstoel
The thing is - it's not a one-time rewrite (I probaly would look into Perl SQL modules for that). This rewrite will take place each and every day with (in general) different queries.
msiemeri
Why do you want to do it? Why is it necessary and why can't it be a one-time rewrite?
tuinstoel
There are new SQLs each day and the prefix might change from day to day, too. There is not something like a fixed state.
msiemeri
Why are there new SQLs each day? Do you have some kind of tool that generates SQLs? What kind of system are we talking about? In an OLTP system with lot of transactions you have to use parameterized queries (binded variables) else Oracle will become slow because it has to do too much parsing. If devs write a new sql statement they can add column aliases.
tuinstoel
Reports written by non-devs. Variable bindings cannot be applied to those reports. Column aliases could be enforced by generating a plan w/o executing the SQL. If an error occurs, the person should edit his or her SQL. Table prefices are another thing. Depending on the machine executing the query, table names have to vary. No pre-processing check here.
msiemeri
Did you take a look at the dbms_sql.describe_columns solution (see above). You could use SYS_CONTEXT('USERENV','HOST') or SYS_CONTEXT('USERENV','TERMINAL') to determine the machine executing the query in an after login trigger and you could use that to set the schema with 'alter session set current_schema = ....' .
tuinstoel
Could get the prodecure working on my Oracle XE, but it does process at work (using TOAD). I don't see a flag that indicates missing column aliases, but I can parse the column name for aggregate functions (in this case, an alias is obviously missing).Thank you very much for your patience and advice!
msiemeri
+1  A: 

If the source of the SQL statement strings are other coders, you could simply insist that the parts that need changing are simply marked by special escape conventions, e.g., write $TABLE instead of the table name, or $TABLEPREFIX where one is needed. Then finding the places that need patching can be accomplished with a substring search and replacement.

If you really have arbitrary SQL strings and cannot get them nicely marked, you need to somehow parse the SQL string as you have observed. The XML solution certainly is one possible way.

Another way is to use a program transformation system. Such a tool can parse a string for a language instance, build ASTs, carry out analysis and transformation on ASTs, and then spit a revised string.

The DMS Software Reengineering Toolkit is such a system. It has PLSQL front end parser. And it can use pattern-directed transformations to accomplish the rewrites you appear to need. For your example involving select items:

domain PLSQL.
rule use_explicit_column(e: expression):select_item -> select_item
   "\e" -> "\e \column\(\e\)".

To read the rule, you need to understand that the stuff inside quote marks represents abstract trees in some computer langauge which we want to manipulate. What the "domain PLSQL" phrase says is, "use the PLSQL parser" to process the quoted string content, which is how it knows. (DMS has lots of langauge parsers to choose from). The terms "expression" and "select_item" are grammatical constructs from the language of interest, e.g., PLSQL in this case. See the railroad diagrams in your PLSQL reference manual. The backslash represents escape/meta information rather than target langauge syntax.

What the rule says is, transform those parsed elements which are select_items that are composed solely of an expression \e, by converting it into a select_item consisting of the same expression \e and the corresponding column ( \column(\e) ) presumably based on position in the select item list for the specific table. You'd have to implement a column function that can determine the corresponding name from the position of the select item. In this example, I've chosen to define the column function to accept the expression of interest as argument; the expression is actually passed as the matched tree, and thus the column function can determine where it is in the select_items list by walking up the abstract syntax tree.

This rule handles just the select items. You'd add more rules to handle the other various cases of interest to you.

What the transformation system does for you is:

  • parse the language fragment of interest
  • build an AST
  • let you pattern match for places of interest (by doing AST pattern matching) but using the surface syntax of the target langauge
  • replace matched patterns by other patterns
  • compute aritrary replacements (as ASTs)
  • regenerate source text from the modified ASTs.

While writing the rules isn't always trivial, it is what is necessary if your problem is stated as posed.

The XML suggested solution is another way to build such ASTs. It doesn't have the nice pattern matching properties although you may be able to get a lot out of XSLT. What I don't know is if the XML has the parse tree in complete detail; the DMS parser does provide this by design as it is needed if you want to do arbitrary analysis and transformation.

Ira Baxter
This is a very interesting remark. And, as far as my limited knowledge of language recognition goes, a "transformation system" can be implemented via ANTLR, too. For now, I will refer to tuinstoel's advice and be satisfied without a complete parser. But the problem did rise my interest for parsing technology, so I might look further into it
msiemeri
ANTLR certainly will let you define a PLSQL grammar and build ASTs; it may even have a useful PLSQL grammar available. It isn't clear that ANTLR will let you write transformation rules like this, although it has some kind of "string template" langauge, used mostly for code generation. ANTLR doesn't have strong support for regenerating the entire AST as source text, and that's harder than it sounds. A different alternative closer to the mark is TXL and Stratego; I'm pretty sure neither of those has even a starting grammar for PLSQL, although you can define one. That's a lot of work, too.
Ira Baxter