Hi guys.

I'm working on a small piece of ksh code for a simple task. I need to retrieve about 14 million rows from a table and then generate an XML file from that data. I don't do any processing of the data, only some "IF" logic. The problem is that writing the file takes about 30 minutes, which is not acceptable for me.

This is a piece of the code:

......
query="select field1||','||field2 from table1"
ctl_data=$(sqlplus -L -s "$ORA_CONNECT" @"$REQUEST")

for variable in ${ctl_data}
do

var1=$(echo "${variable}" | awk -F, '{ print $1 }')

var2=$(echo "${variable}" | awk -F, '{ print $2 }')

            ....... write into the file ......

done

To speed things up I buffer the output and write only every 30 lines into the file, packing more data onto each line, so I have far fewer accesses to the file. It is still slow, so the bottleneck is not the writing but looping through the results.

Does anyone have an idea how to improve it?

Thanks.

C.C.

A: 

You can reduce the number of calls to awk by using just one instance, e.g.:

query="select codtit||','||crsspt from table1"
.....
sqlplus -L -s "$ORA_CONNECT" @"$REQUEST" | awk -F"," 'BEGIN{
   print "xml headers here..."
}
{
   # generate xml here..
   print "<tag1>variable 1 is "$1"</tag1>"
   print "<tag2>variable 2 is "$2" and so on..</tag2>"
   if ( $1 != "" ){   # any awk condition can go here
      print "do something here"
   }
}'

Redirect the above to a new file as necessary using > or >>.

ghostdog74
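A runnable sketch of the single-awk pipeline above, with printf standing in for the sqlplus output; the sample rows and XML tag names are invented for illustration:

```shell
# printf plays the role of sqlplus here; each line is one comma-separated row.
printf '%s\n' "1001,Alice" "1002,Bob" |
awk -F, '
BEGIN { print "<rows>" }    # XML header, printed once
{
    # one element per input row; $1 and $2 are the two CSV fields
    printf "  <row><id>%s</id><name>%s</name></row>\n", $1, $2
}
END { print "</rows>" }     # XML footer
' > out.xml
```

Only one awk process is started for the whole result set, instead of two new processes per row.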
This solution is not bad, but I will have some IFs inside, so if I'm not wrong I cannot put them inside awk.
CC
Sure you can. awk is a programming language, so it has loops and flow control as well. See my edit.
ghostdog74
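To illustrate the point about flow control in awk, a minimal sketch of an IF inside an awk script; the threshold and tag names are made up for the example:

```shell
printf '%s\n' "5,foo" "15,bar" |
awk -F, '{
    # branch on the first field, just like an IF in ksh
    if ($1 > 10)
        print "<big>" $2 "</big>"
    else
        print "<small>" $2 "</small>"
}'
# prints: <small>foo</small> then <big>bar</big>
```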
A: 

I doubt that this is the most efficient way of dumping data to an XML file. You could try using Groovy for such a task. Take a look at the Groovy cookbook at http://groovy.codehaus.org/Convert+SQL+Result+To+XML

Oliver Michels
+1  A: 

Rather than passing the data from Oracle to ksh, could you do it all in Oracle? You can use the following to format your output as XML:

select xmlgen.getxml('select field1,field2 from table1') from dual;
Todd Pierce
A: 

You may be able to eliminate the calls to awk:

saveIFS="$IFS"    # save the field separator
IFS=,             # split on commas only
array=($variable) # word-splitting fills the array
IFS="$saveIFS"    # restore it immediately
var1=${array[0]} # or just use the array's elements in place of var1 and var2
var2=${array[1]}
Dennis Williamson
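A self-contained demonstration of the IFS-splitting trick above; the sample value is invented, and it should be run under ksh or bash, since plain POSIX sh lacks arrays:

```shell
variable="1001,Alice"

saveIFS="$IFS"    # save the field separator
IFS=,             # split on commas only
array=($variable) # unquoted expansion splits the value into the array
IFS="$saveIFS"    # restore it immediately

var1=${array[0]}
var2=${array[1]}
echo "var1=$var1 var2=$var2"   # prints: var1=1001 var2=Alice
```

No external process is spawned at all, which is what matters when the loop body runs millions of times.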
But does this work for a large amount of data, something like 14 million lines?
CC
It would work as well as using awk the way you are now, but would save the cost of spawning an extra process repeatedly. Those six lines would replace the two assignments in the loop you show in your question. If `ctl_data` receives all 14 million lines as its value, then that's where your problem is. If so, then one of the other solutions might be better.
Dennis Williamson