views:

913

answers:

3

How do I access the input data file name from within SQL*Loader control file so that I can insert it into the table along with data from the input file?

Let's say for example I have the following control file:

LOAD DATA

APPEND
INTO TABLE STG_AM02_BA_RAW
WHEN (1:2) = 'DT'
(
     SUBSCRIBER_NO                   POSITION(11:18)CHAR, 
     ACCOUNT_NO                      POSITION(19:32)CHAR, 
     SUBSCRIBER_NAME                 POSITION(33:92)CHAR
)

I want to do something like:

LOAD DATA

APPEND
INTO TABLE STG_AM02_BA_RAW
WHEN (1:2) = 'DT'
(
     SUBSCRIBER_NO                   POSITION(11:18)CHAR, 
     ACCOUNT_NO                      POSITION(19:32)CHAR, 
     SUBSCRIBER_NAME                 POSITION(33:92)CHAR, 
     INPUTFILE                       INPUTFILENAME()CHAR
)

Assume that I don't have access nor permission to edit the shell script that will invoke SQL*Loader with this control file.

+1  A: 

I don't think that there is a way to this in the circumstances you specified, AFAIK there is no way to properly reference the filename in the "data" part.

Couple of ideas for a workaround:

  • Update the newly inserted records with a separate SQL statement. You might be able to build the statement from the batch file that invokes SQL*Loader.
  • Modify the data file to include the filename (again, might be done from the batch file).
  • Have the batch file build the control file to include the filename as a constant, so you could have something like

    INPUTFILE CONSTANT "my_data.dat"

Hope this helps.

IronGoofy
sorry i've already mentioned that i can't modify the shell script that will invoke the SQL*Loader. All I am allowed to do is modifying the control file, yet using `INPUTFILE CONSTANT "my_data.dat"` won't work because the control file is suppose to accept input file name from the command line.
Lukman
+1  A: 

As of 11g, it isn't possible to access the filename directly from the SQL*Loader control file.

You basically have to handle it from your scripting environment.

If you're not able to modify the loading script, perhaps you could add a header record to the datafile?

It looks like you have a record type field in position 1:2 - can you modify the datafile creation to include a filename record type?

For example, a "FN" data type:

FN                ...        inputfile.txt
DT     12345678XXX...XXXXXYYYYYYYYYYYYYYYY
DT     12345678XXX...XXXXXYYYYYYYYYYYYYYYY
DT     12345678XXX...XXXXXYYYYYYYYYYYYYYYY
DT     12345678XXX...XXXXXYYYYYYYYYYYYYYYY
DT     12345678XXX...XXXXXYYYYYYYYYYYYYYYY

Your load script could then change to:

LOAD DATA

APPEND
INTO TABLE STG_AM02_BA_RAW
WHEN (1:2) = 'FN'
(
        INPUTFILE                       POSITION(1:92)CHAR
)
WHEN (1:2) = 'DT'
(
        SUBSCRIBER_NO                   POSITION(11:18)CHAR, 
        ACCOUNT_NO                      POSITION(19:32)CHAR, 
        SUBSCRIBER_NAME                 POSITION(33:92)CHAR
)

All depends if you can update the data file...

For example,

echo "FNinputfile.txt" > header.txt
cat header.txt inputfile.txt > newinputfile.txt

If you need to reference the filename against each data row, you can load the data into multiple staging tables:

LOAD DATA
TRUNCATE INTO TABLE STAGE_FILENAME
WHEN (1:2) = 'FN'
(
        INPUTFILE                       POSITION(1:92)CHAR
)
TRUNCATE INTO TABLE STAGE_DATA
WHEN (1:2) = 'DT'
(
        SUBSCRIBER_NO                   POSITION(11:18)CHAR, 
        ACCOUNT_NO                      POSITION(19:32)CHAR, 
        SUBSCRIBER_NAME                 POSITION(33:92)CHAR
)

... and join them together using SQL:

insert into STG_AM02_BA_RAW
    (
    subscriber_no,
    account_no,
    subscriber_name,
    input_filename
    )
select
    d.subscriber_no,
    d.account_no,
    d.subscriber_name,
    f.inputfile
from
    stage_data d,
    inputfile d

This process falls over if you have concurrent loads.

You said in the comments that you can change the data file - could you get the file changed to that the filename is appended to each record? If so, makes the issue go away. You'd just have to include:

    SUBSCRIBER_NAME                 POSITION(92:*)CHAR
Nick Pierpoint
yup i think i can modify the input file. thanks! i didn't think of that :P
Lukman
owh wait, before i tick your answer, how do i include the filename into every insert into the table? since the sql*loader is processing the line one by one, is it possible to do back-referencing to previous line?
Lukman
Can you load the data into two staging tables then combine them into the final load table? I'll include an example in my answer.
Nick Pierpoint
thanks for the example. the problem is that there will be millions record in one file and i will receive at least 4 files from external systems to be processed one after another. so it would not be efficient to append the file name to each record and i can't ask the guys on the external systems to include it in the first place. also, i can't do the "load into multiple staging tables and join them together" trick because i have 4 files with 4 different names to be loaded one after another (i.e. without cleaning the staging table first). how do i make sure the join maps the records correctly?
Lukman
nvm I figured it out myself. I insert the filename from the 'FN' line into a separate table and create a `BEFORE INSERT` trigger that fetches the filename and include it into the inserted rows. What a roundabout way of doing it but can't help it with the limitation in SQL*Loader (well, it's called 'control file' and not 'script' for a reason).
Lukman
A: 

The easy way to tackle this issue is by adding an extra column to the end of the each record with the filename and map that column position to the field.

Nash
You could do that easily in unix env
Nash