views:

133

answers:

4

Hi,

I have multiple files with hundreds of thousands of records in following format:

2010/08/10 10:07:52|TrainZoom|1393|38797|MyCustomerAPI->,mask = ZPTA,TransId = 1281460071578,testing :-> , capture -> : , IMDB = 9113290830, equipmentName = GMT, technologyName = RMS,,,)|

There are fields separated by pipes and inside a field (last) there are many other fields separated by commas. What I want to do is to store all these values in database. For the above example my DB fields would be date-time, process-name, thread-id, sequence-num, API name, transId, imdb, equipmentName, technologyName.
API Name is not in a fixed format but I can assume that its a fixed length.

Please let me know if I can do this in Unix using awk or sed.

Thanks,

+1  A: 

I would use perl or python to do the job as they provide DB connectors for different flavor of DBs. I don't see what you want awk or sed to do, as they are "text to text transformers".

You should complete your question to clarify your goal (text => sql script? for example) and target DB if you want some sample code.

RC
most (if not all) database also provide database clients that can execute sql scripts.
ghostdog74
+1  A: 

You can do the parsing, at least, directly in bash. I'm not sure what you want to do with the values, but this script parses each line and displays the individual fields:

#!/bin/bash

while IFS='|' read dateTime processName threadId sequenceNum other; do
    IFS=',' read apiName mask transId testing capture imdb equipmentName technologyName other <<< "$other"

    echo "date-time:       $dateTime"
    echo "process-name:    $processName"
    echo "thread-id:       $threadId"
    echo "sequence-num:    $sequenceNum"
    echo "api-name:        $apiName"
    echo "trans-id:        $transId"
    echo "imdb:            $imdb"
    echo "equipment-name:  $equipmentName"
    echo "technology-name: $technologyName"
done
John Kugelman
+2  A: 

yes you can do at least the parsing in awk,

awk -F"|" 'BEGIN{q="\047"}
{
 printf "insert into table values("
 printf q $1 q","q $2,","
 # fill up the rest yourself
}
' file  

the above produces a bunch of insert statements. You can either pipe them to your database client, or save them to a .sql file, then use the database client to execute it.

ghostdog74
You can then use the database command line tools to do an insert. E.g. For mySQL it would be "mysql -u username -p database_name < insertScript.sql"
Andrew Dyster
+3  A: 

Just for fun:

cat file.txt | sed -e 's/\([^|]*\)|\([^|]*\)|\([^|]*\)|\([^|]*\)|/insert into table set date-time=\1, process-name=\2, thread-id=\3, sequence-num=\4, /' -e 's/,[^=]*,/,/g' -e 's/ \([,=]\)/\1/g' -e 's/\([,=]\) /\1/g' -e 's/\([^,]*\)=\([^,]*\),/\1="\2",/g' -e 's/"[^"]*$/";/' | mysql dbname
njamesp
Wow, this works like a charm. How much time it took for you to format this command? You are great. Can you also please tell me if I want to generate an statement like this `insert into mytab (date_time,thread_id,seq_num, field1, field2) values ('2010/02/09',201,1001,'Test','Test')` how can modify above command or use another command to format the output of above. Thanks.
raj_arni
+1 since you said "just for fun". Otherwise, I would have downvoted such an evil response :-)
mogsie