views:

366

answers:

1

A while back I needed to parse a bunch of Serve-U FTP log files and store them in a database so people could report on them. I ended up developing a small C# app to do the following:

  1. Look for all files in a dir that have not been loaded into the db (there is a table of previously loaded files).

  2. Open a file and load all the lines into a list.

  3. Loop through that list and use RegEx to identify the kind of row (CONNECT, LOGIN, DISCONNECT, UPLOAD, DOWNLOAD, etc), parse it into a specific kind of object corresponding to the kind of row and add that obj to another List.

  4. Loop through each of the different object lists and write each one to the associated database table.

  5. Record that the file was successfully imported.

  6. Wash, rinse, repeat.

It's ugly but it got the job done for the deadline we had.

The problem is that I'm in a DBA role and I'm not happy with running a compiled app as the solution to this problem. I'd prefer something more open and more DBA-oriented.

I could rewrite this in PowerShell but I'd prefer to develop an SSIS package. I couldn't find a good way to split input based on RegEx within SSIS the first time around and I wasn't familiar enough with SSIS. I'm digging into SSIS more now but still not finding what I need.

Does anybody have any suggestions about how I might approach a rewrite in SSIS?

+1  A: 

I have to do something similar with Exchange logs. I have yet to find an easier solution utilizing an all SSIS solution. Having said that, here is what I do:

First I use logparser from Microsoft and the bulk copy functionality of sql2005

I copy the log files to a directory that I can work with them in.

I created a sql file that will parse the logs. It looks similar to this:

SELECT TO_Timestamp(REPLACE_STR(STRCAT(STRCAT(date,' '), time),' GMT',''),'yyyy-M-d h:m:s') as DateTime, [client-ip], [Client-hostname], [Partner-name], [Server-hostname], [server-IP], [Recipient-Address], [Event-ID], [MSGID], [Priority], [Recipient-Report-Status], [total-bytes], [Number-Recipients], TO_Timestamp(REPLACE_STR([Origination-time], ' GMT',''),'yyyy-M-d h:m:s') as [Origination Time], Encryption, [service-Version], [Linked-MSGID], [Message-Subject], [Sender-Address] INTO '%outfile%' FROM '%infile%' WHERE [Event-ID] IN (1027;1028)

I then run the previous sql with logparser:

logparser.exe file:c:\exchange\info\name_of_file_goes_here.sql?infile=c:\exchange\info\logs\*.log+outfile=c:\exchange\info\logs\name_of_file_goes_here.bcp -i:W3C -o:TSV

Which outputs a bcp file.

Then I bulk copy that bcp file into a premade database table in SQL server with this command:

bcp databasename.dbo.table in c:\exchange\info\logs\name_of_file_goes_here.bcp -c -t"\t" -T -F 2 -S server\instance -U userid -P password

Then I run queries against the table. If you can figure out how to automate this with SSIS, I'd be glad to hear what you did.

GregD
robsymonds