I have a .csv file that is frequently updated (about 20 to 30 times per minute). I want to insert the newly added lines to a database as soon as they are written to the file.

The FileSystemWatcher class listens to the file system change notifications and can raise an event whenever there is a change in a specified file. The problem is that the FileSystemWatcher cannot determine exactly which lines were added or removed (as far as I know).

One way to read those lines is to save and compare the line count between changes and read the difference between the last and second last change. However, I am looking for a cleaner (perhaps more elegant) solution.

I’d appreciate any opinions/comments/solutions.

A: 

Off the top of my head: you could store the last known file size. Check it against the current file size and, when it changes, open a reader.

Then seek the reader to the last known file size and start reading from there.
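
A minimal sketch of that idea, under a few assumptions that are not in the question: the file is only ever appended to, it is UTF-8 encoded, and each record ends with '\n'. The class name AppendedLineReader is hypothetical. It remembers the byte offset of the last complete line it consumed and, on each call, returns only the complete lines written after that offset.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: remembers how far the file has been read and
// returns only the complete lines appended since the previous call.
public sealed class AppendedLineReader
{
    private readonly string path;
    private long lastPosition; // byte offset just past the last complete line consumed

    public AppendedLineReader(string path)
    {
        this.path = path;
    }

    public List<string> ReadNewLines()
    {
        var lines = new List<string>();

        // FileShare.ReadWrite lets the producer keep writing while we read.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (fs.Length < lastPosition)
                lastPosition = 0; // file was truncated or replaced: start over

            if (fs.Length == lastPosition)
                return lines; // nothing new

            fs.Seek(lastPosition, SeekOrigin.Begin);
            var buffer = new byte[fs.Length - lastPosition];
            int total = 0;
            while (total < buffer.Length)
            {
                int n = fs.Read(buffer, total, buffer.Length - total);
                if (n == 0) break;
                total += n;
            }
            if (total == 0)
                return lines;

            // Consume only up to the last '\n' so that a half-written line
            // is left in the file for the next call.
            int lastNewline = Array.LastIndexOf(buffer, (byte)'\n', total - 1);
            if (lastNewline < 0)
                return lines;

            // Assumption: the file is UTF-8. Blank lines are skipped.
            string text = Encoding.UTF8.GetString(buffer, 0, lastNewline + 1);
            foreach (string line in text.Split('\n'))
            {
                string trimmed = line.TrimEnd('\r');
                if (trimmed.Length > 0)
                    lines.Add(trimmed);
            }
            lastPosition += lastNewline + 1;
        }
        return lines;
    }
}
```

Call ReadNewLines() whenever you detect a change (from a timer or a FileSystemWatcher event) and insert the returned lines into the database. Calling it when nothing was appended simply returns an empty list, so extra notifications do no harm.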

expedient
Just because the file size stays the same doesn't mean nothing has changed. A hash would be much more appropriate... or, in this case, using FileSystemWatcher.
Simucal
+1  A: 

I would keep the current text in memory if it is small enough and then use a diff algorithm to check if the new text and previous text changed. This library, http://www.mathertel.de/Diff/, not only will tell you that something changed but what changed as well. So you can then insert the changed data into the db.

James
+2  A: 

Right, the FileSystemWatcher doesn't know anything about your file's contents. It'll tell you if it changed, etc. but not what changed.

Are you only adding to the file? It was a little unclear from the post as to whether lines were added or could also be removed. Assuming they are appended, the solution is pretty straightforward, otherwise you'll be doing some comparisons.

itsmatt
A: 

You're right about the FileSystemWatcher. You can listen for created, modified, deleted, etc. events but you don't get deeper than the file that raised them.

Do you have control over the file itself? You could change the model slightly to use the file like a buffer. Instead of one file, have two: one is the staging file, the other is the sum of all processed output. Read all lines from your "buffer" file, process them, then append them to the second file, which holds every line processed so far. Then delete the lines you processed from the buffer file. This way, everything in the buffer file is pending processing. The catch is that this won't work if the system does anything other than append to the file (i.e. it also deletes lines).

Mike L
+1  A: 

I've written something very similar. I used the FileSystemWatcher to get notifications about changes. I then used a FileStream to read the data (keeping track of my last position within the file and seeking to that before reading the new data). Then I add the read data to a buffer, which automatically extracts complete lines and then outputs them to the UI.

Note: "this.MoreData(..)" is an event, the listener of which adds to the aforementioned buffer and handles the complete line extraction.

Note: As has already been mentioned, this will only work if the modifications are always additions to the file. Any deletions will cause problems.

Hope this helps.

   public void File_Changed( object source, FileSystemEventArgs e )
    {
        lock ( this )
        {
            if ( !this.bPaused )
            {
                bool bMoreData = false;

                // Read from current seek position to end of file
                byte[] bytesRead = new byte[this.iMaxBytes];
                FileStream fs = new FileStream( this.strFilename, FileMode.Open, FileAccess.Read, FileShare.ReadWrite );

                if ( 0 == this.iPreviousSeekPos )
                {
                    if ( this.bReadFromStart )
                    {
                        if ( null != this.BeginReadStart )
                        {
                            this.BeginReadStart( null, null );
                        }
                        this.bReadingFromStart = true;
                    }
                    else
                    {
                        if ( fs.Length > this.iMaxBytes )
                        {
                            this.iPreviousSeekPos = fs.Length - this.iMaxBytes;
                        }
                    }
                }

                // Seek returns a long; no cast, so offsets beyond int.MaxValue are not truncated
                this.iPreviousSeekPos = fs.Seek( this.iPreviousSeekPos, SeekOrigin.Begin );
                int iNumBytes = fs.Read( bytesRead, 0, this.iMaxBytes );
                this.iPreviousSeekPos += iNumBytes;

                // If we haven't read all the data, then raise another event
                if ( this.iPreviousSeekPos < fs.Length )
                {
                    bMoreData = true;
                }

                fs.Close();

                // Decode only the bytes actually read, not the whole (mostly empty) buffer
                string strData = this.encoding.GetString( bytesRead, 0, iNumBytes );
                this.MoreData( this, strData );

                if ( bMoreData )
                {
                    File_Changed( null, null );
                }
                else
                {
                    if ( this.bReadingFromStart )
                    {
                        this.bReadingFromStart = false;
                        if ( null != this.EndReadStart )
                        {
                            this.EndReadStart( null, null );
                        }
                    }
                }
            }
        }
    }
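
The buffer that receives the MoreData text is not shown above. A minimal sketch of what such a buffer might look like follows; the class name LineBuffer and its Append method are hypothetical and not the original implementation. It accumulates chunks of text and returns only the lines that have been terminated by '\n', keeping any trailing partial line until the rest of it arrives.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical buffer: collects arbitrary chunks of text and hands back
// only the lines that have been terminated by '\n'.
public sealed class LineBuffer
{
    private readonly StringBuilder pending = new StringBuilder();

    public List<string> Append(string chunk)
    {
        pending.Append(chunk);

        var complete = new List<string>();
        string text = pending.ToString();
        int start = 0;
        int newline;

        // Extract every complete line; strip the '\r' of a "\r\n" ending.
        while ((newline = text.IndexOf('\n', start)) >= 0)
        {
            complete.Add(text.Substring(start, newline - start).TrimEnd('\r'));
            start = newline + 1;
        }

        // Keep whatever follows the last '\n' for the next call.
        pending.Clear();
        pending.Append(text, start, text.Length - start);
        return complete;
    }
}
```

For example, Append("a,1\nb,") returns the single line "a,1", and a following Append("2\n") returns "b,2". One caveat: if the file uses a multi-byte encoding, a fixed-size byte read can split a character across two chunks, so decoding each chunk separately may garble it.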
RichS
+1  A: 

I think you should use the NTFS change journal or something similar:

The change journal is used by NTFS to provide a persistent log of all changes made to files on the volume. For each volume, NTFS uses the change journal to track information about added, deleted, and modified files. The change journal is much more efficient than time stamps or file notifications for determining changes in a given namespace.

You can find a description on TechNet. You will need to use P/Invoke from .NET.

artur02
A: 

Be careful. FileSystemWatcher is not really reliable at detecting actual changes to a file. It may fire several times for one update done to the file.
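
One common way to cope with the repeated events is to debounce them: restart a short timer on every notification and do the work only once the notifications stop. A minimal sketch, assuming a quiet period of around 100 ms suits your update rate; the Debouncer class is hypothetical, not part of .NET.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs the action once, after Signal() has not been
// called for quietMilliseconds.
public sealed class Debouncer : IDisposable
{
    private readonly Timer timer;
    private readonly int quietMilliseconds;

    public Debouncer(Action action, int quietMilliseconds)
    {
        this.quietMilliseconds = quietMilliseconds;
        // Created disarmed; Signal() arms it.
        this.timer = new Timer(_ => action(), null, Timeout.Infinite, Timeout.Infinite);
    }

    // Call this from the FileSystemWatcher.Changed handler.
    // Each call pushes the pending run further into the future.
    public void Signal()
    {
        timer.Change(quietMilliseconds, Timeout.Infinite);
    }

    public void Dispose()
    {
        timer.Dispose();
    }
}
```

Wiring it up would look something like watcher.Changed += (s, e) => debouncer.Signal(); where the debounced action reads whatever was appended to the file. The action runs on a thread-pool thread, so it needs the same locking you would use in a Changed handler.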

pointernil
A: 

One can use CallbackFilter to track changes (and even modify the written information) in real-time. This way you are notified about the operation right when it happens or (optionally) before it happens.

Eugene Mayevski 'EldoS Corp