Hello,

I am trying to find a way to monitor directories in Perl, in particular the size of a directory, and to perform a particular action when that size changes.
The issue is with large files (> 100MB) that take a noticeable amount of time to copy into this directory. On Windows (not Unix), the system reserves enough disk space for the entire file while the copy is still in progress. This causes problems for me, because my script tries to act on a file that has not finished copying over. I can easily detect directory size changes in Unix via 'du', but 'du' in Windows does not behave the same way.

Are there any accurate methods of detecting directory size changes in Perl?

Edit: Some points to clarify:

- My Perl script is only monitoring a particular directory, and upon detecting a new file or a new directory, perform an action on this new file or directory. It is not copying any files; users on the network will be copying files into the directory I am monitoring.
- The problem occurs when a new file or directory appears (copied, not moved) that is significantly large (> 100MB, but usually a couple GB) and my program fires before this copy completes
- In Unix I can easily 'du' to see that the file/directory in question is growing in size, and take the appropriate action
- In Windows the size is static, so I cannot detect this change
- opendir/readdir/closedir is not feasible, as some of the directories that appear may contain thousands of files, and I want to avoid the overhead of

Ideally I would like my program to be triggered on change, but I am not sure how to do this. As of right now it busy waits until it detects a change. The change in file/directory size is not in my control.

+3  A: 

You seem to be working around the underlying issue rather than addressing it -- your program is not properly sending a notification when it is finished copying a file. Why not do that instead of using OS-specific mechanisms to try to indirectly determine when the operation is complete?

Ether
Signal files (or other out-of-band communication method) are the way to go.
Chas. Owens
Sorry for not clarifying. The thing is my program is not doing the copying; it is only monitoring a particular directory for changes in file or directory size. The copying would be done by an external source, i.e. random users on the network.
materiamage
A: 

Evaluating the size of a directory is something all but the most inexperienced Perl programmers should be able to do. You can write your own portable version of du in 15 lines of code if you know about:

  1. Either glob or opendir / readdir / closedir to iterate through the files in a directory
  2. The filetest operators (-f file, -d file, etc.) to distinguish between regular files and directory names
  3. The stat function or file size operator -s file to obtain the size of a file
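The three pieces above can be sketched roughly as follows. This is only an illustration, not a drop-in `du` replacement: it uses the core File::Find module in place of hand-written opendir/readdir recursion, and the name `dir_size` is made up for this example.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: total size in bytes of all regular files under $dir.
sub dir_size {
    my ($dir) = @_;
    my $total = 0;
    find(
        sub {
            # -f stats the current entry; "stat _" reuses that cached
            # result, so each file is stat'ed once and never opened.
            $total += (stat _)[7] if -f $_;
        },
        $dir
    );
    return $total;
}
```

Calling `dir_size('/some/dir')` at intervals and comparing the results would show whether a directory is still growing, subject to the Windows caveat raised in the question.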
mobrule
Isn't the issue more one of latency than of replicating `du`? I think he is stating (not so clearly) that the utility is being called before the file has finished copying, since he is polling the size of the directory.
drewk
@drewk Probably. Sometimes it's hard to decide whether to answer the question that should have been asked rather than the question that was asked.
mobrule
yes, agreed....
drewk
Very sorry for not clarifying earlier. I've thought of opendir/readdir/closedir, but the problem is that the files/directories I am monitoring may contain thousands of files, and the overhead of opening each one is something I am trying to avoid. I'll try to edit my original question to be more specific.
materiamage
@materiamage - You don't need to open them. Use `stat` or `-s <file>` to get the size of a file.
mobrule
This advice does not scale. You need something like Linux::Inotify2 to be sure you don't miss any state changes. glob and readdir are too slow and stat is too inaccurate.
jrockway
+1  A: 

As I understand it, you are polling a directory with thousands of files. When you see a new file, there is an action that is taken on the file. This causes problems if the file is in use or still being copied, correct?

There are potentially several solutions:

1) Use flock to detect if the file is still in use by another process (test if it works properly on your OS, file system, and Perl version).

2) Use a LockFile call on Windows. If it fails, the OS or another process is using that file.

3) Change the poll interval to a non busy time on the server and take the directory off line while your process completes.
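For option 1, a minimal sketch of a non-blocking lock probe might look like the following. The helper name `can_lock_exclusively` is made up for this example. Whether a copy in progress actually makes the open or the lock fail depends on the OS, the file system, and how the copying program opens the file (on Unix, flock is advisory, so a writer that never takes a lock will not be detected), so treat it as something to test rather than a guarantee.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: true if $path can be opened read/write and
# exclusively locked right now, false otherwise.
sub can_lock_exclusively {
    my ($path) = @_;

    # '+<' opens for read/write without creating or truncating the file.
    open my $fh, '+<', $path or return 0;

    # LOCK_NB makes flock return immediately instead of waiting.
    my $got_lock = flock($fh, LOCK_EX | LOCK_NB);

    # Closing the handle releases the lock again.
    close $fh;
    return $got_lock ? 1 : 0;
}
```

A polling loop could skip any file for which this returns false and try again on the next pass.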

drewk
+3  A: 

You can use Linux::Inotify2 or Win32::ChangeNotify to detect directory/file changes.

EDIT: File::ChangeNotify seems a better option (cross-platform & used by Catalyst)
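As a rough sketch of what event-driven watching with File::ChangeNotify could look like: the helper names `created_paths` and `watch_for_new_files` are made up for this example, and the module is loaded with `require` only so the filtering helper can be used without it. Note that a `create` event fires when the file appears, not when the copy finishes, so the "is it still being written?" problem from the question still needs a separate check.

```perl
use strict;
use warnings;

# Hypothetical helper: paths of newly created entries among the events.
# Each event is expected to provide ->path and ->type, as
# File::ChangeNotify::Event objects do.
sub created_paths {
    my (@events) = @_;
    return map { $_->path } grep { $_->type eq 'create' } @events;
}

# Hypothetical helper: block until something changes under $dir, then
# pass each newly created path to $handler. Runs until interrupted.
sub watch_for_new_files {
    my ($dir, $handler) = @_;

    require File::ChangeNotify;    # CPAN module, not part of core Perl

    my $watcher = File::ChangeNotify->instantiate_watcher(
        directories => [$dir],
    );

    # wait_for_events blocks instead of busy waiting.
    while ( my @events = $watcher->wait_for_events ) {
        $handler->($_) for created_paths(@events);
    }
}
```

It could be started with something like `watch_for_new_files('/path/to/incoming', sub { print "new: $_[0]\n" })`, where the path is a placeholder.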

sebthebert
Ah thanks! I think this is just what I was looking for
materiamage
[We have established that `File::ChangeNotify` is best.](http://stackoverflow.com/questions/1776745#1778606)
daxim
@daxim I didn't know that one, thanks ! Answer updated
sebthebert
A: 

There is a nice module called File::Monitor. It detects new files, deleted files, changes in size, and changes to any other attribute that stat reports. It then outputs the affected files for you.

http://search.cpan.org/~andya/File-Monitor/lib/File/Monitor.pm

You set up a baseline scan, then set up a callback for each item you are looking for, so you can see new changes via:

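use File::Monitor;

my $monitor = File::Monitor->new;

$monitor->watch( {
    name        => 'somedir',
    recurse     => 1,
    callback    => {
        files_created => sub {
            my ($name, $event, $change) = @_;
            # Do stuff with the newly created files
        }
    }
} );

# The first scan only records the baseline; callbacks fire on later scans
$monitor->scan;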

If you need to go deeper than one level, just do it to whatever level you need. Once this is done and it finds new files, you can trigger your application to do what you want with them.

Nerdatastic