My web application returns files from the filesystem. These files are dynamic, so I have no way to know their names or how many of them there will be. When a file doesn't exist, the application creates it from the database. I want to prevent two different threads from recreating the same file at the same time, and to prevent a thread from returning the file while another thread is still creating it.

Also, I don't want to take a lock on an element that is common to all the files. Therefore I should lock the file only while I'm creating it.

So I want to lock a file until its recreation is complete; if another thread tries to access it, it will have to wait until the file is unlocked.

I've been reading about FileStream.Lock, but I would have to know the file length, and it won't prevent another thread from trying to read the file, so it doesn't work for my particular case.

I've also been reading about FileShare.None, but it will throw an exception (which exception type?) if another thread/process tries to access the file, so I would have to write a "try again while it is failing" loop. I'd like to avoid generating exceptions, and I don't like that approach very much, although maybe there is no better way.

The approach with FileShare.None would be this more or less:

    static void Main(string[] args)
    {
        new Thread(new ThreadStart(WriteFile)).Start();
        Thread.Sleep(1000);
        new Thread(new ThreadStart(ReadFile)).Start();

        Console.ReadKey(true);
    }

    static void WriteFile()
    {
        using (FileStream fs = new FileStream("lala.txt", FileMode.Create, FileAccess.Write, FileShare.None))
        using (StreamWriter sw = new StreamWriter(fs))
        {
            Thread.Sleep(3000);
            sw.WriteLine("trolololoooooooooo lolololo");
        }
    }

    static void ReadFile()
    {
        Boolean readed = false;
        Int32 maxTries = 5;

        while (!readed && maxTries > 0)
        {
            try
            {
                Console.WriteLine("Reading...");
                using (FileStream fs = new FileStream("lala.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
                using (StreamReader sr = new StreamReader(fs))
                {
                    while (!sr.EndOfStream)
                        Console.WriteLine(sr.ReadToEnd());
                }
                readed = true;
                Console.WriteLine("Readed");
            }
            catch (IOException)
            {
                Console.WriteLine("Fail: " + maxTries.ToString());
                maxTries--;
                Thread.Sleep(1000);
            }
        }
    }

But I don't like having to catch exceptions, retry several times, and wait an arbitrary amount of time :|

+1  A: 

I think the right approach would be the following: create a set of strings where you save the names of the files currently being handled, so that only one thread processes a given file at a time. Something like this:

    // Somewhere in your code, or in a singleton
    static System.Collections.Generic.HashSet<String> filesAlreadyProcessed = new System.Collections.Generic.HashSet<String>();


    // Thread main method code
    bool filealreadyprocessed = false;
    lock (filesAlreadyProcessed)
    {
        if (filesAlreadyProcessed.Contains(filename))
        {
            filealreadyprocessed = true;
        }
        else
        {
            // Claim the file so that no other thread processes it
            filesAlreadyProcessed.Add(filename);
        }
    }
    if (!filealreadyprocessed)
    {
        // ProcessFile
    }
Sebastian Marcet
That's the problem: I don't want to lock an element that is common to all the files. First, getting a lock is expensive, and I don't want to take a lock on every call asking for a file, regardless of whether the file already exists. Second, I don't want to block threads that are trying to get a different file just because I'm creating one of them. For these reasons I want to lock the file itself. Cheers.
vtortola
Have you measured the time for acquiring a lock and blocking until completion vs. the time for the thread waking up, checking for access, getting an exception, sleeping, and repeating several times? I expect a lock strategy will be more desirable here. `Thread.Sleep` is less desirable than blocking on a lock. What if the write thread finishes early? The read thread does not wake up until its sleep ends. You might want to consider a `ManualResetEvent` to control access between the two threads.
Paul Williams
+1  A: 

You can handle this by using the FileMode.CreateNew argument to the stream constructor. One of the threads is going to lose: it will find out that the file was already created a microsecond earlier by another thread, and it will get an IOException.

It will then need to spin, waiting for the file to be fully created. Which you enforce with FileShare.None. Catching exceptions here doesn't matter, it is spinning anyway. There's no other workaround for it anyway unless you P/Invoke.
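
A minimal sketch of that idea, not a definitive implementation: `FileOnce`, `GetOrCreate`, the `produce` callback, and the retry count and delay are hypothetical names and values chosen for illustration, and the sketch assumes Windows-style sharing semantics for `FileShare.None`.

```csharp
using System;
using System.IO;
using System.Threading;

static class FileOnce
{
    // Hypothetical helper: returns the file's text, creating the file
    // from 'produce' (e.g. a database read) if it does not exist yet.
    public static string GetOrCreate(string path, Func<string> produce, int maxTries = 50)
    {
        for (int attempt = 0; attempt < maxTries; attempt++)
        {
            try
            {
                // CreateNew throws IOException if the file already exists,
                // so at most one thread wins the right to create it.
                // FileShare.None keeps readers out until the stream is closed.
                using (var fs = new FileStream(path, FileMode.CreateNew, FileAccess.Write, FileShare.None))
                using (var sw = new StreamWriter(fs))
                {
                    string content = produce();
                    sw.Write(content);
                    return content;
                }
            }
            catch (IOException)
            {
                // Someone else created (or is creating) the file: try to read it.
            }

            try
            {
                using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
                using (var sr = new StreamReader(fs))
                {
                    return sr.ReadToEnd();
                }
            }
            catch (IOException)
            {
                // The creator still holds the file exclusively: spin and retry.
                Thread.Sleep(50);
            }
        }

        throw new TimeoutException("Gave up waiting for " + path);
    }
}
```

Note that if `produce` throws, an empty file is left behind; a real implementation would need to delete it or otherwise mark it as incomplete.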

Hans Passant
It seems like you're right, there is no other workaround for this. Thanks!
vtortola
+1  A: 

Do you have a way to identify what files are being created?

Say every one of those files corresponds to a unique ID in your database. You create a centralised location (Singleton?), where these IDs can be associated with something lockable (Dictionary). A thread that needs to read/write to one of those files does the following:

    //Request access
    ReaderWriterLockSlim fileLock = null;
    bool needCreate = false;
    lock(Coordination.Instance)
    {
        if(Coordination.Instance.ContainsKey(theId))
        {
            fileLock = Coordination.Instance[theId];
        }
        else if(!fileExists(theId)) //check if the file exists at this moment
        {
            Coordination.Instance[theId] = fileLock = new ReaderWriterLockSlim();
            fileLock.EnterWriteLock(); //give no other thread the chance to get into write mode
            needCreate = true;
        }
        else
        {
            //The file exists, and whoever created it, is done with writing. No need to synchronize in this case.
        }
    }

    if(needCreate)
    {
        createFile(theId); //Writes the file from the database
        lock(Coordination.Instance)
            Coordination.Instance.Remove(theId); //later requests see the finished file and skip locking
        fileLock.ExitWriteLock();
        fileLock = null;
    }

    if(fileLock != null)
        fileLock.EnterReadLock(); //blocks until the creating thread has released the write lock

    //read your data from the file

    if(fileLock != null)
        fileLock.ExitReadLock();

Of course, threads that don't follow this exact locking protocol will still have unsynchronized access to the file.

Now, locking over a Singleton object is certainly not ideal, but if your application needs global synchronization then this is a way to achieve it.
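
As a variation on the protocol above, here is a rough sketch that avoids the single global lock on the common path. It is an illustration under assumptions, not the approach from this answer: it swaps in `ConcurrentDictionary` (available from .NET 4) with one plain lock object per file, and it writes to a temporary file that is renamed into place, so that an existing file can be treated as complete. `PerFileLock`, `ReadOrCreate`, and `produce` are hypothetical names, and the sketch assumes finished files are not deleted while requests are in flight.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

static class PerFileLock
{
    // One lock object per file path, created on demand.
    static readonly ConcurrentDictionary<string, object> Locks =
        new ConcurrentDictionary<string, object>();

    // Hypothetical helper: 'produce' stands in for "build the content from the database".
    public static string ReadOrCreate(string path, Func<string> produce)
    {
        // Fast path: the file exists, so no lock of any kind is taken.
        if (!File.Exists(path))
        {
            object gate = Locks.GetOrAdd(path, _ => new object());
            lock (gate)
            {
                // Re-check inside the lock: another thread may have finished meanwhile.
                if (!File.Exists(path))
                {
                    // Write to a temporary name, then rename, so the final
                    // path only ever appears once the content is complete.
                    string temp = path + "." + Guid.NewGuid().ToString("N") + ".tmp";
                    File.WriteAllText(temp, produce());
                    File.Move(temp, path);
                }
            }

            // The file exists now, so later callers take the fast path
            // and the per-file lock object is no longer needed.
            object removed;
            Locks.TryRemove(path, out removed);
        }

        return File.ReadAllText(path);
    }
}
```

Only threads that ask for the same missing file wait on each other; requests for other files, or for files that already exist, never contend on this lock.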

SealedSun
Same issue as with @hworangdo's code: on each request you have to acquire a lock, even when you don't need it.
vtortola
@vtortola: Yep. In defense of my answer: getting a lock is not expensive (measure it, it's really nothing, especially compared to file IO), but waiting for another thread to release the lock is. You could try to find a lock-free dictionary implementation. You'd only need to be careful in the case that the file needs to be created, so that only one thread gets tasked with creating it.
SealedSun
You're probably right. I never tested how expensive acquiring a lock is myself; I only know it because I read it. Waiting for another thread to release the lock is more expensive for sure, but it will happen only once per file. I'll test your approach later on, maybe yours is faster. Thanks!
vtortola
A: 

Your question really got me thinking.

Instead of having every thread responsible for file access and having them block, what if you used a queue of files that need to be persisted and have a single background worker thread dequeue and persist?

While the background worker is cranking away, you can have the web application threads return the db values until the file does actually exist.

I've posted a very simple example of this on GitHub.

Feel free to give it a shot and let me know what you think.

FYI, if you don't have git, you can use svn to pull it http://svn.github.com/statianzo/MultiThreadFileAccessWebApp
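
For readers who don't want to pull the repository, the queue idea described above could look roughly like the following. This is a sketch under assumptions and is not taken from the linked project: `FilePersister`, `Get`, and `loadFromDb` are hypothetical names, and it uses `BlockingCollection` from .NET 4.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

sealed class FilePersister : IDisposable
{
    // Pending (path, content) pairs waiting to be written to disk.
    private readonly BlockingCollection<Tuple<string, string>> queue =
        new BlockingCollection<Tuple<string, string>>();
    private readonly Task worker;

    public FilePersister()
    {
        // A single background worker is the only code that writes files.
        worker = Task.Factory.StartNew(Drain, TaskCreationOptions.LongRunning);
    }

    // Called by request threads: never blocks on file creation.
    public string Get(string path, Func<string> loadFromDb)
    {
        if (File.Exists(path))
            return File.ReadAllText(path);

        // File not there yet: serve the database value and queue the write.
        string content = loadFromDb();
        queue.Add(Tuple.Create(path, content));
        return content;
    }

    private void Drain()
    {
        foreach (var item in queue.GetConsumingEnumerable())
        {
            // Several requests may have queued the same file; write it once.
            if (File.Exists(item.Item1))
                continue;

            // Write under a temporary name and rename, so request threads
            // never see a half-written file at the final path.
            string temp = item.Item1 + ".tmp";
            File.WriteAllText(temp, item.Item2);
            File.Move(temp, item.Item1);
        }
    }

    public void Dispose()
    {
        queue.CompleteAdding(); // let the worker finish the remaining items
        worker.Wait();
        queue.Dispose();
    }
}
```

The trade-off is that the first few requests for a file each hit the database, in exchange for no request thread ever waiting on a file write.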

statenjason
A: 

Why aren't you just using the database? If you have a way to associate a filename with the data from the db it contains, just add some information to the db that specifies whether a file with that information currently exists, when it was created, how stale the information in the file is, etc. When a thread needs some information, it checks the db to see if that file exists and, if not, it writes a row to the table saying it's creating the file. When it's done, it updates that row with a boolean saying the file is ready to be used by others.

The nice thing about this approach - all your information is in one place - so you can do nice error recovery - e.g. if the thread creating the file dies badly for some reason, another thread can come along and decide to rewrite the file because the creation time is too old. You can also create simple batch cleanup processes and get accurate data on how frequently certain data is being used for a file and how often information is updated (by looking at the creation times etc). Also, you avoid having to do many, many disk seeks across your filesystem as different threads look for different files all over the place - especially if you decide to have multiple front-end machines seeking across a common disk.

The tricky thing - you'll have to make sure your db supports row-level locking on the table that threads write to when they create files, because otherwise the table itself may be locked, which could make this unacceptably slow.

Clive Saha