Hello, I'm working with the FILESTREAM data type for the first time (SQL Server 2008) and I'm running into issues when I do some fast inserts/deletes. Basically, the speed at which the files are actually removed from the file system is way slower than the insert/delete speed, even if I invoke the garbage collector manually (as far as I know, CHECKPOINT is supposed to invoke the garbage collector).

The code below illustrates the problem - it takes roughly 30 seconds to execute, but you have to wait quite a few minutes for the last file to be deleted from the file system (as observed in the C:\FSTest\Files folder).

Is there any way to speed up the garbage collector? (It seems to delete roughly 20 files every 10 seconds - which makes me believe that if I store/delete more than 2 records per second I will eventually fill up the hard drive.)

Thank you
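
(Note: FILESTREAM is already enabled on my instance; for anyone reproducing this, the instance-level setting was turned on with something like the following, in addition to enabling FILESTREAM for the service in SQL Server Configuration Manager.)

EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO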

CREATE DATABASE FSTest ON PRIMARY
    (NAME = FSTest_data, FILENAME = N'C:\FSTest\FSTest_data.mdf'),
FILEGROUP FSTestFileGroup CONTAINS FILESTREAM
    (NAME = FSTestFiles,FILENAME = N'C:\FSTest\Files')
LOG ON 
    (NAME = 'FSTest_log', FILENAME = N'C:\FSTest\FSTest_log.ldf');
GO

USE FSTest;
GO

CREATE TABLE FSTest (
    Guid UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
    Name VARCHAR (25),
    Data VARBINARY(MAX) FILESTREAM);
GO

ALTER DATABASE FSTest SET RECOVERY SIMPLE;
GO

SET NOCOUNT ON
DECLARE @test int
SET @test=0
WHILE @test<1000 BEGIN
    INSERT INTO FSTest(Name,Data) VALUES('test',CAST('abc' AS VARBINARY(MAX)))
    DELETE FROM FSTest WHERE Name='test'
    CHECKPOINT
    SET @test = @test+1
END

Update:

I tried the same thing for a longer period of time with an insert/delete speed closer to my needs, and after 30 minutes of execution the same situation is observable: files are created way faster than they get deleted.

SET NOCOUNT ON
DECLARE @test int
SET @test=0
WHILE @test<100000 BEGIN
    INSERT INTO FSTest(Name,Data) VALUES('test',CAST('abc' AS VARBINARY(MAX)))
    DELETE FROM FSTest WHERE Name='test'
    WAITFOR DELAY '00:00:00:200'
    CHECKPOINT
    SET @test = @test+1
END
+1  A: 

Things are a bit more complicated than a simple checkpoint. The file can be removed only when the last VLF containing log records about the file creation is inactive. See FILESTREAM garbage collection.
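
If you want to check whether those VLFs are still active, DBCC LOGINFO (undocumented, but widely used) shows the status of each VLF in the log:

USE FSTest;
GO
DBCC LOGINFO;
-- Status = 2 means the VLF is still active, so the log records in it
-- (including the file-creation records) have not been freed yet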

Remus Rusanu
Well, I actually stumbled upon this article prior to asking this question and understood that in the "simple" recovery model I only needed to trigger the checkpoint for a garbage collection to occur. That said, I know very little about VLFs and such, and might have missed an important piece of information somewhere.
Benoittr
Remus, the blog article you have linked is not exactly accurate. Garbage collection has nothing to do with VLFs. It just needs a CHECKPOINT - LOG BACKUP - wait a few seconds - CHECKPOINT - LOG BACKUP sequence, which may happen to cause a VLF switch as well, but this is not what the GC looks for.
Pawel Marciniak
+2  A: 

As Remus said, if you're using the full recovery model then things are complicated. But even under the simple recovery model you need to keep in mind that CHECKPOINT invokes the garbage collector (GC), but it doesn't guarantee that the GC will delete all the files in a single pass. The GC currently has a limit on the number of operations it may carry out in a single invocation. Furthermore, files are deleted with the FILE_DELETE_ON_CLOSE option, so as long as there are open handles to a file you will still see it, even though the GC may have already deleted it. Such open handles may be held by anti-virus programs or other file system filter drivers.
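
Under the full recovery model, a minimal sketch of the sequence mentioned in my comment above might look like this (the backup path is just a placeholder, and the delay is arbitrary):

CHECKPOINT
-- Hypothetical backup destination; BACKUP LOG requires full or bulk-logged recovery
BACKUP LOG FSTest TO DISK = N'C:\FSTest\FSTest_log.bak'
WAITFOR DELAY '00:00:05'
CHECKPOINT
BACKUP LOG FSTest TO DISK = N'C:\FSTest\FSTest_log.bak'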

Lastly, unless you're actually running out of disk space, I wouldn't worry that much about stale files - they will eventually get deleted as part of automatic database checkpoints. I believe (although believe is the keyword here) that even though it may have slow "startup time", GC will keep up with physically deleting files if you run your test steadily for a longer period of time (minutes?).

If you care about performance, keeping filestream containers on a system drive may not be such a good idea. See here for filestream performance tips.

Pawel Marciniak
I guess you're right, I'll give it a try with a more sustained load and see how it reacts. As for the "limit on the number of operations it may carry out in a single invocation": is this limit documented, and is there a way to raise it? I'm just hoping I won't hit a wall in the future.
Benoittr
I tried the sustained scenario, and after 40 minutes of runtime the same problem is observable (the file count is around 5000): files get created faster than they are deleted. (I updated the main post with the observations.)
Benoittr
+2  A: 

After some more research (and thanks to Paul Randal's blog, which has lots of very detailed information about FILESTREAM and garbage collection): once the rows are deleted and a checkpoint is executed, the files are put in a system table (the tombstone table); then, every 10 seconds, a process (Ghost Cleanup) runs and removes some items from that table (20 to be exact). So basically we are limited to about 2 deletes per second, and there seems to be no way (yet) to change this behavior.
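
If you want to watch the backlog yourself, the tombstone table can be located through sys.internal_tables (the numeric suffix in its name varies per database, and reading the table itself requires the dedicated administrator connection):

USE FSTest;
GO
SELECT name FROM sys.internal_tables
WHERE name LIKE 'filestream_tombstone%';
-- Returns a name like filestream_tombstone_2073058421 (suffix varies);
-- the row count can then be checked over the DAC, e.g.:
-- SELECT COUNT(*) FROM sys.filestream_tombstone_2073058421;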

Since I have a sustained 4 deletes per second, I will need to find an alternative to FILESTREAM.

Thanks, everyone, for your input.

Benoittr