views: 437
answers: 4

I'm reading a file, and I either read a row of data (1600 sequential reads of 17 bytes) or a column of data (1600 reads of 17 bytes separated by a stride of 1600*17 = 27,200 bytes). The file is either on a local drive or a remote drive. I do the reads 10 times, so in each case I expect to read 272,000 bytes of data.
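
The access pattern can be sketched in a few lines (names and constants are illustrative, taken from the numbers above, not from the original test program):

```cpp
#include <cassert>
#include <cstdint>

// 1600 records of 17 bytes per pass, 10 passes (numbers from the question).
constexpr int64_t kRecord  = 17;
constexpr int64_t kRecords = 1600;
constexpr int64_t kPasses  = 10;

// A "row" read is sequential: record i starts right after record i-1.
int64_t rowOffset(int64_t i) { return i * kRecord; }

// A "column" read skips a whole row between records: stride = 1600 * 17.
constexpr int64_t kColStride = kRecords * kRecord;   // 27,200 bytes
int64_t colOffset(int64_t i) { return i * kColStride; }

// Either way, the payload actually requested is the same.
constexpr int64_t kExpectedBytes = kRecord * kRecords * kPasses;  // 272,000
```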

On the local drive, I see what I expect. On the remote drive when reading sequentially I also see what I expect but when reading a column, I see a ton of extra reads being done. They are 32,768 bytes long and don't seem to be used but they make the amount of data being read jump from 272,000 bytes to anywhere from 79 MB to 106 MB. Here is the output using Process Monitor:

1:39:39.4624488 PM  DiskSpeedTest.exe 89628 ReadFile \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Offset: 9,390,069, Length: 17
1:39:39.4624639 PM  DiskSpeedTest.exe 89628 FASTIO_CHECK_IF_POSSIBLE \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Operation: Read, Offset: 9,390,069, Length: 17
1:39:39.4624838 PM  DiskSpeedTest.exe 89628 ReadFile \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Offset: 9,388,032, Length: 32,768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
1:39:39.4633839 PM  DiskSpeedTest.exe 89628 ReadFile \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Offset: 9,417,269, Length: 17
1:39:39.4634002 PM  DiskSpeedTest.exe 89628 FASTIO_CHECK_IF_POSSIBLE \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Operation: Read, Offset: 9,417,269, Length: 17
1:39:39.4634178 PM  DiskSpeedTest.exe 89628 ReadFile \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Offset: 9,444,469, Length: 17
1:39:39.4634324 PM  DiskSpeedTest.exe 89628 FASTIO_CHECK_IF_POSSIBLE \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Operation: Read, Offset: 9,444,469, Length: 17
1:39:39.4634529 PM  DiskSpeedTest.exe 89628 ReadFile \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Offset: 9,441,280, Length: 32,768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
1:39:39.4642199 PM  DiskSpeedTest.exe 89628 ReadFile \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Offset: 9,471,669, Length: 17
1:39:39.4642396 PM  DiskSpeedTest.exe 89628 FASTIO_CHECK_IF_POSSIBLE \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Operation: Read, Offset: 9,471,669, Length: 17
1:39:39.4642582 PM  DiskSpeedTest.exe 89628 ReadFile \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Offset: 9,498,869, Length: 17
1:39:39.4642764 PM  DiskSpeedTest.exe 89628 FASTIO_CHECK_IF_POSSIBLE \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Operation: Read, Offset: 9,498,869, Length: 17
1:39:39.4642922 PM  DiskSpeedTest.exe 89628 ReadFile \\BCCDC01\BCC-raid3\SeisWareInc Temp Dir\BPepers_Temp\Projects\PT_4\Horizons\BaseName3D_1\RR_AP SUCCESS Offset: 9,498,624, Length: 32,768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal

Notice the extra reads of 32,768 bytes with I/O flags Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal. These extra reads are what take it from 272 KB to 106 MB and are causing the slowness. They don't happen when reading from a local file, or when I'm reading a row so that all the reads are sequential.

I've tried setting FILE_FLAG_RANDOM_ACCESS, but it doesn't seem to help. Any ideas on what is causing these extra reads and how to make them stop?

The tests are being run on a 64-bit Vista system. I can provide source code for a program that demonstrates the problem, as well as a console program that runs the tests.

+1  A: 

You might be running into oplock issues over SMB. Typically, when reading or saving a file over the network, Windows pulls the whole file over to the client, works on it there, and sends the changes back. When you are working with flat-file databases, this can cause unnecessary reads across an SMB file share.

I'm not sure whether there is a way to explicitly pull over the whole file, read the rows from the local copy, and then push back any changes.

You'll find some nightmare stories about oplocks and flat-file databases.

http://msdn.microsoft.com/en-us/library/aa365433%28VS.85%29.aspx

Not sure if this solves your problem, but it might get you pointed in the right direction. Good luck!

Jeremy Heslop
A: 

I see this all the time, and it's out of your control: the network does what it wants.

If you know the file is going to be less than 1MB, just pull the whole thing into memory.
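
As a sketch of that approach (standard C++, not tied to the asker's code; error handling omitted):

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read the entire file into memory once, then slice records out of the
// buffer instead of issuing thousands of tiny ReadFile calls over SMB.
std::vector<char> readWholeFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// Extract `count` records of `recordSize` bytes, `stride` bytes apart,
// from the in-memory buffer. For the question's layout this would be
// recordSize = 17, stride = 27,200, count = 1600.
std::vector<std::string> readColumn(const std::vector<char>& buf,
                                    size_t recordSize, size_t stride,
                                    size_t count) {
    std::vector<std::string> records;
    for (size_t i = 0; i < count; ++i)
        records.emplace_back(buf.data() + i * stride, recordSize);
    return records;
}
```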

egrunin
A: 

My guess is that the OS is doing its own read-ahead of the file on the off chance you need the data at a later point. If it's not hurting you, then it shouldn't matter.

Check out the caching behavior section of the CreateFile API documentation.

You may like to try FILE_FLAG_NO_BUFFERING to see if it stops the extra reads. Be warned, though: using this flag may slow your application down. Normally you use this flag when you understand how to stream data off the disk as fast as you can and the OS caching is only getting in the way.

You may also be able to get the same sort of behavior with local files as you see with the network file if you use the FILE_FLAG_SEQUENTIAL_SCAN flag. This flag hints to the Windows cache manager what you will be doing, and it will try to fetch the data for you ahead of time.
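
A minimal sketch of opening the file with these flags. Note that FILE_FLAG_NO_BUFFERING requires sector-aligned offsets and lengths, so a portable alignment helper is included; the 512-byte sector size here is an assumption — query the real value (e.g. via GetDiskFreeSpace) in production:

```cpp
#include <cstdint>

// FILE_FLAG_NO_BUFFERING requires read offsets and lengths to be
// multiples of the volume sector size. Round a [offset, offset+length)
// request out to sector boundaries. (512 is an assumed sector size.)
constexpr int64_t kSectorSize = 512;

int64_t alignDown(int64_t offset) { return offset - offset % kSectorSize; }

int64_t alignedLength(int64_t offset, int64_t length) {
    int64_t start = alignDown(offset);
    int64_t end   = offset + length;
    int64_t endUp = ((end + kSectorSize - 1) / kSectorSize) * kSectorSize;
    return endUp - start;
}

#ifdef _WIN32
#include <windows.h>
// Open with unbuffered, random-access hints; sketch only, no error handling.
HANDLE openUnbuffered(const wchar_t* path) {
    return CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                       OPEN_EXISTING,
                       FILE_FLAG_NO_BUFFERING | FILE_FLAG_RANDOM_ACCESS,
                       nullptr);
}
#endif
```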

Shane Powell
A: 

I think SMB always transfers a block, rather than a small set of bytes.

Some information on block size negotiation can be found here. http://support.microsoft.com/kb/q223140

So you are seeing a read to copy the relevant block, followed by the local read(s) of 17 bytes within the block. (If you look at the pattern, there are some pairs of 17-byte reads where two reads fall within the same block.)
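
This block-sharing effect can be checked with a small simulation over the question's numbers (1600 reads, 27,200-byte stride, 32,768-byte blocks; the function name is illustrative):

```cpp
#include <cstdint>
#include <set>

// Count how many distinct fixed-size blocks one pass over the column
// touches. Because the 27,200-byte stride is smaller than the 32,768-byte
// block, consecutive 17-byte reads sometimes land in the same block,
// which is why pairs of ReadFile calls occasionally share one 32 KB
// transfer in the trace above.
int64_t distinctBlocks(int64_t reads, int64_t stride, int64_t blockSize) {
    std::set<int64_t> blocks;
    for (int64_t i = 0; i < reads; ++i)
        blocks.insert((i * stride) / blockSize);   // block index of each read
    return static_cast<int64_t>(blocks.size());
}
```

For the question's numbers this gives 1,328 distinct blocks per pass, i.e. roughly 43.5 MB of 32 KB transfers to deliver 27,200 bytes of payload; repeated passes partly hit already-transferred blocks, which is consistent with the 79–106 MB observed over ten passes.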

The fix obviously depends upon the control you have over the application and the size and structure of the database. (e.g. if the database had one column per file, then all the reads would be sequential. If you used a database server, you wouldn't be using SMB, etc.)

If it's any consolation, iTunes performs abysmally when using a network drive too.

MZB