
In the upcoming Java 7, there is a new API to check whether two file objects refer to the same file.

Is there a similar API in the .NET Framework?

I've searched MSDN, but found nothing helpful.

I want something simple, but I don't want to compare file names. That breaks with hard/symbolic links and with different path styles (e.g. \\?\C:\ vs. C:\).

All I want to do is prevent the same file from being dragged and dropped into my linked list twice.

+5  A: 

Edit: Note that @Rasmus Faber mentions the GetFileInformationByHandle function in the Win32 API, which does what you want. Check and upvote his answer for more information.


I think you need an OS function to give you this information. Otherwise you will get some false negatives whatever you do.

For instance, do these refer to the same file?

  • \\server\share\path\filename.txt
  • \\server\d$\temp\path\filename.txt

I would examine how critical it is for you not to have duplicate files in your list, and then just make a best effort.

Having said that, there is a method in the Path class that can do some of the work: Path.GetFullPath. It will at least expand the path to long names, according to the existing structure. Afterwards you just compare the strings. It won't be foolproof, though, and it won't handle the two paths in my example above.
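Here is a minimal sketch of that best-effort string comparison. It assumes a case-insensitive file system, which is the Windows default, so the comparison ignores case. That also covers the ToUpper() counterexample raised in the comments. The names PathCompare and LooksLikeSamePath are hypothetical.

```csharp
using System;
using System.IO;

static class PathCompare
{
    // Best-effort, string-only check. Path.GetFullPath makes the path absolute
    // and resolves "." and ".." segments; the comparison ignores case on the
    // assumption of a case-insensitive file system (the Windows default).
    // It cannot detect hard links, symbolic links, junctions, or two
    // different shares that expose the same file.
    public static bool LooksLikeSamePath(string path1, string path2)
    {
        string full1 = Path.GetFullPath(path1);
        string full2 = Path.GetFullPath(path2);
        return string.Equals(full1, full2, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, LooksLikeSamePath("dir/../notes.txt", "NOTES.txt") treats both as the same path. Any two paths that reach one file through different links still compare as different.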

Lasse V. Karlsen
The documentation also says: "Otherwise, this method checks if both FileRefs locate the same file, and depending on the implementation, may require to open or access both files." I am actually very interested in seeing how this can be done!
Hosam Aly
Using Path.GetFullPath doesn't work. Try: if (Path.GetFullPath(@"c:\vobp.log") == Path.GetFullPath(@"c:\vobp.log".ToUpper())) {}
tuinstoel
Yes, notice that I said *some* of the work. There is no method in .NET that will do it all for you.
Lasse V. Karlsen
A: 

You could always compute an MD5 hash of both files and compare the results. It's not exactly efficient, but it's easier than manually comparing the files yourself.

Here is a post on how to MD5 a string in C#.
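Here is a minimal sketch of hashing a file's contents with MD5 in C#. As the comments point out, this compares *contents*. Two separate copies of a file produce the same hash, so it cannot tell whether two paths name the same file. The helper name Md5Hex is hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class ContentHash
{
    // Streams the file through MD5 (so large files are not loaded into memory
    // at once) and returns the digest as uppercase hex without separators.
    public static string Md5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

The hashes could be cached per dropped file, as suggested in the comments. Every new file still has to be read in full once.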

Soviut
I think this could have a large overhead if he gets a multi-GB file...
Hosam Aly
Why do an MD5? He can simply compare the contents. It would take the same time on positives, and would fail sooner on most negatives.
configurator
Also, this wouldn't be able to tell apart copies of the same file.
configurator
Good points. I was mainly going on the basis that the MD5 results could be cached and compared more easily as new results are dragged in.
Soviut
You may have misunderstood my question. I am not asking whether the contents of two files are the same, but whether two file paths refer to the same file: if I lock or modify one, the other is affected.
Dennis Cheung
+1  A: 

At first I thought it was really easy, but this doesn't work:

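  // Requires: using System.IO; using System.Windows.Forms; (for MessageBox)
  string fileName1 = @"c:\vobp.log";
  string fileName2 = @"c:\vobp.log".ToUpper();   // same file on a case-insensitive file system
  FileInfo fileInfo1 = new FileInfo(fileName1);
  FileInfo fileInfo2 = new FileInfo(fileName2);

  if (!fileInfo1.Exists || !fileInfo2.Exists)
  {
    throw new FileNotFoundException("one of the files does not exist");
  }

  // FullName keeps the casing it was given, so this ordinal comparison
  // is false even though both paths name the same file.
  if (fileInfo1.FullName == fileInfo2.FullName)
  {
    MessageBox.Show("equal");
  }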

Maybe this library helps: http://www.codeplex.com/FileDirectoryPath. I haven't used it myself.

Edit: see this example on that site:

  //
  // Path comparison
  //
  filePathAbsolute1 = new FilePathAbsolute(@"C:/Dir1\\File.txt");
  filePathAbsolute2 = new FilePathAbsolute(@"C:\DIR1\FILE.TXT");
  Debug.Assert(filePathAbsolute1.Equals(filePathAbsolute2));
  Debug.Assert(filePathAbsolute1 == filePathAbsolute2);
tuinstoel
Just a guess: this doesn't work with links, does it?
BeowulfOF
I don't know, I haven't used it myself.
tuinstoel
A: 

WARNING: HACK

If you really, really want to see if two files are the same and you have write access to them you could always twiddle one of the file attributes (Device is reserved and probably unused) and see if it changes on the other file.

In no way should this be construed as a good solution, however.

Maybe there's something you can do with interop, but there doesn't seem to be anything in the .NET library that allows you to check file identity at that level.

Dana Robinson
+13  A: 

As far as I can see (1) (2) (3) (4), JDK 7 does it by calling GetFileInformationByHandle on the files and comparing dwVolumeSerialNumber, nFileIndexHigh and nFileIndexLow.

Per MSDN:

You can compare the VolumeSerialNumber and FileIndex members returned in the BY_HANDLE_FILE_INFORMATION structure to determine if two paths map to the same target; for example, you can compare two file paths and determine if they map to the same directory.

I do not think this function is wrapped by .NET, so you will have to use P/Invoke.
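Here is a minimal P/Invoke sketch of that approach. It assumes C# 7 or later (for `out var`) and regular files only. FileStream cannot open directories; those would need CreateFile with FILE_FLAG_BACKUP_SEMANTICS. The names FileIdentity and IsSameFile are hypothetical. Both streams stay open during the comparison because MSDN documents the file index as stable only while the file is open.

```csharp
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

static class FileIdentity
{
    // Mirrors the Win32 BY_HANDLE_FILE_INFORMATION structure field by field.
    [StructLayout(LayoutKind.Sequential)]
    private struct BY_HANDLE_FILE_INFORMATION
    {
        public uint FileAttributes;
        public FILETIME CreationTime;
        public FILETIME LastAccessTime;
        public FILETIME LastWriteTime;
        public uint VolumeSerialNumber;
        public uint FileSizeHigh;
        public uint FileSizeLow;
        public uint NumberOfLinks;
        public uint FileIndexHigh;
        public uint FileIndexLow;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetFileInformationByHandle(
        SafeFileHandle hFile, out BY_HANDLE_FILE_INFORMATION lpFileInformation);

    private static BY_HANDLE_FILE_INFORMATION GetInfo(FileStream stream)
    {
        if (!GetFileInformationByHandle(stream.SafeFileHandle, out var info))
            throw new Win32Exception(); // picks up the last Win32 error
        return info;
    }

    // True when both paths resolve to the same file object: same volume
    // serial number and same 64-bit file index (high and low parts).
    public static bool IsSameFile(string path1, string path2)
    {
        // Share everything so the check does not block other users of the files.
        const FileShare share = FileShare.ReadWrite | FileShare.Delete;
        using (var s1 = new FileStream(path1, FileMode.Open, FileAccess.Read, share))
        using (var s2 = new FileStream(path2, FileMode.Open, FileAccess.Read, share))
        {
            var a = GetInfo(s1);
            var b = GetInfo(s2);
            return a.VolumeSerialNumber == b.VolumeSerialNumber
                && a.FileIndexHigh == b.FileIndexHigh
                && a.FileIndexLow == b.FileIndexLow;
        }
    }
}
```

The check throws if either file cannot be opened for reading, for example when it is missing or access is denied. A drag-and-drop handler would probably want to catch IOException and UnauthorizedAccessException and fall back to a weaker comparison.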

It might or might not work for network files. According to MSDN:

Depending on the underlying network components of the operating system and the type of server connected to, the GetFileInformationByHandle function may fail, return partial information, or full information for the given file.

A quick test shows that it works as expected (same values) with a symbolic link on a Linux system connected using SMB/Samba, but that it cannot detect that a file is the same when accessed using different shares that point to the same file (FileIndex is the same, but VolumeSerialNumber differs).

Rasmus Faber
This definitely looks like the way to go. MSDN says that those three fields uniquely identify a file. You'll need to use the Win32 API to get to them, though.
Dana Robinson
Thanks, I added the MSDN citation.
Rasmus Faber
Does this function work on network files?
Hosam Aly
Apparently it does
Lasse V. Karlsen
I doubt it would return the same result for the same file accessed through different shares though; you should check if \\server\share1\file is the same as \\server\share2\subdirectory\file when the files are really the same.
configurator
Thanks, Rasmus, for the link to java.nio.file.FileRef (which you gave me in another question). I followed your links here, but it wasn't one of them. I think it would be useful here too (maybe as #4).
Hosam Aly
Well, FileRef is just an interface (which WindowsPath implements), so I did not think there would be any interesting things to see in the source code. But if you think it will be worthwhile, I will add it.
Rasmus Faber
Thanks. I didn't know that. Best regards.
Hosam Aly
I found this: http://stackoverflow.com/questions/271398/post-your-extension-goodies-for-c-net-codeplex-com-extensionoverflow?answer=274652#274652 (Haven't tested it myself, but it uses GetFileInformationByHandle)
tuinstoel
+1  A: 

Answer: There is no foolproof way to compare two string-based paths to determine whether they point to the same file.

The main reason is that seemingly unrelated paths can point to the exact same file due to file system redirections (junctions, symbolic links, etc.). For example:

"d:\temp\foo.txt" "c:\othertemp\foo.txt"

These paths can potentially point to the same file. This case clearly eliminates any string comparison function as a basis for determining if two paths point to the same file.

The next level is comparing the OS file information: open the file at each path and compare the handle information. In Windows this can be done with GetFileInformationByHandle. Lucian Wischik wrote an excellent post on this subject.

There is still a problem with this approach, though: it only works if the user account performing the check is able to open both files for reading. Numerous things can prevent a user from opening one or both files, including but not limited to:

  • Lack of sufficient permissions to the file
  • Lack of sufficient permissions to a directory in the path of the file
  • A file system change that occurs between the opening of the first file and the second, such as a network disconnection

When you start looking at all of these problems, you begin to understand why Windows does not provide a method to determine whether two paths are the same. It's just not an easy (or even possible) question to answer.

JaredPar
The documentation for GetFileInformationByHandle says: "nFileIndexLow: Low-order part of a unique identifier that is associated with a file. This value is useful ONLY WHILE THE FILE IS OPEN by at least one process. If no processes have it open, the index may change the next time the file is opened."
Integer Poet