Here is what I am looking for:

I need to open a zip file of images and iterate through its contents. The zip file contains subdirectories, and one of them, "IDX", holds the images I need. I have no problem extracting the zip file's contents to a directory, but my zip files can be huge (several GB), so I am hoping to open the file and pull out the images one at a time as I iterate through them, processing each one.

After I am done, I just close the zip file. These images are actually being housed in a database.

Does anyone know how to do this, ideally with free tools or built-in APIs? This process will run on a Windows machine.

Thanks!

+6  A: 

SharpZipLib is a great tool for your requirements.

I have used it to process giant files within directories within giant nested zip files (meaning ZIP files within ZIP files), using streams. I was able to open a zip stream on top of a zip stream so that I could investigate the contents of the inner zip without having to extract the entire parent. You can then use a stream to peek at the content files, which may help you decide whether to extract them or not. It's open-source.
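A minimal sketch of that streaming approach, assuming a recent SharpZipLib release (namespace ICSharpCode.SharpZipLib.Zip). The class and method names (ZipStreamingSketch, ForEachEntryUnder, WalkNested) are hypothetical, and "IDX/" is simply the folder from the question. ForEachEntryUnder decompresses one matching entry at a time through ZipFile.GetInputStream; WalkNested covers the zip-inside-a-zip case by layering a ZipInputStream over the current entry of another ZipInputStream.

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

public static class ZipStreamingSketch
{
    // Hands every file entry whose path starts with folderPrefix (e.g. "IDX/") to
    // processImage, decompressing one entry at a time; nothing is written to disk.
    // Returns the number of entries processed.
    public static int ForEachEntryUnder(Stream zipData, string folderPrefix,
                                        Action<string, Stream> processImage)
    {
        int count = 0;
        using (var zipFile = new ZipFile(zipData)) // ZipFile needs a seekable stream
        {
            foreach (ZipEntry entry in zipFile)
            {
                if (!entry.IsFile ||
                    !entry.Name.StartsWith(folderPrefix, StringComparison.OrdinalIgnoreCase))
                    continue;

                using (Stream entryStream = zipFile.GetInputStream(entry))
                {
                    processImage(entry.Name, entryStream);
                }
                count++;
            }
        }
        return count;
    }

    // Reads a zip forward-only and recurses into inner ".zip" entries by opening a
    // ZipInputStream on top of the outer ZipInputStream, so the inner archive is
    // never extracted. visit receives a path such as "inner.zip!/IDX/a.jpg".
    public static void WalkNested(Stream zipStream, string pathPrefix,
                                  Action<string, Stream> visit)
    {
        // IsStreamOwner = false: finishing an inner archive must not close the outer one.
        using (var zipIn = new ZipInputStream(zipStream) { IsStreamOwner = false })
        {
            ZipEntry entry;
            while ((entry = zipIn.GetNextEntry()) != null)
            {
                if (!entry.IsFile)
                    continue;

                string fullName = pathPrefix + entry.Name;
                if (entry.Name.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
                    WalkNested(zipIn, fullName + "!/", visit);
                else
                    visit(fullName, zipIn); // zipIn returns only the current entry's bytes
            }
        }
    }
}
```

For a file on disk, pass File.OpenRead(path) as the stream. ForEachEntryUnder relies on the archive's central directory and therefore needs a seekable stream, while WalkNested only reads forward, which is what makes the nested case possible.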

EDIT: Directory handling in the library is not ideal. As I recall, it contains separate entries for some directories, while others are implied by the paths of the file entries.

Here's an extract of the code I used to collect the actual file and folder names at a certain level (_startPath). Let me know if you're interested in the whole wrapper class.

using System.Collections.Generic;
using ICSharpCode.SharpZipLib.Zip;

// _zipFile = your ZipFile instance
List<string> _folderNames = new List<string>();
List<string> _fileNames = new List<string>();
string _startPath = "";
const string PATH_SEPARATOR = "/";

foreach ( ZipEntry entry in _zipFile )
{
    string name = entry.Name;

    if ( _startPath != "" )
    {
        if ( name.StartsWith( _startPath + PATH_SEPARATOR ) )
            name = name.Substring( _startPath.Length + 1 );
        else
            continue;
    }

    // Ignore items below this folder
    if ( name.IndexOf( PATH_SEPARATOR ) != name.LastIndexOf( PATH_SEPARATOR ) )
        continue;

    string thisPath = null;
    string thisFile = null;

    if ( entry.IsDirectory )
    {
        thisPath = name.TrimEnd( PATH_SEPARATOR.ToCharArray() );
    }
    else if ( entry.IsFile )
    {
        if ( name.Contains( PATH_SEPARATOR ) )
            thisPath = name.Substring( 0, name.IndexOf( PATH_SEPARATOR ) );
        else
            thisFile = name;
    }

    if ( !string.IsNullOrEmpty( thisPath ) && !_folderNames.Contains( thisPath ) )
        _folderNames.Add( thisPath );

    if ( !string.IsNullOrEmpty( thisFile ) && !_fileNames.Contains( thisFile ) )
        _fileNames.Add( thisFile );
}
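Once a folder's file names have been collected, they can be turned back into entries by their full names and read one at a time. A minimal sketch, assuming SharpZipLib's ZipFile.GetEntry and ZipFile.GetInputStream; ZipFolderReader and ProcessFiles are hypothetical names, and startPath and fileNames stand in for _startPath and _fileNames above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

public static class ZipFolderReader
{
    // Streams each collected file to process, one entry at a time.
    // GetEntry looks up a full entry name only, so the folder prefix that was
    // stripped off while collecting names has to be put back.
    // Returns how many of the names were found in the archive.
    public static int ProcessFiles(ZipFile zipFile, string startPath,
                                   IEnumerable<string> fileNames,
                                   Action<string, Stream> process)
    {
        int found = 0;
        foreach (string fileName in fileNames)
        {
            string fullName = startPath == "" ? fileName : startPath + "/" + fileName;
            ZipEntry entry = zipFile.GetEntry(fullName);
            if (entry == null)
                continue; // no entry with exactly this name

            using (Stream input = zipFile.GetInputStream(entry))
            {
                process(fullName, input);
            }
            found++;
        }
        return found;
    }
}
```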
harpo
I am trying this out again, but a ZipEntry for which IsDirectory is true does not have a list of child entries to iterate through. Documentation for this kind of functionality is limited at best.
Tacoman667
Thanks, but this is not what I want at all. Basically, I want to be able to iterate zipFile.GetEntity("myFolder/").Entities in memory. Do you know if this is possible in SharpZipLib?
Tacoman667
Maybe there's a misunderstanding, but that's what I was doing with the above code. In that case, _startPath would be "myFolder", and after the code runs, _folderNames and _fileNames will contain the names of the entities in that folder. You can then use these lists to iterate the entities in memory (using the ZipFile's methods to get entries by name).
harpo
So my understanding is that GetEntity() actually iterates all subdirectories as long as I have the zippedFileName? EDIT: Nope. FindEntry() and GetEntry() do not recursively search subdirectories...
Tacoman667
+3  A: 

There are at least two more viable options besides SharpZipLib (which works fine):

  • DotNetZip on Codeplex

  • Microsoft seems to be investigating integrating ZIP functionality into the System.IO namespace - see this blog post for more info
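For later readers: built-in ZIP support did arrive in .NET Framework 4.5 as System.IO.Compression.ZipArchive. A minimal sketch, assuming .NET 4.5 or later (on .NET Framework, add a reference to System.IO.Compression.dll); BuiltInZipSketch and ForEachEntryUnder are hypothetical names. Each entry under the folder prefix is opened as its own decompressing stream, so only one image is being read at a time.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class BuiltInZipSketch
{
    // Streams every file whose full path starts with folderPrefix (e.g. "IDX/")
    // to process, one entry at a time. Returns the number of files processed.
    public static int ForEachEntryUnder(Stream zipData, string folderPrefix,
                                        Action<string, Stream> process)
    {
        int count = 0;
        using (var archive = new ZipArchive(zipData, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Directory entries ("IDX/") have an empty Name, the part after the last '/'.
                if (entry.Name.Length == 0 ||
                    !entry.FullName.StartsWith(folderPrefix, StringComparison.OrdinalIgnoreCase))
                    continue;

                using (Stream s = entry.Open())
                {
                    process(entry.FullName, s);
                }
                count++;
            }
        }
        return count;
    }
}
```

With a file on disk, pass File.OpenRead(path); ZipArchive disposes that stream when it is itself disposed.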

marc_s
A: 

.NET doesn't provide a way to read the contents of a standard ZIP file. The System.IO.Packaging.ZipPackage class can create and read zip files that include a special manifest. ZipPackage can't read files that lack this manifest, although ordinary zip utilities can easily read a .zip created by ZipPackage. If you are the one creating the zips, ZipPackage may be an option. The classes that perform the actual compression and write the .zip file are internal to System.IO.Packaging, so you can't use them directly.
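A minimal sketch of reading such a package, assuming System.IO.Packaging from WindowsBase.dll (.NET Framework 3.0 and later) or the System.IO.Packaging NuGet package on newer .NET; PackageSketch and ListParts are hypothetical names. Each stored file is a "part" addressed by a URI such as "/IDX/a.jpg".

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

public static class PackageSketch
{
    // Returns the URI of every part in a package, i.e. a zip that also carries the
    // package manifest. As described above, plain zips without that manifest are
    // not expected to open this way.
    public static List<string> ListParts(Stream packageData)
    {
        var names = new List<string>();
        using (Package package = Package.Open(packageData, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
                names.Add(part.Uri.OriginalString);
        }
        return names;
    }
}
```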

To convince your people that there is no out-of-the-box way to open standard zips, you should mention that although .NET also provides the System.IO.Compression.GZipStream class, it only (de)compresses the contents of a single stream. It does not split them into separate files, directories, etc.
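To illustrate the difference, a minimal sketch (GZipSketch, Compress and Decompress are hypothetical names): GZipStream turns one byte stream into one compressed byte stream and back, and nothing in its API exposes entry names, paths, or folders.

```csharp
using System.IO;
using System.IO.Compression;

public static class GZipSketch
{
    // Compresses a single buffer; the result is one gzip stream, not an archive.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream flushes the gzip footer
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Decompresses a single gzip stream back into the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }
}
```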

Jon Galloway covered all the options a while back in "Creating Zip archives in .NET (without an external library)", although none of them is as clean as the upcoming System.IO.Zip.

Panagiotis Kanavos
A: 

I decided to simply extract the contents, iterate through them, and delete the files when I am done. Given the constraints I am working under, this seems to be the only way to do what I need.
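A minimal sketch of that extract, iterate, and delete approach, assuming .NET 4.5 or later for System.IO.Compression.ZipFile.ExtractToDirectory (on .NET Framework it lives in System.IO.Compression.FileSystem.dll). ExtractThenProcess and Run are hypothetical names; the finally block removes the temporary copy even if processing throws.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ExtractThenProcess
{
    // Extracts the whole archive into a fresh temporary folder, hands every file
    // under folderName (e.g. "IDX") to processFile, then deletes the folder.
    // Returns the number of files processed.
    public static int Run(string zipPath, string folderName, Action<string> processFile)
    {
        string tempDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        try
        {
            ZipFile.ExtractToDirectory(zipPath, tempDir);

            string imageDir = Path.Combine(tempDir, folderName);
            if (!Directory.Exists(imageDir))
                return 0;

            int count = 0;
            foreach (string file in Directory.EnumerateFiles(imageDir, "*", SearchOption.AllDirectories))
            {
                processFile(file);
                count++;
            }
            return count;
        }
        finally
        {
            if (Directory.Exists(tempDir))
                Directory.Delete(tempDir, true); // recursive delete of the extracted copy
        }
    }
}
```

The trade-off is disk space: the temporary folder briefly needs room for the full extracted archive, which is what the streaming approaches above avoid.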

Thanks all for the great suggestions!

Tacoman667