ansaurus

Question

Answer 1

+5 A:

An obvious improvement will be with VS2010 where you can have optional parameters.

Another thing that could be helpful is to offer extension methods, so that I could do: input_folder_path.CompressFolder(output_file_path).DecompressFolder(outputfile);

This would allow me to compress, and then decompress what was compressed in order to verify the compression.

What if I want to compress a folder and have it put in the same level as the input file path, why should I have to specify the output file?

So, if I do CompressFolder(@"C:\input_folder") and leave it like that then it would use C: as the output path.

James Black 2009-08-24 04:17:58

I like the idea of the API returning a path. When you compress a file or folder.

ChaosPandion 2009-08-25 00:49:17

Please explain why optional parameters would be 'an obvious improvement'. I don't feel that out-of-order arguments are really improving this API. Also, compressing c:\input_folder automagically to c:\input_folder.zip is, if anything, just another override method.

Robert Paulson 2009-08-25 03:25:50

If I call CompressString("some input string") then the format and level will have a default value, as I expect that everything leads to the Compress method with 3 parameters. If I have optional arguments then I can put those default values in the parameter list and reduce the number of functions to one, and yet have the advantage of overloading in that the programmer only as to pass in one parameter. The reason for assuming the output directory is just a simplification as the original request was what could make the API better.

James Black 2009-08-25 03:35:20

You can do that today without optional parameters. See my answer.

Robert Paulson 2009-08-25 03:55:03

Answer 2

+2 A:

Another improvident not in .net 4 could be to create a class CompressInfo that have String input_folder_path, String output_file_path, Format format, Level level properties in it, and to have just one method that will check is properties null or not.

ArsenMkrt 2009-08-24 04:23:55

I would take overloads over a single monolithic method that has to check for nulls any day. Overloads are clean and simple, and don't require that you create some type before hand to pass into the call.

jrista 2009-08-24 04:36:25

Why ? the method with the most parameters will *still* have to look for nulls in the other parameters ?

sirrocco 2009-08-24 15:02:13

yea, but he will provide only 2 methods to end user in API

ArsenMkrt 2009-08-24 15:06:28

just for the record I was responding to **jrista** :)

sirrocco 2009-08-24 18:22:23

I agree with jrista on this. I'll be sticking with overloads... Until C# 4 comes out of course.

ChaosPandion 2009-08-25 00:50:34

Answer 3

+11 A:

I would recommend breaking this single class into several classes. Generally speaking, static utility classes break a lot of rules, not the least of which is Separation of Concerns. While yes, all of the methods in this class deal with compression, they are concerned with compressing different things. Some compress byte arrays, some compress strings, some compress files. I would break this single utility into multiple utilities:

public static class ByteCompression
{
    public static Byte[] Compress(Byte[] input);
    public static Byte[] Compress(Byte[] input, Format format);
    public static Byte[] Compress(Byte[] input, Format format, Level level);

    public static Byte[] Decompress(Byte[] input);
    public static Byte[] Decompress(Byte[] input, Format format);
}

public static class StringCompression

    public static String Compress(String input);
    public static String Compress(String input, Format format);
    public static String Compress(String input, Format format, Level level);

    public static String Decompress(String input);
    public static String Decompress(String input, Format format);
}

public static class FileCompression
{
    public static void Compress(String input_file_path, String output_file_path);
    public static void Compress(String input_file_path, String output_file_path, Format format);
    public static void Compress(String input_file_path, String output_file_path, Format format, Level level);

    public static void Decompress(String input_file_path, String output_file_path);
    public static void Decompress(String input_file_path, String output_file_path, Format format);
}

public static FolderCompression
{
    public static void Compress(String input_folder_path, String output_file_path);
    public static void Compress(String input_folder_path, String output_file_path, Format format);
    public static void Compress(String input_folder_path, String output_file_path, Format format, Level level);

    public static void Decompress(String input_file_path, String output_file_path);
    public static void Decompress(String input_file_path, String output_file_path, Format format);
}

The above utility classes reduce repetition, better encapsulate purpose, are more cohesive with their member methods, and are clearer in intent. You do have four static utility types rather than one, but you aren't breaking as many rules/best practices this way. Try to avoid monolithic, do-everything utility classes. If you can, find a way to make them instance classes rather than static classes, especially if there is any shared data at the class level that is used across each compress/decompress method. That will improve thread safety.

EDIT:

A more ideal implementation would use extension methods, as andy commented. The File and Folder compression are a bit more difficult to implement as extensions, but I've tried my hand. The following examples better achieve what I was aiming for: separation of noun (or subject) from verb (or operation), providing a cleaner API that ultimately has less repetition, maintains separation of concerns, and is properly encapsulated.

public static class ByteCompressionExtensions
{
    public static byte[] Compress(this byte[] input);
    public static byte[] Compress(this byte[] input, Format format);
    public static byte[] Compress(this byte[] input, Format format, Level level);

    public static byte[] Decompress(this byte[] input);
    public static byte[] Decompress(this byte[] input, Format format);
}

// In use:
byte[] myArray = new byte[] { ... };
byte[] compArray = myArray.Compress();
// Subject (noun) -----^      ^----- Operation (verb)


public static class StringCompressionExtensions
{
    public static byte[] Compress(this string input);
    public static byte[] Compress(this string input, Format format);
    public static byte[] Compress(this string input, Format format, Level level);

    // Extension method fail!! :( :( This conflicts with Decompress from the class above!
    public static string Decompress(this byte[] input);
    public static string Decompress(this byte[] input, Format format);
}

// In use:
string myStr = "A string!";
byte[] compArray = myStr.Compress();
// Subject (noun) ---^      ^----- Operation (verb)
myStr = compArray.Decompress(); // Fail! :(


public static class FileCompressionExtensions
{
    public static void Compress(this FileInfo input, FileInfo output);
    public static void Compress(this FileInfo input, FileInfo output, Format format);
    public static void Compress(this FileInfo input, FileInfo output, Format format, Level level);

    public static void Decompress(this FileInfo input, FileInfo output);
    public static void Decompress(this FileInfo input, FileInfo output, Format format);
}

// In use:
FileInfo myFile = new FileInfo(input_file_path);
FileInfo myCompFile = new FileInfo(output_file_path);
                 myFile.Compress(myCompFile);
// Subject (noun) --^      ^----- Operation (verb)
                 myCompFile.Decompress(myFile);


public static class FolderCompressionExtensions
{
    public static void Compress(this DirectoryInfo input, DirectoryInfo output);
    public static void Compress(this DirectoryInfo input, DirectoryInfo output, Format format);
    public static void Compress(this DirectoryInfo input, DirectoryInfo output, Format format, Level level);

    public static void Decompress(this DirectoryInfo input, DirectoryInfo output);
    public static void Decompress(this DirectoryInfo input, DirectoryInfo output, Format format);
}

// In use:
DirectoryInfo myDir = new DirectoryInfo(input_folder_path);
DirectoryInfo myCompDir = new DirectoryInfo(output_folder_path);
                 myDir.Compress(myCompDir);
// Subject (noun) --^      ^----- Operation (verb)
                 myCompDir.Decompress(myDir);

jrista 2009-08-24 04:28:07

if you're going to break it up by Types, I would drop the API altogether and just implement all the functionality via Extension methods, so SomeString.Compress(), SomeBytes.Compress() etc.

andy 2009-08-24 04:37:34

I would combine this answer with the one above. Definitely start by splitting off related functionality though - 20,000 lines in one file is way too much.

Jamie Penney 2009-08-24 04:37:48

Extension methods are an option if you are using the right version of C#. However, given the nature of the file and folder compression, that would break what would otherwise be a clean API, as there is no decent, logical way to implement those two as extension methods.

jrista 2009-08-24 04:39:08

An additional step you might take is to add the type of compression (if it is a well known one) to the names. For example: CompressZip and DecompressZip.

Chris Dunaway 2009-08-24 14:47:12

I don't understand why breaking it into 4 classes would be an improvement. This just means you have to remember 4 class names instead of one. Internal to this class we have many sub-groupings.For Example: ZipCompressor, GZipCompressor

ChaosPandion 2009-08-25 00:47:56

@ChaosPandion: I recommend looking into the principals of 'Separation of Concern' and 'Single Responsibility', as a general starting point. Just because you stick the "static" keyword on a class doesn't mean you should shirk good coding practices. And all that aside...20,000 lines of code in a single file? That alone is enough reason to break it up. O_o

jrista 2009-08-25 01:29:54

Technically, if we follow OO rules we need need 2 classes for each type. A compress class and a decompress class. I just think it gets out of hand.On a side note it is not actually 20,000 lines we use SharpZipLib as a backing.

ChaosPandion 2009-08-25 03:09:25

@ChaosPandion: I think your still missunderstanding the point I was trying to make. Your classifying the concern as "compression". Thats a very broad concern, and you could lump a LOT of functionality into a single static class under that label. Try to look at it from a finer grain: Byte array compression, string compression, file and folder compression. A lot of architecture boils down to figuring out how to classify things. You can't be too broad in your classifications, but, as you noted, you can't be too fine grained either. Find the right balance, and you'll reap the benefits.

jrista 2009-08-25 03:16:41

On another note, you mentioned that you had specific types of compression: Zip, GZip, perhaps others such as Rar, 7Zip, etc. Those are an entirely different concern as well, and the actual facility that performs compression should also be separated from the concern of compressing specific kinds of things. The 4 static classes above should delegate the actual task of compressing to the appropriate compression instance class, which would probably best be based on a common abstract base compression class. Your flexibility and should improve, while still maintaining SoC.

jrista 2009-08-25 03:21:06

I'm not sure this answer DOES reduce repetition. It's still got the same amount of stuff, just with extra class declarations.

kibibu 2009-08-25 03:36:08

@kibibu: Count the difference in the number of times you encounter "Bytes", "String", "File" and "Folder" in both implementations. There isn't really any way to get rid of the terms "Compress" and "Decompress", although you could try to remove them from the class names (i.e. "ByteUtilty.Compress" rather than "ByteCompression.Compress"). The general idea is to separate noun from verb in the operation names, and put the noun somewhere more appropriate. That is the beauty of extension methods: you can eliminate class in favor of the noun itself, but in this case, you loose API consistency. :(

jrista 2009-08-25 03:50:28

@jrista: Although I don't totally agree with you. You have put up some good advice.

ChaosPandion 2009-08-25 04:54:25

@Chaos: I wouldn't say its so much about "agreeing"...as it is getting something out of it. ;) If I've given you anything that can help you improve your architecture, then its a win.

jrista 2009-08-25 04:55:47

how does this reduce repetition? You still have as many implementations of the same function.

jalf 2009-08-25 11:06:54

@jalf: Check out the DRY (Don't Repeat Yourself) principal on wikipedia. Sure, there are multiple "Compress" and "Decompress" methods, but thats not the point. The point is to separate noun from verb so that you don't have your nouns repeated all over the place. It also aims to separate code concerns (i.e. the concern of compressing byte arrays vs compressing strings.)

jrista 2009-08-25 15:31:28

Answer 4

+1 A:

Following on from jrista, I would inherit them, because I can assume there is some common functionality:

abstract class CompressorBase<T> { }

And then consider having a standard methods of the form:

public CompressionResult Compress (T toCompress, CompressionParams paramaters)
{
}

Then, at least, the class itself is making clear decisions about what to do, simply based on changes to the 'CompressionParams' class.

It's quite nice because you don't need to go around changing the public API anymore, just make changes to that class and the 'Compressor' will figure the rest out.

Noon Silk 2009-08-24 04:44:32

That makes CompressionParams a part of the API, and user would have to configure it properly for different types of T... Less functions, but this exposes too much to the outside IMO.

MaxVT 2009-08-24 05:29:59

Yes, but the caller is already passing in a lot of parameters, and it will greatly simplify your API. So your instatiation is something like ByteCompressor<byte[]> b = new ByteCompressor<byte[]>(); b.Compress(myBytes, params);. It's slightly redundant on the generics, but that's just due to the lack of support to that style of inheritance in c#. I personally think it's quite beautiful :)

Noon Silk 2009-08-24 05:37:24

Following an OO paradigm would overly complicate this. Our goal was to make it as simple as possible.

ChaosPandion 2009-08-25 00:52:27

I don't see it overly complicating it at all, but perhaps it comes down to preference. Does it go without saying that you would subclass CompressionParams to have the appropriate variables for each subclass? It just seems so nice, to have it all wrapped up and OO, in this fashion. Regardless though, up to you of course :) I'm close to writing up all the classes though, just to see how it looks.

Noon Silk 2009-08-25 00:54:18

Answer 5

A:

For the compress methods consider returning information like compression achieved. Similarly for decompression a status indication if any file/folder did not decompress for some reason.

In case of compression/decompression there are usually valid cases when the method should not succeed. For example trying to compress a file that is in use or decompress location can potentially overwrite something. Because these are not 'exceptions' you should not throw exceptions in such cases and returning the information either as a return value or 'out' parameter is recommended.

Sesh 2009-08-24 06:05:45

I disagree. These are exceptions that the caller should handle.

ChaosPandion 2009-08-25 00:53:30

Trying to compress a file that is in use is exactly the type of _exceptional_ circumstance that _exception handling_ is for.

kibibu 2009-08-25 03:15:46

Answer 6

+2 A:

Firstly I'd recommend having a look at this excellent presentation by Casey Muratori : http://www.mollyrocket.com/873
(you have to follow the slides and the audio separately unfortunately)

If you plan on keeping a monolithic class, my personal preference would be:

public static Byte[] CompressGzip(Byte[] input);
public static Byte[] CompressGzip(Byte[] input, Level level);

public static Byte[] DecompressGzip(Byte[] input);

public static String CompressGzip(String input);
public static String CompressGzip(String input, Level level);

etc

ie. I know they're bytes, the compiler knows they're bytes, why do I have to type it in? However, keeping Gzip front and centre is important, as its a requirement that data compressed with Gzip is decompressed with the same. Of course, this doesn't work if you can encode Byte arrays to Strings or any combination thereof.

Ie. otherwise this code looks suspiciously non-suspicious:

Format f = Format.NotDefault;

// Use our non-standard compression
String compressed = Compress("my name", f);

// more code, or transfer across the network

// Uh oh! Decompression failed.
// The default parameters are broken in this case!
String decompressed = Decompress(compressed);

By putting the method in the name, you ensure that everybody thinks about what format the compressed bytes are in.

Further, you leave room to add extra compression options for different engines - eg LZMA's dictionary size parameter.

kibibu 2009-08-25 03:32:46

Answer 7

+2 A:

One refactoring improvement I might make is:

public sealed class CompressOptions
{
  public Format Format { get; set; }
  public Level Level { get; set; }
}

You can then reduce to 2 methods per compression target. Using the Byte[] compressors as an example.

public static Byte[] Compress(Byte[] input)
{
    Compress(input, new CompressOptions { Format=Zip, Level=Normal });
}
public static Byte[] Compress(Byte[] input, CompressOptions options)
{
    if( options == null )
        throw new ArgumentNullException("options");

    // compress-away
}

Caller code can then use whatever options they want without you having to provide overrides for every conceivable scenario, which just seem to be duplicated once per scenario (Byte, strings, files).

Byte[] b = GetSomeData();
var result = Compress(b, new CompressOptions { Format=Gzip } );
var result2 = Compress(b, new CompressOptions { Level=Store } );
var result3 = Compress(b);

You might want to make CompressOptions immutable as well (i.e. once set, a value can't be changed)

This design also allows the Compress Options to be passed into code that needs to compress something, without it needing to know what compression to use.

For other compressors that may require more options, you could subclasses from CompressOptions (unseal it first though, and seal any leaf classes). There are a number of variations here.

Robert Paulson 2009-08-25 03:50:22

I'd also recommend this in conjunction with @jrista's answer

Robert Paulson 2009-08-25 03:57:25

I do like your answer, esp since you have an option for parts of the code to not need the details, so it could be injected in using DI, so the user can set these parameters.

James Black 2009-08-26 00:59:17

Answer 8

A:

In our CORE library we offer this class as a 20,000 line abstraction....

20k lines? Seriously? It's not the API surface that's your problem then. I mean surely, CompressString(string) is just calling into CompressString(string, Format, Level) - right? And surely CompressString(string, Format, Level) basically consists of:

byte[] b = System.Text.Encoding.Default.GetBytes(input);
byte[] c = CompressBytes(b, format, level);
return Convert.ToBase64(c);

which is all of 3 lines - with the temp variables. I can think of similar implementations of the rest.

So - that leads me to believe that CompressBytes(byte[], Format, Level) must be around 19,500 lines. I'd say that's your problem.

Mark Brackett 2009-08-25 23:18:17

How does this answer my question? Also I updated my question to mention we use a SharpZipLib backing which is in fact about 20k lines.

ChaosPandion 2009-08-26 00:53:49

ansaurus

tags:

views:

answers:

Can this API be improved?

related questions