tags:

views:

198

answers:

7

I've got code that looks like this because the only reliable way for me to check if some data is an image is to actually try and load it like an image.

    static void DownloadCompleted(HttpConnection conn) {
        Image img;
        HtmlDocument doc;

        try {
            img = Image.FromStream(conn.Stream);
        } catch {
            try {
                doc = new HtmlDocument();
                doc.Load(conn.Stream);
            } catch { return; }
            ProcessDocument(doc);
            return;
        }
        ProcessImage(img);
        return;
    }

Which looks down right terrible!

What's a nice way of handling these situations? Where you're basically forced to use an exception like an if statement?

+1  A: 

I think empty catch blocks, the way you've coded them, are a very bad idea in general. You're swallowing an exception there. No one will ever know that something is amiss the way you've coded it.

My policy is that if I can't handle and recover from an exception, I should simply allow it to bubble up the call stack to a place where it can be handled. That might mean nothing more than logging the error, but it's better than hiding the error.

If unchecked exceptions are the rule in .NET, I'd recommend that you use just one try/catch block when you must within a single method. I agree - multiple try/catch blocks in a method is ugly and cluttering.

I'm not sure what your code is trying to do. It looks like you're using exceptions as logic: "If an exception isn't thrown, process and return an image; if an exception IS thrown, process and return an HTML document." I think this is a bad design. I'd refactor it into two methods, one each for an image and HTML, and not use exceptions instead of an "if" test.

duffymo
But this is my point!! It's not an error at all! I *expect* half the documents I download to be images, the others to be HTML documents. I'm simply trying to determine which it is. Why record or throw something that *isn't* an error at all?
Mark
I see your point. Is there no other way to decide when you have an image or HTML document without throwing the exception?
duffymo
With regards to your edit: Yes, I am using exceptions as control flow, but I don't have much of a choice here. There is no `IsImage(Stream)` function. What am I supposed to do?
Mark
I can look at the `ContentType` header, but it's not reliable. The only way I know of to reliability determine if data is an image, is to actually try to parse it like an image.
Mark
Why is the header not reliable? I don't know - please explain.
duffymo
@duffymo: Because the server can send whatever they please. Just because it says `image/jpeg` does not mean that it necessarily is, or if it doesn't, it necessarily isn't. See http://stackoverflow.com/questions/1938317/reliability-of-content-type-image
Mark
Thank you, Mark.
duffymo
+6  A: 

Your logical structure is

if( /* Loading Image Fails */ )
    /* Try Loading HTML */

so I would try to make the code read that way. It would probably be cleanest (though admittedly annoyingly verbose) to introduce helper methods:

bool LoadImage()
{
    Image img;
    try 
    {
        img = Image.FromStream(conn.Stream);
    } 
    catch( NotAnImageException /* or whatever it is */ )
    {
        return false;
    }

    ProcessImage(img);
    return true;
}


bool LoadDocument()
{
   // etc
}

So you can write

if( !LoadImage() )
   LoadDocument();

or extend it to:

if( !LoadImage() && !LoadDocument() )
{
     /* Complain */
}
Eric
+1 - I think this looks much cleaner to me. Just one question: is the "centerfielder" catch block, without a particular type, the way to go here? Wouldn't it be better to call specifically for the exception that's likely to be thrown?
duffymo
@duffymo -- we crossed in the mail! I have a bad habit of throwing up a quick answer and then editing it 3 times in the next 5 minutes. ;)
Eric
The problem with this is that now you have `ProcessImage` inside your try block, and if processing the image fails, that actually *is* an error, and should be handled appropriately.
Mark
@Mark -- I moved it out of the try block, to match the current logic from your question.
Eric
Yeah... I guess this the way to go then. Although I don't know how I feel about tying the `ProcessImage` into the load function at all. By keeping it outside, the function is more versatile. Then it becomes an `IsImage` function, which could be used for other purposes too...
Mark
+1  A: 

In conjunction to duffymo's answer, if you're dealing with

  • File input/output, use IOException
  • Network sockets, use SocketException
  • XML, use XmlException

That would make catching exceptions tidier as you're not allowing it to bubble up to the generic catch-all exception. It can be slow, if you use the generic catch-all exception as the stack trace will be bubbling up and eating up memory as the references to the Exception's property, InnerException gets nested as a result.

Either way, craft your code to show the exception and log it if necessary or rethrow it to be caught by the catch-all exception...never assume the code will not fail because there's plenty of memory, plenty of disk and so on as that's shooting yourself in the foot.

If you want to design your own Exception class if you are building a unique set of classes and API, inherit from the ApplicationException class.

Edit: I understand better what the OP is doing...I suggest the following, cleaner implementation, notice how I check the stream to see if there's a HTML tag or a DOC tag as part of XHTML...

static void DownloadCompleted(HttpConnection conn) {
        Image img;
        HtmlDocument doc;
        bool bText = false;
        using (System.IO.BinaryReader br = new BinaryReader(conn.Stream)){
            char[] chPeek = br.ReadChars(30);
            string s = chPeek.ToString().Replace(" ", "");
            if (s.IndexOf("<DOC") > 0 || s.IndexOf("<HTML") > 0){
                // We have a pure Text!
                bText = true;
            }
        }
        if (bText){
            doc = new HtmlDocument();
            doc.Load(conn.Stream);
        }else{
            img = Image.FromStream(conn.Stream);
        }
}

Increase the length if you so wish depending on how far into the conn.Stream indicating where the html tags are...

Hope this helps, Best regards, Tom.

tommieb75
`Image.FromStream` only throws an `ArgumentException` anyway... and, I think your answer is a little off-base as per the comments below duffymo's.
Mark
@Mark: Ok, ArgumentException occurred because you were attempting to load a HTML using Image.FromStream...It might be better to peek at the first 30 bytes say, and look for a '<DOC' or a '<HTML'...instead of blindly using Image.FromStream..what you think?
tommieb75
Well, I'll give you a +1 for effort, but this doesn't really address the problem either. Firstly, if it's not HTML, it still isn't necessarily an image (it could be a PDF or something). Secondly, I'm not sure how robust this solution is at even checking for HTML. I really don't care how malformed the HTML is, as long as `HtmlAgilityPack` can pick out the `<a>` tags. I'm thinking `FromStream` should fail pretty early if the first few bytes don't make sense anyway. Tests show it takes about 80ms to fail and throw an exception. Not fast, but not too slow in the grand scheme of things.
Mark
A: 

My recommendations:

Check to see if the connection is null as a first step.

if (conn == null){
return;
}

There is no point in processing if it is null. There seems to be some school of thought out there that says try statements are magic. You don't have to put anything inside a try statement. So therefore to address one of the questions you had, if you don't care don't even bother wrapping it in the statement.

Also look into using if statements, as it appears that is what you were going for above.

Woot4Moo
Any particular reason for the downvote?
Woot4Moo
I wasn't the downvoter, but my guess is that it was downvoted because it doesn't address the question. The question implies that it isn't actual code that was posted, just an example of the flow that the poster doesn't like.
aehiilrs
"What if I just want to try something, and I don't really care if it fails?" Just one of those crazy things where a question mark was used I suppose. =p
Woot4Moo
The comment after clarifies that. He wants to try reading it as an image, and doesn't care if that fails because he's going to try to read it as a document immediately after.
aehiilrs
I see said the blind man to the deaf dog
Woot4Moo
Uh... I didn't downvote this either, but the other guys are right. It's not failing because the connection is null (it definitely isn't), it's failing because the content isn't an image. Which is exactly what I'm trying to check, but there is no method `IsImage()` so the only way I can check is by trying to load it as an image, and catching the exception.
Mark
I didn't say it was failing because the connection is null. It was for the purpose of defensively coding against the null.
Woot4Moo
... which didn't address the question at all. We've come full circle.
aehiilrs
actually @aehiilrs before he edited it addressed A question he had. I am only a native english speaker, but when there is a question mark( this symbol '?') that denotes a question. Anyways, there was a lot left to interpretation before the op edited it.
Woot4Moo
Ehhh, I guess I see how you misread it, but to me the question was pretty clearly about organizing his try/catch blocks and not just trying to fail fast.
aehiilrs
+1  A: 

I like the idea of Eric's example, but I find the implementation ugly: doing work in an if and having a method that does work return a boolean looks very ugly.

My choice would be a method that returns Images, and a similar method

Image LoadImage(HttpConnection conn)
{
    try 
    {
        return Image.FromStream(conn.Stream);
    } 
    catch(NotAnImageException)
    {
        return null;
    }
}

and do the work in the original method:

static void DownloadCompleted(HttpConnection conn)
{
    Image img = LoadImage(conn);

    if(img != null)
    {
         ProcessImage(img);
         return;
    }

    HtmlDocument doc = LoadDocument(conn);

    if(doc != null)
    {
         ProcessDocument(doc)
         return;
    }

    return;
}
Tordek
Yeah... I agree with you on this one, that's exactly what I did after I read Eric's post. But then I decided adding 2 extra methods and then checking against null isn't a lot better than my original code either. So I'll think I'll just leave it in this case, but if it gets any more nested than that, this is probably the way to go.
Mark
+1  A: 

There is no need to try to guess the content type of an HTTP download, which is why you are catching exceptions here. The HTTP protocol defines a Content-Type header field which gives you the MIME type of the downloaded content.

Also, a non-seekable stream can only be read once. To read the content of a stream multiple times, you would have to copy it to a MemoryStream and reset it before each read session with Seek(0, SeekOrigin.Begin).

Wim Coenen
As I wrote in another comment, the `Content-Type` isn't necessarily reliable http://stackoverflow.com/questions/1938317/reliability-of-content-type-image. The server can set that to be whatever, and it won't necessarily match the output. Although some statistics on that would be nice. `conn.Stream` actually is a `MemoryStream`. Perhaps I should change that property name, but its more evident in the IDE. Lastly, if the stream does need to be reset, then `Image.FromStream` is doing it automatically, because the code in my question works fine.
Mark
+1 for your linked question, but I would simply assume that the mime type is typically correct until I had evidence showing otherwise.
Wim Coenen
A: 

Answering your exact question: you don't need a catch block at all, try/finally block will work just fine without a catch block.

try
{
    int i = 0;
}
finally {}
HeavyWave
Well...sorry, maybe I should amend my question to say without a catch OR finally block. Having an empty finally block really doesn't look any better ;)
Mark