[Java] How to measure robustness?

+7 Q:

I am working on a thesis about measuring the quality of a product. The product in this case is a website. I have identified several quality attributes and measurement techniques.

One quality attribute is "Robustness". I want to measure that somehow, but I can't find any useful information on how to do this in an objective manner.

Is there any static or dynamic metric that could measure robustness? For example, is there something like unit test coverage that measures robustness? If so, is there any (free) tool that can do such a thing?

Does anyone have any experience with such tooling?

Last but not least, perhaps there are other ways to determine robustness. If you have any ideas about that, I am all ears.

Thanks a lot in advance.

+1 A:

Robustness is very subjective, but you could have a look at FindBugs, Cobertura and Hudson, which, when correctly combined, could give you a sense of security over time that the software is robust.

cherouvim 2010-03-01 07:34:17

+12 A:

Well, the short answer is "no." Robust can mean a lot of things, but the best definition I can come up with is "performing correctly in every situation." If you send a bad HTTP header to a robust web server, it shouldn't crash. It should return exactly the right kind of error, and it should log the event somewhere, perhaps in a configurable way. If a robust web server runs for a very long time, its memory footprint should stay the same.

A lot of what makes a system robust is its handling of edge cases. Good unit tests are a part of that, but it's quite likely that there will not be unit tests for any of the problems that a system has (if those problems were known, the developers probably would have fixed them and only then added a test).

Unfortunately, it's nearly impossible to measure the robustness of an arbitrary program because in order to do that you need to know what that program is supposed to do. If you had a specification, you could write a huge number of tests and then run them against any client as a test. For example, look at the Acid2 browser test. It carefully measures how well any given web browser complies with a standard in an easy, repeatable fashion. That's about as close as you can get, and people have pointed out many flaws with such an approach (for instance, is a program that crashes more often but does one extra thing according to spec more robust?)

There are, though, various checks that you could use as a rough, numerical estimate of the health of a system. Unit test coverage is a pretty standard one, as are its siblings: branch coverage, function coverage, statement coverage, and so on. Another good choice is "lint" programs like FindBugs. These can indicate the potential for problems. Open source projects are often judged by how frequently and recently commits are made or releases are published. If a project has a bug system, you can measure how many bugs have been fixed, both as a count and as a percentage. If there's a specific instance of the program you're measuring, especially one with a lot of activity, MTBF (Mean Time Between Failures) is a good measure of robustness (see Philip's answer).

These measurements, though, don't really tell you how robust a program is. They're merely ways to guess at it. If it were easy to figure out if a program was robust, we'd probably just make the compiler check for it.

Good luck with your thesis! I hope you come up with some cool new measurements!

CaptainAwesomePants 2010-03-01 07:44:10

+1 great answer!

Philip Potter 2010-03-01 08:05:17

Thanks a lot, great answer; it helps me a lot. Robustness is one of the nine quality attributes I find important in my thesis. I was afraid the answer would be something like the above, but I can use this information for various approaches. In terms of robustness, I simply define it as: "A (piece of a) program that will not crash or behave unexpectedly when receiving unexpected data as input". In other words, it should not be Garbage In, Garbage Out, but rather Garbage In, useful output out. Again, thanks a lot.

Stefan Hendriks 2010-03-01 09:22:29

When harvesting from others' answers, please make a direct link to their answer when attributing.

Thorbjørn Ravn Andersen 2010-03-02 08:06:44

+4 A:

You could look into mean time between failures as a robustness measure. The problem is that it is a theoretical quantity which is difficult to measure, particularly before you have deployed your product to a real-world situation with real-world loads. Part of the reason for this is that testing often does not cover real-world scalability issues.
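As a rough illustration of how this quantity is usually estimated once failure data exists, the sketch below divides the observed uptime by the number of failures seen in that window. The class name, method name and sample figures (720 hours, 3 failures) are made up for illustration; real numbers would have to come from production monitoring.

```java
import java.time.Duration;

class MtbfEstimate {

    /**
     * Estimates MTBF as observed uptime divided by the number of failures.
     * Both inputs are assumed to come from the same observation window.
     */
    static Duration meanTimeBetweenFailures(Duration observedUptime, int failureCount) {
        if (failureCount <= 0) {
            // With zero failures the estimate is undefined: the window was too short.
            throw new IllegalArgumentException("need at least one failure to estimate MTBF");
        }
        return observedUptime.dividedBy(failureCount);
    }

    public static void main(String[] args) {
        // Hypothetical data: 720 hours of uptime with 3 failures.
        Duration mtbf = meanTimeBetweenFailures(Duration.ofHours(720), 3);
        System.out.println("Estimated MTBF: " + mtbf.toHours() + " hours");
    }
}
```

Note that the estimate is only as good as the observation window: a short test run with no failures says nothing, which is exactly the measurement problem described above.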

Philip Potter 2010-03-01 08:09:08

That's a great stat to use for production systems! I'm stealing it and adding it to my answer.

CaptainAwesomePants 2010-03-01 09:15:55

And stealing the reputation I should be getting? :P

Philip Potter 2010-03-01 11:11:08

Don't worry, you received points from me ;)

Stefan Hendriks 2010-03-01 11:46:39

I find that the mean time to recovery must be included to paint the whole picture.
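One common way to combine the two figures is steady-state availability, MTBF / (MTBF + MTTR). The sketch below is only an illustration of that formula; the class name, method name and sample durations are assumptions, not measurements.

```java
import java.time.Duration;

class AvailabilityEstimate {

    /** Steady-state availability: MTBF / (MTBF + MTTR), a value between 0 and 1. */
    static double availability(Duration mtbf, Duration mttr) {
        double up = mtbf.toMillis();
        double down = mttr.toMillis();
        if (up <= 0 || down < 0) {
            throw new IllegalArgumentException("MTBF must be positive and MTTR non-negative");
        }
        return up / (up + down);
    }

    public static void main(String[] args) {
        // Two hypothetical systems with the same MTBF but very different recovery times.
        Duration mtbf = Duration.ofHours(240);
        System.out.println("Fast recovery: " + availability(mtbf, Duration.ofMinutes(5)));
        System.out.println("Slow recovery: " + availability(mtbf, Duration.ofHours(12)));
    }
}
```

Two systems that fail equally often can differ a lot in availability, which is why MTBF on its own can be misleading.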

Christian 2010-03-02 06:51:08

+2 A:

In our Fuzzing book (by Takanen, DeMott, Miller) we have several chapters dedicated to metrics and coverage in negative testing (robustness, reliability, grammar testing, fuzzing: many names for the same thing). I also tried to summarize the most important aspects in our company whitepaper here:

http://www.codenomicon.com/products/coverage.shtml

Snippet from there:

Coverage can be seen as the sum of two features, precision and accuracy. Precision is concerned with protocol coverage. The precision of testing is determined by how well the tests cover the different protocol messages, message structures, tags and data definitions. Accuracy, on the other hand, measures how accurately the tests can find bugs within different protocol areas. Therefore, accuracy can be regarded as a form of anomaly coverage. However, precision and accuracy are fairly abstract terms, thus, we will need to look at more specific metrics for evaluating coverage.

The first coverage analysis aspect is related to the attack surface. Test requirement analysis always starts off by identifying the interfaces that need testing. The number of different interfaces and the protocols they implement in various layers set the requirements for the fuzzers. Each protocol, file format, or API might require its own type of fuzzer, depending on the security requirements.

Second coverage metric is related to the specification that a fuzzer supports. This type of metric is easy to use with model-based fuzzers, as the basis of the tool is formed by the specifications used to create the fuzzer, and therefore they are easy to list. A model-based fuzzer should cover the entire specification. Whereas, mutation-based fuzzers do not necessarily fully cover the specification, as implementing or including one message exchange sample from a specification does not guarantee that the entire specification is covered. Typically when a mutation-based fuzzer claims specification support, it means it is interoperable with test targets implementing the specification.

Especially regarding protocol fuzzing, the third-most critical metric is the level of statefulness of the selected Fuzzing approach. An entirely random fuzzer will typically only test the first messages in complex stateful protocols. The more state-aware the fuzzing approach you are using is, the deeper the fuzzer can go in complex protocols exchanges. The statefulness is a difficult requirement to define for Fuzzing tools, as it is more a metric for defining the quality of the used protocol model, and can, thus, only be verified by running the tests.
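To make the mutation-based approach mentioned in the snippet a bit more concrete, here is a very small sketch, not taken from the whitepaper or from any real fuzzing tool. It mutates one valid sample input and counts how often the code under test fails in a way other than its documented rejection. The target `parsePort`, the sample `"8080"` and all other names are hypothetical.

```java
import java.util.Random;

class MutationFuzzSketch {

    /** Hypothetical code under test: accepts a TCP port, rejects everything else. */
    static int parsePort(String input) {
        // NumberFormatException is a subclass of IllegalArgumentException.
        int port = Integer.parseInt(input.trim());
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }

    /** Replaces one randomly chosen character of a non-empty sample with a random character. */
    static String mutate(String sample, Random random) {
        char[] chars = sample.toCharArray();
        int position = random.nextInt(chars.length);
        chars[position] = (char) random.nextInt(Character.MAX_VALUE + 1);
        return new String(chars);
    }

    /** Counts exceptions other than the documented IllegalArgumentException. */
    static int countUnexpectedFailures(String sample, int iterations, long seed) {
        Random random = new Random(seed); // fixed seed keeps the run repeatable
        int unexpected = 0;
        for (int i = 0; i < iterations; i++) {
            String input = mutate(sample, random);
            try {
                parsePort(input);
            } catch (IllegalArgumentException expected) {
                // Controlled rejection of bad input: this is the robust outcome.
            } catch (RuntimeException e) {
                unexpected++; // anything else counts against robustness
            }
        }
        return unexpected;
    }

    public static void main(String[] args) {
        int failures = countUnexpectedFailures("8080", 1000, 42L);
        System.out.println("Unexpected failures in 1000 mutated inputs: " + failures);
    }
}
```

A single-character mutation of one sample covers very little of the input space, which illustrates the point above that a mutation-based fuzzer does not necessarily cover the whole specification.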

I hope this was helpful. We also have studies on other metrics, such as code coverage and other more or less useless data. ;) Metrics is a great topic for a thesis. Email me at [email protected] if you are interested in getting access to our extensive research on this topic.

Ari Takanen 2010-03-02 06:16:29

Really helpful information indeed. I am very interested in any research in that area. I do wonder what level of robustness we're talking about: code level, module level or service level?

Stefan Hendriks 2010-03-02 08:11:04

You could look into mean time between failures as a robustness measure.

The problem with "MTBF" is that it is usually measured under positive (valid) traffic, whereas failures often happen in unexpected situations. It does not give any indication of robustness or reliability. Even if a website always stays up in a lab environment, it will still be hacked in a second on the Internet if it has a weakness.

Ari Takanen 2010-03-02 06:20:30

Don't post comments on answers as new answers. Use "Add comment" instead. This is robust against answer reordering (which happens, based on upvotes and posters' rep).

Thorbjørn Ravn Andersen 2010-03-02 08:05:18

@Thorbjorn: You can't leave comments until you get 50 reputation. I think this is a silly requirement, but that's the way it is.

Philip Potter 2010-03-02 08:37:15
