Given software where ...

  • The system consists of a few subsystems
  • Each subsystem consists of a few components
  • Each component is implemented using many classes

... I like to write automated tests of each subsystem or component.

I don't write a test for each internal class of a component (except inasmuch as each class contributes to the component's public functionality and is therefore testable/tested from outside via the component's public API).

When I refactor the implementation of a component (which I often do, as part of adding new functionality), I therefore don't need to alter any existing automated tests: because the tests only depend on the component's public API, and the public APIs are typically being expanded rather than altered.

I think this policy contrasts with a document like Refactoring Test Code, which says things like ...

  • "... unit testing ..."
  • "... a test class for every class in the system ..."
  • "... test code / production code ratio ... is ideally considered to approach a ratio of 1:1 ..."

... all of which I suppose I disagree with (or at least don't practice).

My question is, if you disagree with my policy, would you explain why? In what scenarios is this degree of testing insufficient?

In summary:

  • Public interfaces are tested (and retested), and rarely change (they're added to but rarely altered)
  • Internal APIs are hidden behind the public APIs, and can be changed without rewriting the test cases which test the public APIs


Footnote: some of my 'test cases' are actually implemented as data. For example, test cases for the UI consist of data files which contain various user inputs and the corresponding expected system outputs. Testing the system means having test code which reads each data file, replays the input into the system, and asserts that it gets the corresponding expected output.

Although I rarely need to change test code (because public APIs are usually added to rather than changed), I do find that I sometimes (e.g. twice a week) need to change some existing data files. This can happen when I change the system output for the better (i.e. new functionality improves existing output), which might cause an existing test to 'fail' (because the test code only tries to assert that output hasn't changed). To handle these cases I do the following (a sketch of this harness follows the list below):

  • Rerun the automated test suite with a special run-time flag, which tells it not to assert the output, but instead to capture the new output into a new directory
  • Use a visual diff tool to see which output data files (i.e. which test cases) have changed, and to verify that these changes are good and as expected given the new functionality
  • Update the existing tests by copying new output files from the new directory into the directory from which test cases are run (overwriting the old expected output)
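
A minimal sketch of that harness, with invented names throughout (the CAPTURE_OUTPUT flag, the directory layout and the RunSystem entry point are illustrations, not the actual framework described above):

    // Hedged sketch of a data-driven regression harness with a capture mode.
    // Each test case is a pair of files: <case>.input and <case>.expected.
    using System;
    using System.IO;

    public static class RegressionHarness
    {
        // Stand-in for a call into the real system under test.
        public static string RunSystem(string input) => input.ToUpperInvariant();

        public static void Main()
        {
            bool captureMode = Environment.GetEnvironmentVariable("CAPTURE_OUTPUT") == "1";
            string caseDir = "testcases";           // directory the tests are run from
            string captureDir = "testcases.new";    // new output lands here in capture mode
            Directory.CreateDirectory(captureDir);

            int failures = 0;
            foreach (string inputFile in Directory.GetFiles(caseDir, "*.input"))
            {
                string name = Path.GetFileNameWithoutExtension(inputFile);
                string actual = RunSystem(File.ReadAllText(inputFile));

                if (captureMode)
                {
                    // Don't assert; just record the new output for later visual diffing.
                    File.WriteAllText(Path.Combine(captureDir, name + ".expected"), actual);
                }
                else
                {
                    string expected = File.ReadAllText(Path.Combine(caseDir, name + ".expected"));
                    if (actual != expected)
                    {
                        Console.WriteLine("FAIL: " + name);
                        failures++;
                    }
                }
            }
            Environment.Exit(failures == 0 ? 0 : 1);
        }
    }

After reviewing the diffs, promoting the new output is just a matter of copying the files from testcases.new over testcases.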


Footnote: by "component", I mean something like "one DLL" or "one assembly" ... something that's big enough to be visible on an architecture or a deployment diagram of the system, often implemented using dozens or a hundred classes, and with a public API that consists of only one or a handful of interfaces ... something that may be assigned to one team of developers (where a different component is assigned to a different team), and which will therefore, according to Conway's Law, have a relatively stable public API.


Footnote: The article Object-Oriented Testing: Myth and Reality says,

Myth: Black box testing is sufficient. If you do a careful job of test case design using the class interface or specification, you can be assured that the class has been fully exercised. White-box testing (looking at a method's implementation to design tests) violates the very concept of encapsulation.

Reality: OO structure matters, part II. Many studies have shown that black-box test suites thought to be excruciatingly thorough by developers only exercise from one-third to a half of the statements (let alone paths or states) in the implementation under test. There are three reasons for this. First, the inputs or states selected typically exercise normal paths, but don't force all possible paths/states. Second, black-box testing alone cannot reveal surprises. Suppose we've tested all of the specified behaviors of the system under test. To be confident there are no unspecified behaviors we need to know if any parts of the system have not been exercised by the black-box test suite. The only way this information can be obtained is by code instrumentation. Third, it is often difficult to exercise exception and error-handling without examination of the source code.

I should add that I'm doing whitebox functional testing: I see the code (in the implementation) and I write functional tests (which drive the public API) to exercise the various code branches (details of the feature's implementation).

A: 

I personally test protected parts too, because they are "public" to inherited types...

Ali Shafai
I'd find that applicable if my protected parts were being used by customers (e.g. if I were shipping a library with protected parts to other programmers).
ChrisW
+1  A: 

If you are practicing pure test-driven development, then you only implement code after you have a failing test, and only write test code when you have no failing tests. Additionally, you only write the simplest test that will fail, and the simplest code that will make it pass.

In the limited TDD practice I've had, I've seen how this helps me flush out unit tests for every logical condition produced by the code. I'm not entirely confident that 100% of the logical features of my private code are exposed by my public interfaces. Practicing TDD seems complementary to that metric, but there are still hidden features that aren't reachable through the public APIs.

I suppose you could say this practice protects me against future defects in my public interfaces. Either you find that useful (because it lets you add new features more rapidly) or you find that it is a waste of time.

Karl the Pagan
I understood the first two paragraphs, but not the first sentence of the third paragraph.
ChrisW
By having tests for all of my internal code, I am protected when I choose to use more of that internal code which isn't exposed to the public at first. That's what I mean by "future defects". As I extend my program I'm more likely to cover internal cases which were not exposed at first.
Karl the Pagan
A: 

I agree that code coverage should ideally be 100%. This does not necessarily mean 60 lines of code would have 60 lines of test code, but that each execution path is tested. The only thing more annoying than a bug is a bug that hasn't run yet.

By only testing the public API you run the risk of not testing all instances of the internal classes. I am really stating the obvious by saying that, but I think it should be mentioned. The more each behavior is tested, the easier it is to recognize not only that it is broken, but what is broken.

You said, "This does not necessarily mean 60 lines of code would have 60 lines of test code". The *unit* test people seem to say that every class should have corresponding tests ... whereas I have tests for collections of classes (i.e. for components/packages/assemblies/libraries) ... the only classes for whch I have tests are the public classes which define the external API.
ChrisW
I find that in order to add one new piece of functionality, I need to add one new test case (to test the new functionality) and maybe edit a dozen existing classes (to implement the new functionality). N.B. that editing a dozen existing classes does *not* mean editing or creating a dozen test cases (one test case per class).
ChrisW
No, you would only edit those test cases which turn out to be broken. Don't edit the tests which aren't broken. And as for creating a dozen test classes: no way; in our case they would already be in place.
Adeel Ansari
+3  A: 

My practice is to test the internals through the public API/UI. If some internal code cannot be reached from the outside, then I refactor to remove it.

mouviciel
Do you use a code coverage tool, to discover internal code which cannot be or which isn't being reached from the outside? I wonder how such code came into existence.
ChrisW
It happens sometimes; take the case of exception-handling blocks. Many of them go untested, for that very reason.
Adeel Ansari
@ChrisW: Depending on how much effort I want to spend on it, I use debug traces or gcov (which is integrated into Xcode). About how that code came into existence, it is true that using TDD helps me not to write it. But sometimes features are removed or modified. @Vinegar: Usually I try to test exception-handling blocks, at least with a manual test case that I run only once. If I can't imagine a situation for reaching that code, I tend to remove it.
mouviciel
+7  A: 

I don't have my copy of Lakos in front of me, so rather than cite I will merely point out that he does a better job than I will of explaining why testing is important at all levels.

The problem with testing only "public behavior" is that such a test gives you very little information. It will catch many bugs (just as the compiler will catch many bugs), but cannot tell you where the bugs are. It is common for a badly implemented unit to return good values for a long time and then stop doing so when conditions change; if that unit had been tested directly, the fact that it was badly implemented would have been evident sooner.

The best level of test granularity is the unit level. Provide tests for each unit through its interface(s). This allows you to validate and document your beliefs about how each component behaves, which in turn allows you to test dependent code by only testing the new functionality it introduces, which in turn keeps tests short and on target. As a bonus, it keeps tests with the code they're testing.

To phrase it differently, it is correct to test only public behavior, so long as you notice that every publicly visible class has public behavior.

darch
You're quite right: I've added my definition of 'component' as a footnote to the OP. Lakos' definition of 'component' is 'one source file', which is much smaller than what I'm using. What I mean by 'component' is possibly what Lakos calls a 'package'.
ChrisW
You said that "testing only public behavior ... will catch many bugs (just as the compiler will catch many bugs), but cannot tell you where the bugs are." Two comments: 1) Any bug is usually connected to whatever I'm editing at the moment and haven't checked in yet (which narrows it down a lot, given that I check-in frequently). 2) Unit tests (of each class) wouldn't necessarily help, because a bug is are often not in one class but is rather in the interaction between classes.
ChrisW
For that we have interaction-based tests. Don't you know that? :) Check this out: http://www.woodwardweb.com/programming/state_based_tes.html
Adeel Ansari
To ChrisW: regarding your point #2 -> this is exactly why you want unit tests. If the tests for class A and B work well, but the functional test using A and B fails, you know it's an interaction problem. Otherwise, you have to investigate all three possibilities (A has a bug, B has a bug, A+B don't play nice with each other)
Kena
A: 

I test private implementation details as well as public interfaces. If I change an implementation detail and the new version has a bug, this allows me to have a better idea of where the error actually is, and not just what it is affecting.

Nathaniel Flath
+13  A: 

The answer is very simple: you are describing functional testing, which is an important part of software QA. Testing internal implementation is unit testing, which is another part of software QA with a different goal. That's why you feel that people disagree with your approach.

Functional testing is important to validate that the system or subsystem does what it is supposed to do. Anything the customer sees should be tested this way.

Unit testing is there to check that the 10 lines of code you just wrote do what they are supposed to do. It gives you higher confidence in your code.

Both are complementary. If you work on an existing system, functional testing is probably the first thing to work on. But as soon as you add code, unit testing it is also a good idea.
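
To make the distinction concrete, here is a minimal sketch (hedged: the InvoiceCalculator component, its internal RoundingPolicy class, and the use of NUnit are all invented for illustration, not taken from this answer). The first test is functional in the sense above, driving only the component's public API; the second is a unit test pinning down one small internal detail:

    // Hedged sketch: InvoiceCalculator (public API) and RoundingPolicy (internal
    // detail) are invented names, and NUnit is just one possible test framework.
    using NUnit.Framework;

    public class InvoiceCalculator                 // the component's public API
    {
        public decimal Total(decimal net, decimal taxRate)
            => RoundingPolicy.RoundMoney(net * (1 + taxRate));
    }

    internal static class RoundingPolicy           // an internal implementation detail
    {
        public static decimal RoundMoney(decimal amount)
            => decimal.Round(amount, 2, System.MidpointRounding.AwayFromZero);
    }

    [TestFixture]
    public class InvoiceTests
    {
        [Test] // functional test: drives only the public API
        public void Total_includes_tax_and_is_rounded()
            => Assert.AreEqual(119.00m, new InvoiceCalculator().Total(100m, 0.19m));

        [Test] // unit test: checks the few lines of internal code directly
        public void Midpoints_are_rounded_away_from_zero()
            => Assert.AreEqual(0.03m, RoundingPolicy.RoundMoney(0.025m));
    }

If the rounding rule is later refactored away, only the second test needs to change; the functional test keeps protecting the behaviour the customer sees.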

Bluebird75
When I implement a new feature, I exercise it (i.e. implementation of the new feature) with a functional test. Why/when might it be a "good idea to also unit test"? Isn't a functional test sufficient? Isn't a unit test a bit of a waste of time (e.g. because it needs to be reworked if the implementation is refactored)? It's pretty rare that I'll write a unit test: one time was when I needed to exercise a class which wrapped the system date (where it wasn't convenient to do real functional testing by waiting for the real system date to change). Also, if I'm the one developing two components, ...
ChrisW
... then I'll tend to test the two together (i.e. "integration testing"): instead of creating a "mock" of either of them that would let me test the other by itself.
ChrisW
Unit tests allow you to discover the source of a bug more precisely. And no, it's not a waste of time, because there are many things that cannot be tested properly by functional testing but are still worth testing. Typically, "difficult to simulate" errors are very useful to unit-test. I am talking about all those functions that return NULL instead of a valid pointer, lost network connectivity, unreadable config files, ... And yes, you have to refactor the tests along with your code.
Bluebird75
+3  A: 

There have been a lot of great responses to this question so far, but I want to add a few notes of my own. As a preface: I am a consultant for a large company that delivers technology solutions to a wide range of large clients. I say this because, in my experience, we are required to test much more thoroughly than most software shops do (save maybe API developers). Here are some of the steps we go through to ensure quality:

  • Internal Unit Test:
    Developers are expected to create unit tests for all the code they write (read: every method). The unit tests should cover positive test conditions (does my method work?) and negative test conditions (does the method throw an ArgumentNullException when one of my required arguments is null?). We typically incorporate these tests into the build process using a tool like CruiseControl.net
  • System Test / Assembly Test:
    Sometimes this step is called something different, but this is when we begin testing public functionality. Once you know all your individual units function as expected, you want to know that your external functions also work the way you think they should. This is a form of functional verification since the goal is to determine whether the entire system works the way it should. Note that this does not include any integration points. For system test, you should be using mocked-up interfaces instead of the real ones so that you can control the output and build test cases around them (see the sketch after this list).
  • System Integration Test:
    At this stage in the process, you want to connect your integration points to the system. For example, if you're using a credit card processing system, you'll want to incorporate the live system at this stage to verify that it still works. You would want to perform similar testing to system/assembly test.
  • Functional Verification Test:
    Functional verification is users running through the system or using the API to verify that it works as expected. If you've built an invoicing system, this is the stage at which you will execute your test scripts from end to end to ensure that everything works as you designed it. This is obviously a critical stage in the process since it tells you whether you've done your job.
  • Certification Test:
    Here, you put real users in front of the system and let 'em have a go at it. Ideally you've already tested your user interface at some point with your stakeholders, but this stage will tell you whether your target audience likes your product. You might've heard this called something like a "release candidate" by other vendors. If all goes well at this stage, you know you're good to move into production. Certification tests should always be performed in the same environment you'll be using for production (or an identical environment at least).
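
As a concrete illustration of the mocked-up interfaces mentioned in the System Test step, here is a minimal sketch; ICardProcessor, StubCardProcessor and Checkout are invented names, and NUnit is only one possible framework, so treat this as a shape rather than a prescription:

    // Hedged sketch of a "mocked-up interface" for system/assembly test:
    // the test controls the integration point's output instead of calling the real one.
    using NUnit.Framework;

    public interface ICardProcessor                 // integration point to be mocked
    {
        bool Charge(string cardNumber, decimal amount);
    }

    public class StubCardProcessor : ICardProcessor
    {
        public bool NextResult { get; set; } = true;   // the test controls the output
        public decimal LastAmount { get; private set; }

        public bool Charge(string cardNumber, decimal amount)
        {
            LastAmount = amount;
            return NextResult;
        }
    }

    public class Checkout                           // part of the system under test
    {
        private readonly ICardProcessor processor;
        public Checkout(ICardProcessor processor) { this.processor = processor; }

        public string PlaceOrder(string cardNumber, decimal amount)
            => processor.Charge(cardNumber, amount) ? "confirmed" : "declined";
    }

    [TestFixture]
    public class CheckoutSystemTests
    {
        [Test]
        public void A_declined_card_is_reported_to_the_caller()
        {
            var stub = new StubCardProcessor { NextResult = false };
            Assert.AreEqual("declined", new Checkout(stub).PlaceOrder("0000-0000", 10m));
        }
    }

At the System Integration Test stage, the same Checkout would be wired to the real processor instead of the stub.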

Of course, I know that not everyone follows this process, but if you look at it from end to end, you can begin to see the benefits of the individual components. I haven't included things like build verification tests since they happen on a different timeline (e.g., daily). I personally believe that unit tests are critical, because they give you deep insight into which specific component of your application is failing at which specific use case. Unit tests will also help you isolate which methods are functioning correctly so that you don't spend time looking at them for more information about a failure when there's nothing wrong with them.

Of course, unit tests could also be wrong, but if you develop your test cases from your functional/technical specification (you have one, right? ;)), you shouldn't have too much trouble.

Ed Altorfer
I think I'd name these steps "unit test" (a unit), "component test" (each larger component), "integration test" (several components), "system test" (whole system), and "acceptance test" (by the customer and/or end users).
ChrisW
ChrisW, feel free to name them as you see fit, of course; the names I provided are the names we use at our company. I've seen assembly/system test interchanged, but yeah. At the end of the day, it's the concept and execution that matters for us.
Ed Altorfer
Perhaps unit testing does not necessarily improve the overall final quality of software: rather, the main reason for unit testing is to provide *earlier* testing (i.e. pre-component-test and pre-integration-test). Software which hasn't been unit tested can be as good as software which was unit tested: because the coverage from functional tests can be as good as (if not better than) the coverage from unit tests. The thing which unit testing does affect is not so much the quality of the end product, but more the cost and efficiency of the development process.
ChrisW
Software which is unit tested may be less expensive than software without unit tests (because debugging during integration testing can be less efficient and more expensive than debugging during unit testing); or it may be more expensive (because writing and maintaining unit tests as well as functional tests is an extra cost in its own right).
ChrisW
I agree to some extent, ChrisW, but I would posit that software which is developed at a reduced cost and higher efficiency is inherently of a higher quality. Also, one could argue that, if it takes you a shorter time to build something with unit tests, you have more resources to allocate to more features, which benefits your audience and your company. Just my $0.02. I think you have the right idea. :)
Ed Altorfer
A: 

[An answer to my own question]

Maybe one of the variables that matters a lot is how many different programmers there are coding:

  • Axiom: each programmer should test their own code

  • Therefore: if a programmer writes and delivers one "unit", then they should also have tested that unit, quite possibly by writing a "unit test"

  • Corollary: if a single programmer writes a whole package, then it's sufficient for the programmer to write functional tests of the whole package (no need to write "unit" tests of units within the package, since those units are implementation details to which other programmers have no direct access/exposure).

Similarly, the practice of building "mock" components which you can test against:

  • If you have two teams building two components, each may need to "mock" the other's component so that they have something (the mock) against which to test their own component, before their component is deemed ready for subsequent "integration testing", and before the other team has delivered their component against which your component can be tested.

  • If you're developing the whole system then you can grow the entire system ... for example, develop a new GUI field, a new database field, a new business transaction, and one new system/functional test, all as part of one iteration, with no need to develop "mocks" of any layer (since you can test against the real thing instead).

ChrisW
If you have a choice, you should use "adversary testing". You don't want the guy who wrote the code to test it; he can't see holes because he believes it works. You want an unbiased or even antagonistic tester to consider possible holes and write tests to verify those cases don't occur.
Ira Baxter
Ira: I agree that "adversary testing" can be valuable, but only as a post-process. Relying on "adversary testing" is horrendously wasteful at the unit/integration test level. The worst part is that if software is written with no regards to testability, it is extremely hard going to write test code for it! The software engineer is absolved of responsibility for cleaning up their own untestable code and makes the tester's job a nightmare. I find it to be much more productive when the developer writes the bulk of the tests and an "adversary testing" pass is covered later (or a code review).
Mark Simpson
A: 

Axiom: each programmer should test their own code

I don't think this is universally true.

In cryptography, there's a well-known saying: "it's easy to create a cipher so secure you don't know how to break it yourself."

In your typical development process, you write your code, then compile and run it to check that it does what you think it does. Repeat this a bunch of times and you'll feel pretty confident about your code.

Your confidence will make you a less vigilant tester. One who doesn't share your experience with the code will not have the issue.

Also, a fresh pair of eyes may have fewer preconceptions not just about the code's reliability but also about what the code does. As a consequence, they may come up with test cases the code's author hasn't thought of. One would expect those to either uncover more bugs, or spread knowledge about what the code does around the organization a bit more.

Additionally, there's an argument to be made that to be a good programmer you have to worry about edge cases, but to be a good tester you have to worry obsessively ;-) Also, testers may be cheaper, so it may be worth having a separate test team for that reason.

I think the overarching question is this: which methodology is the best at finding bugs in software? I've recently watched a video (no link, sorry) stating that randomized testing is cheaper than and as effective as human-generated tests.
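
For what it's worth, here is a minimal sketch of the randomized-testing idea (this is not the approach from the video, which isn't linked; the insertion sort under test and the LINQ sort used as a reference oracle are assumptions made purely for illustration):

    // Hedged sketch of randomized testing: feed random inputs to the code under
    // test and compare against a trusted (if slower) reference implementation.
    using System;
    using System.Linq;

    public static class RandomizedSortTest
    {
        // Stand-in for the "real" code under test: a hand-written insertion sort.
        public static int[] SortUnderTest(int[] values)
        {
            var result = (int[])values.Clone();
            for (int i = 1; i < result.Length; i++)
            {
                int key = result[i], j = i - 1;
                while (j >= 0 && result[j] > key) { result[j + 1] = result[j]; j--; }
                result[j + 1] = key;
            }
            return result;
        }

        public static void Main()
        {
            var rng = new Random(12345);            // fixed seed so failures reproduce
            for (int i = 0; i < 1000; i++)
            {
                int[] input = Enumerable.Range(0, rng.Next(0, 50))
                                        .Select(_ => rng.Next(-100, 100))
                                        .ToArray();

                int[] expected = input.OrderBy(v => v).ToArray();   // reference oracle
                int[] actual = SortUnderTest(input);

                if (!actual.SequenceEqual(expected))
                    throw new Exception($"Mismatch on case {i}: [{string.Join(",", input)}]");
            }
            Console.WriteLine("1000 randomized cases passed.");
        }
    }

The human still has to supply an oracle (here, a second implementation); that is where this technique spends its design effort instead of hand-picking cases.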

Jonas Kölker
I don't mean that they test their own code *instead of* someone else testing it: I mean, when they're working in a team of developers then they should test their own code *before* someone else tests it ... in other words, on a team you can't check in untested code that will break the build and interfere with other developers' work ... and, other components which you need for integration testing may not exist yet ... and, debugging bugs found in integration is more difficult/expensive ... and therefore the more you're working on a team the more important it may be to do early, unit testing.
ChrisW
Conversely, the more coherent your view of the software, and the less you're interfering with and depending on other developers, then the more you can afford to skip early unit testing and instead have only integration testing.
ChrisW
A: 

You shouldn't blindly think that a unit == a class. I think that can be counterproductive. When I say that I write a unit test I'm testing a logical unit - "something" that provides some behaviour. A unit may be a single class, or it may be several classes working together to provide that behaviour. Sometimes it starts out as a single class, but evolves to become three or four classes later.

If I start with one class and write tests for that, but later it becomes several classes, I will usually not write separate tests for the other classes - they are implementation details in the unit being tested. This way I allow my design to grow, and my tests are not so fragile.

I used to think exactly like ChrisW demonstrates in this question - that testing at higher levels would be better, but after getting some more experience my thoughts have moderated to something between that and "every class should have a test class". Every unit should have tests, but I choose to define my units slightly differently from what I once did. It might be the "components" ChrisW talks about, but very often it is also just a single class.

In addition, functional tests can be good enough to prove that your system does what it's supposed to do, but if you want to drive your design with examples/tests (TDD/BDD), lower-level tests are a natural consequence. You could throw those low-level tests away when you are done implementing, but that would be a waste - the tests are a positive side effect. If you decide to do drastic refactorings invalidating your low-level tests, then you throw them away and write new ones.

Separating the goal of testing/proving your software, and using tests/examples to drive your design/implementation can clarify this discussion a lot.

Update: Also, there are basically two ways of doing TDD: outside-in and inside-out. BDD promotes outside-in, which leads to higher-level tests/specifications. If you start from the details however, you will write detailed tests for all classes.

Torbjørn
When "very often it also is just a single class", what is your motive for such a test? Why not, instead, cover this class by testing/exercising the externally-visible functionality which it helps to implement ('externally-visible functionality' meaning public/visible from outside the package of which any single class is but one implementation detail)?
ChrisW
As I said, I use tests to drive my design/code. If I was only interested in verifying the behaviour of my solutions, the high-level tests would be enough. They don't help me enough when I implement the details though, so most "responsibilities" in the design get their own tests.
Torbjørn
+1  A: 

You can code functional tests; that's fine. But you should validate using test coverage on the implementation, to demonstrate that the code being tested all has a purpose relative to the functional tests, and that it actually does something relevant.

Ira Baxter
Are you saying that functional tests don't cover the implementation and that therefore there should be additional (non-functional?) tests? Or are you saying that I should verify (perhaps using a code coverage tool like NCover) whether the implementation is covered by the functional tests?
ChrisW
Arguably only code that serves a detectable purpose in your function should be in your application. If you can't define functionality that exercises some part of the code, what is the point of having that code in the system? (The FAA requires what amounts to 100% test coverage on aircraft software for this reason.) You should use a code coverage tool! And if you don't get a high enough coverage level (you're not building airplanes, so 100% probably isn't necessary), you should code more functional tests that will exercise the code that wasn't covered by other tests.
Ira Baxter
You're saying that functional tests can and should provide sufficient coverage of the code, and that I should measure/test how much of the code is covered. Speaking of coverage, it's even more important to have tests which cover the functionality than to have tests which cover the code. For example I could write a 10-line program and a test which covers that 100%, but that would be insufficient if that program doesn't implement all the functionality that's required.
ChrisW
@ChrisW: Yes, you could write such a test. But then, that test presumably would not pass an inspection as being representative of all the functionality you desired. The point of this discussion is whether you should focus on writing black-box ("requirements/functionality" oriented tests) or white-box tests. I'm suggesting with a test coverage tool, which detects white-box untestedness, you can focus on writing functionality tests only.
Ira Baxter
A: 

I agree with most of the posts on here; however, I would add this:

The priority is to test public interfaces first, then protected, then private.

Usually public and protected interfaces are a summary of a combination of private and protected interfaces.

Personally: you should test everything. Given a strong set of tests for the smaller functions, you will have higher confidence that the hidden methods work. Also, I agree with another person's comment about refactoring. Code coverage will help you determine where the extra bits of code are, and help you refactor those out if necessary.

monksy
A: 

It depends on your design and where the greatest value will be. One type of application may demand a different approach to another. Sometimes you barely catch anything interesting with unit tests whereas functional/integration tests yield surprises. Sometimes the unit tests fail hundreds of times during development, catching many, many bugs in the making.

Sometimes it's trivial. The way some classes hang together makes the return on investment of testing every path less enticing, so you may just draw a line and move on to hammering something more important/complicated/heavily used.

Sometimes it's not enough to just test the public API because some particularly interesting logic is lurking within, and it's overly painful to set the system in motion and exercise those particular paths. That's when testing the guts of it does pay off.

These days, I tend to write numerous (often extremely) simple classes that do one or two things, tops. I then implement the desired behaviour by delegating all of the complicated functionality to those inner classes. I.e. I have slightly more complex interactions, but really simple classes.

If I change my implementation and have to refactor some of those classes, I usually don't care. I keep my tests insulated as best I can, so it's often a simple change to get them working again. However, if I do have to throw some of the inner classes away, I often replace a handful of classes and write some entirely new tests instead. I often hear people complaining about having to keep tests up to date after refactoring and, while it's sometimes inevitable and tiresome, if the level of granularity is fine enough, it's usually not a big deal to throw away some code + tests.

I feel this is one of the major differences between designing for testability and not bothering.
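
A minimal sketch of the shape described above, with all names invented for illustration: the public-facing class only wires together tiny collaborators, each of which is trivial to test or to replace.

    // Hedged sketch of "slightly more complex interactions, but really simple classes":
    // the outer class delegates everything interesting to small inner classes.
    using System.Globalization;

    internal sealed class TaxCalculator             // does one thing
    {
        public decimal AddTax(decimal net, decimal rate) => net * (1 + rate);
    }

    internal sealed class CurrencyFormatter         // does one thing
    {
        public string Format(decimal amount)
            => amount.ToString("0.00", CultureInfo.InvariantCulture);
    }

    public sealed class PriceFormatter              // public behaviour = delegation
    {
        private readonly TaxCalculator tax = new TaxCalculator();
        private readonly CurrencyFormatter currency = new CurrencyFormatter();

        public string GrossPrice(decimal net, decimal rate)
            => currency.Format(tax.AddTax(net, rate));
    }

If TaxCalculator is later thrown away, only the few tests that pinned down its specifics go with it; the tests against PriceFormatter stay insulated.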

Mark Simpson
What is one of the major differences? And if I'm testing functionality (like, acceptance testing), then I think it's the requirements or the functional specification (rather than the design or implementation of the code) which needs to be testable.
ChrisW
A: 

Hi Chris,

Are you still following this approach? I also believe that this is the right approach. You should only test public interfaces. Now, a public interface can be a service or some component that takes input from some kind of UI or any other source.

But you should be able to evolve the public service or component using the test-first approach, i.e. define a public interface and test it for basic functionality; it will fail. Implement that basic functionality using the background classes' API. Write only enough API to satisfy this first test case. Then keep asking what more the service can do, and evolve it.

The only balancing decision to be taken is whether to break the one big service or component into a few smaller services and components that can be reused. If you strongly believe a component can be reused across projects, then automated tests should be written for that component. But again, the tests written for the big service or component will then duplicate functionality already tested at the component level.

Certain people may go into a theoretical discussion that this is not unit testing. That's fine. The basic idea is to have automated tests that test your software. So what if it's not at the unit level? If it covers integration with a database (which you control), then that's only better.

Let me know if you have developed any good process that works for you since your first post.

Regards, Ameet

Shameet
I disagree that "you should only test public interfaces". I say that "you should test public interfaces" and that "testing private/internal interfaces *may* not be necessary". Unit/component testing is useful, if other components don't exist yet, or if system testing is expensive, or if bug-fixing during integration testing is difficult or time-consuming. Also, from my description of my regression-test framework, you'll see that I'm not doing test-first development.
ChrisW