You appear to be using a rather narrow definition of, ”Encapsulation.” Would I be right in presuming that you define encapsulation to be, “Combining data with methods?”
If I’m wrong then please ignore the rest of this post.
Encapsulation is not a loose term; in fact, it’s defined by the International Organisation for Standardization. The ISO’s Reference Model of Open Distributed Processing - defines the following five concepts:
Entity: Any concrete or abstract thing of interest.
Object: A model of an entity. An object is characterised by its behaviour and, dually, by its state.
Behaviour (of an object): A collection of actions with a set of constraints on when they may occur.
Interface: An abstraction of the behaviour of an object that consists of a subset of the interactions of that object together with a set of constraints on when they may occur.
Encapsulation: the property that the information contained in an object is accessible only through interactions at the interfaces supported by the object.
We can further make a self-evident proposal: as some information is accessible via these interfaces, some information must be hidden and inaccessible within the object. The property such information exhibits is called information hiding, which Parnas defined by arguing that modules should be designed to hide both difficult decisions and decisions that are likely to change, see one of the great computing papers:
http://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf
It’s important to note that it is not only the data that is information-hidden: it is some subset of behaviour associated with the object that is difficult or likely to change.
In your post, you seem to be saying that the difference between encapsulation in OO and in functional programming stems from data management, but at least according to the ISO and Parnas, data management is not the key to encapsulation. So I don’t see why encapsulation in functional programming need be any different from that in OO.
You mention, furthermore, in your post that functional programming provides encapsulation, “… by the methods rather than the data structures.” This, I think, is a difference of scale rather than of absolute. If I use the word, “Object,” rather than, “Data structure,” (again, please let me know if I’m misinterpreting), then you seem to find significance in OO’s encapsulation by object and functional programming’s encapsulation by method.
Yet by the ISO definition above, an object is anything I wish to model. Thus classes may be encapsulated within a package, so long as some of those classes contribute to the package’s interface (i.e., the public classes of the package) and some are information-hidden (the private classes in the package).
By the same token, methods are encapsulated within a class – some methods being public and some private. You can even take this a notch lower and say that McCabian sequential sequences of code are encapsulated within methods. Each forms a graph of nodes encapsulated within encapsulated regions; and all these graphs form a graph stack. Thus functional programming may well encapsulated at the function/file level, but this is no different from the method/class graph of OO, and essentially no difference from the class/package graph of OO either.
Also, note that word Parnas uses above: change. Information hiding concerns potential events, such as the changing of difficult design decisions in the future. You ask where OO’s strength’s lie; well, encapsulation is certainly a strength of OO, but the question then becomes, “Where does encapsulation’s strength lie?” and the answer is one of resounding clarity: change management. Particularly, encapsulation reduces the maximum potential burden of change.
The concept of, “Potential coupling,” is useful here.
“Coupling,” itself is defined as, “A measure of the strength of association established by a connection from one module to another,” in another of computing’s great papers:
http://www.research.ibm.com/journal/sj/382/stevens.pdf
And as the paper says, in words never since bettered, “Minimizing connections between modules also minimizes the paths along which changes and errors propagate into other parts of the system, thus eliminating disastrous, “Ripple,” effects, where changes in one part cause errors in another, necessitating additional changes elsewhere, giving rise to new errors, etc.”
As defined here, however, there are two limitations which can easily be lifted. Firstly, coupling does not measure intra-module connections, and these intra-module connections can give rise to just as many, “Ripple,” effects as inter-module connections (the paper does define, “Cohesion,” to relate intra-module elements, but this is not defined in terms of connections between elements (i.e., references to labels or addresses) with which coupling was defined). Secondly, the coupling of any computer program is a given, in that modules are connected or; there is little scope within the definition of coupling to manage the potential changes of which Parnas speaks.
Both these issues are resolved, to some degree, with the concept of potential coupling: the maximum possible number of connections formable among all elements of a program. In Java, for example, a class that is package-private (the default accessor) within a package cannot have connections formed on it (i.e., no outside classes can depend on it, reflection notwithstanding), but a public class within a package can have dependencies on it. This public class would contribute to the potential coupling even if no other classes depend on it at the moment – classes might depend on it in future, when the design changes.
To see encapsulation’s strength, consider the Principle of Burden. The Principle of Burden takes two forms.
The strong form states that the burden of transforming a collection of entities is a function of the number of entities transformed. The weak form states that the maximum potential burden of transforming a collection of entities is a function of the maximum potential number of entities transformed.
The burden of creating or modifying any software system is a function of the number of classes created or modified (here we use, “Classes,” presuming an OO system, and are concerned with encapsulation at the class/package level; we could equally have concerned ourselves with the function/file level of functional programming). (Note that the, “Burden,” is modern software development is usually cost, or time, or both.)
Classes that depend on a particular, modified class have a higher probability of being impacted than classes that do not depend on the modified class.
The maximum potential burden a modified class can impose is the impacting of all classes that depend on it.
Reducing the dependencies on a modified class therefore reduces the probability that its update will impact other classes and so reduces the maximum potential burden that that class can impose. (This is little more than a re-statement of the, “Structured design,” paper.)
Reducing the maximum potential number of dependencies between all classes in a system therefore reduces the probability that an impact to a particular class will cause updates to other classes, and thus reduces the maximum potential burden of all updates.
Encapsulation, by reducing the maximum potential number of dependencies between all classes, therefore mitigates the weak form of the Principle of Burden. This is all covered by, “Encapsulation theory,” which attempts to mathematically prove such assertions, using potential coupling as the logical means of structuring a program.
Note, however, that when you ask, “Is encapsulation the key to all worthwhile code?” the answer must surely be: no. There is no single key to all worthwhile code. Encapsulation is, in certain circumstances, merely a tool to help improve the quality of code so that it may become, “Worthwhile.”
You also write that, “ … encapsulation can be a barrier to flexible extension of an object.” Yes it most certainly can: it is indeed designed to be a barrier against extending the design decisions of an object that are difficult or likely to change. This is not, however, thought to be a bad thing. An alternative approach would be to have all classes public and have a program express its maximum potential coupling; but then the weak form of the Principle of Burden states that updates will become increasingly costly; these are the costs against which barriers to extension are to be measured.
Finally, you make the interesting comparison between encapsulation and semantics, and that, in your opinion, OO’s semantics are its greater strength. I’m no semanticist either (I didn’t even know such a word existed before the good Mister Ramsey alluded to it in his comment) but I presume you mean, “Semantics,” in the sense of, “the meaning, or an interpretation of the meaning, of a word,” and very basically that a class with a, woof() method should be called a Dog.
There is great strength indeed in this semantics.
What is curious to me is that you pit semantics against encapsulation and look for a winner; I doubt you’ll find one.
In my opinion, there are two forces that motivate encapsulation: the semantic and the logical.
Semantic encapsulation merely means encapsulation based on the meaning of the nodes (to use the general term) encapsulated. So if I tell you that I have two packages, one called, 'animal,' and one called 'mineral,' and then give you three classes Dog, Cat and Goat and ask into which packages these classes should be encapsulated, then, given no other information, you would be perfectly right to claim that the semantics of the system would suggest that the three classes be encapsulated within the, 'animal,' package, rather than the, 'mineral.'
The other motivation for encapsulation, however, is logic, and particularly the study of potential coupling, mentioned above. Encapsulation theory actually provides equations for the number of packages that should be used to encapsulate a number of classes in order to minimise the potential coupling.
For me, encapsulation as a whole is the trade-off between this semantic and logical approach: I’ll allow the potential coupling of my programs to rise above the minimum if this makes the program semantically easier to understand; but enormous and wasteful levels of potential coupling will be a warning that my program needs to be re-structured no matter how semantically obvious it is.
(And if the good Mister Ramsey is still reading, could you or your semanticist friends give me a better word for the, “Semantics,” phase I’m using here? It would be good to use a more appropriate term.)
Regards,
Ed.