Dear StackOverflow,

What are some common best practices in procedure (or function, module, etc.) design for balancing information hiding and an appropriate level of abstraction in the procedure's interface against the problems inherent in introducing hidden dependencies?

To be more concrete, suppose I code a procedure called getEmployeePhoneNbr(employeeId). Internally, the procedure is implemented by querying a database table keyed on employeeId. I want to hide those implementation details, but now the procedure depends on an external data source, which hinders its use if the environment changes.

The same situation would occur any time a procedure uses an external resource - file, database, whatever. It feels wrong somehow to hard-code the use of that resource within the procedure, but I'm not sure what the alternative is.

Please note that I'm not working in an object-oriented language; to the extent possible, I'd be most interested in responses that are broadly applicable across any type of language.

Thanks, Matt

A: 

You could provide some kind of context/environment object. Say:

type Environment = record
      DatabaseHandle: ...;
      ...
   end;

   Employee = record
      ID: integer;
      Name: string;
      ...
   end;


function OpenEnvironment (var Env: Environment): boolean;
begin
   ...
end;

procedure CloseEnvironment (var Env: Environment);
begin
   ...
end;

function GetEmployeeById (var Env: Environment; ID: integer; var Employee: Employee): boolean;
begin
   { ... load employee using the data source contained in Env ... }
end;

(Pseudo-Pascal.) The advantage is that you can use the Environment structure to store, say, extended error information and other global state too, thereby avoiding the PITA that is the Unix-style errno or Windows' GetLastError. Another advantage of this approach is that all your APIs become re-entrant and, by using a dedicated environment per thread, thread-safe as a consequence.

The drawback of this approach is that you have to pass an additional argument to all of your APIs.
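A minimal C sketch of the same idea, since C is a common non-OO target. It is an illustration under stated assumptions, not Dirk's code: the in-memory `rows` table stands in for the database handle, and the names `open_environment`, `get_employee_by_id` and `last_error` are hypothetical. Note how the error text lives in the environment rather than in a global like errno.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical employee record; field sizes are arbitrary. */
typedef struct {
    int id;
    char name[64];
    char phone[32];
} Employee;

/* Everything the API needs is reached through the environment, not globals.
   The in-memory table is a stand-in for a real database handle (assumption). */
typedef struct {
    const Employee *rows;   /* stand-in for DatabaseHandle */
    size_t row_count;
    char last_error[128];   /* per-environment error text instead of errno */
} Environment;

bool open_environment(Environment *env, const Employee *rows, size_t count)
{
    if (env == NULL || (rows == NULL && count > 0))
        return false;
    env->rows = rows;
    env->row_count = count;
    env->last_error[0] = '\0';
    return true;
}

void close_environment(Environment *env)
{
    /* A real implementation would release the database connection here. */
    env->rows = NULL;
    env->row_count = 0;
}

bool get_employee_by_id(Environment *env, int id, Employee *out)
{
    for (size_t i = 0; i < env->row_count; i++) {
        if (env->rows[i].id == id) {
            *out = env->rows[i];
            env->last_error[0] = '\0';
            return true;
        }
    }
    /* Record the failure in this environment only. */
    snprintf(env->last_error, sizeof env->last_error,
             "no employee with id %d", id);
    return false;
}
```

Because all state is reached through the `Environment *` argument, two threads that each open their own environment do not share mutable state through these calls.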

Dirk
A:

This is a very difficult issue to resolve, whether your implementation language is object-oriented or not. (In any case, object methodologies can usually be applied regardless of whether the programming language supports them as a language construct, so I have described my solution in terms of objects.)

What you would like to be able to do is treat all data storage equivalently. In reality this is almost impossible, and you must choose a paradigm and accept its limits. For instance, it is possible to base the design of your abstraction on an RDBMS paradigm (connect/query/fetch) and attempt to encapsulate access to files within the same interface.

An approach I have used with success is to avoid embedding the retrieval of data within (in your case) the Employee "object", as this creates too close a coupling between the abstraction of the Employee within the program and the storage and retrieval of its data.

Instead, I create a separate object responsible for retrieving the data needed to construct the Employee object, and then construct the Employee object from that data. I can now construct an Employee from any data source, provided I can translate the data into an appropriately generic structure. (I have the advantage of language support for associative arrays, which considerably simplifies passing tuples around; you may have trouble if your development language makes this difficult or impossible.)
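A rough C sketch of this split, assuming a hypothetical one-line `key=value;key=value` storage format and a tiny fixed-size `Record` in place of a real associative array (C has none built in). All names are illustrative. `load_record` is the only part that touches a data source; `employee_from_record` does no I/O, so a unit test can build a `Record` by hand.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Generic "tuple": key/value string pairs (fixed size is an assumption). */
#define MAX_FIELDS 8
typedef struct {
    const char *keys[MAX_FIELDS];
    const char *values[MAX_FIELDS];
    int count;
} Record;

static const char *record_get(const Record *r, const char *key)
{
    for (int i = 0; i < r->count; i++)
        if (strcmp(r->keys[i], key) == 0)
            return r->values[i];
    return NULL;
}

/* Business-side type: knows nothing about where its data came from. */
typedef struct {
    int id;
    char name[64];
    char phone[32];
} Employee;

/* Builds an Employee from any generic record; no I/O happens here. */
bool employee_from_record(const Record *r, Employee *out)
{
    const char *id = record_get(r, "id");
    const char *name = record_get(r, "name");
    const char *phone = record_get(r, "phone");
    if (id == NULL || name == NULL || phone == NULL)
        return false;
    out->id = atoi(id);
    snprintf(out->name, sizeof out->name, "%s", name);
    snprintf(out->phone, sizeof out->phone, "%s", phone);
    return true;
}

/* Retrieval side: reads one "key=value;key=value" line from any FILE*
   into a Record. The Record points into buf, so buf must outlive it. */
bool load_record(FILE *src, char *buf, size_t bufsize, Record *out)
{
    if (fgets(buf, (int)bufsize, src) == NULL)
        return false;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    out->count = 0;
    for (char *tok = strtok(buf, ";"); tok != NULL && out->count < MAX_FIELDS;
         tok = strtok(NULL, ";")) {
        char *eq = strchr(tok, '=');
        if (eq == NULL)
            continue;                 /* skip malformed fields */
        *eq = '\0';
        out->keys[out->count] = tok;
        out->values[out->count] = eq + 1;
        out->count++;
    }
    return true;
}
```

`strtok` keeps hidden internal state, so a multithreaded version would use `strtok_r` or a hand-written parser instead.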

This also makes the application easier to test, since I can construct the Employee "object" directly within my unit test without having to worry about creating the data source (or whether the data that was there last time is still there). In a complex design, this setup and teardown can account for the majority of the test code. In addition, should the need arise to create 1000 Employee "objects", I can reuse my code without having to query my data source (file, DB, card index, etc.) 1000 times; in other words, it neatly solves the well-known ORM N+1 query problem.

So, to summarise: separate data retrieval from business logic entirely, as the hidden dependency you describe has some very nasty pitfalls. IMHO it is an anti-pattern to encapsulate retrieval of specific data within the construction of an "object" or within a function that retrieves a property from some stored data.

A: 

You may want to use a three-layer approach here. Your first layer is your client, the one consuming getEmployeePhoneNbr(employeeId); the second layer is your data access layer; and the third layer is the data implementation layer, which your data access layer uses to access the concrete source of information.

The data implementation layer

This layer contains:

  1. A data structure that represents the location of a resource that can be accessed by the data layer.
  2. An API to create a new structure and its corresponding functions to configure it.

The data access layer

Contains:

  1. A pointer to the data structure to be used as the source of data.
  2. A simple public API with all the calls you need to access the data, such as getEmployeePhoneNbr(employeeId), getEmployeeName(employeeId), and so on. All these calls internally use the pointer to the data structure to access the specific data.

With this approach, you only have to take care of providing the right data implementation structure to your data access layer, so if it changes, you only need to change it in one place.
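A minimal C sketch of the three layers, with every name other than getEmployeePhoneNbr and getEmployeeName hypothetical. To keep it self-contained, the data implementation layer's "location" is an in-memory table standing in for a real database or file; a real version would hold a connection string or path instead.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* ---- Data implementation layer ----
   Describes where the data lives (here: an in-memory table, an assumption). */
typedef struct {
    int id;
    const char *phone;
    const char *name;
} EmployeeRow;

typedef struct {
    const EmployeeRow *rows;
    size_t count;
} DataSource;

DataSource *data_source_new(void)
{
    return calloc(1, sizeof(DataSource));
}

void data_source_set_table(DataSource *ds, const EmployeeRow *rows, size_t count)
{
    ds->rows = rows;
    ds->count = count;
}

void data_source_free(DataSource *ds) { free(ds); }

/* ---- Data access layer ----
   Holds a pointer to the current source; clients only see the calls below. */
static const DataSource *current_source = NULL;

void dal_use_source(const DataSource *ds) { current_source = ds; }

static const EmployeeRow *dal_find(int employee_id)
{
    if (current_source == NULL)
        return NULL;
    for (size_t i = 0; i < current_source->count; i++)
        if (current_source->rows[i].id == employee_id)
            return &current_source->rows[i];
    return NULL;
}

/* Public API: returns NULL when the employee is not found. */
const char *getEmployeePhoneNbr(int employeeId)
{
    const EmployeeRow *row = dal_find(employeeId);
    return row ? row->phone : NULL;
}

const char *getEmployeeName(int employeeId)
{
    const EmployeeRow *row = dal_find(employeeId);
    return row ? row->name : NULL;
}
```

Clients call only `getEmployeePhoneNbr` and `getEmployeeName`; switching to a different source means building another `DataSource` and passing it to `dal_use_source` in one place.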

Alberto Gutierrez
A: 

The kind of problem you have is usually solved by applying the dependency inversion principle (aka DIP). The original article can be found here.

The article is mainly OO, but you can apply the principle in an imperative language too (you can do OO in an imperative language; it is just harder).

The principle is that it is better to give a client object a reference to an object that does some needed processing (database access, for instance) than to code or aggregate this object into the client object.

At the function level, this translates to giving a high-level function the low-level data and functions it depends on.

The best way to do this in a non-OO language is to pass a struct or a function pointer that defines the data and functions used by the higher-level function.
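A small C sketch of the struct-of-function-pointers variant. `EmployeeStore`, `format_phone_label` and `stub_lookup` are hypothetical names; the point is that the high-level function depends only on the struct it is given, so a database-backed lookup and a test stub are interchangeable.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Abstraction owned by the high-level code: "something that can look up
   a phone number". ctx carries the implementation's own state. */
typedef struct {
    bool (*lookup_phone)(void *ctx, int employee_id, char *out, size_t out_size);
    void *ctx;
} EmployeeStore;

/* High-level policy: depends only on the abstraction above. */
bool format_phone_label(const EmployeeStore *store, int employee_id,
                        char *out, size_t out_size)
{
    char phone[32];
    if (!store->lookup_phone(store->ctx, employee_id, phone, sizeof phone))
        return false;
    snprintf(out, out_size, "Employee %d: %s", employee_id, phone);
    return true;
}

/* One low-level detail: a stub that knows a single employee (id 1).
   A database-backed version would have the same signature and keep its
   connection handle in ctx. */
static bool stub_lookup(void *ctx, int employee_id, char *out, size_t out_size)
{
    const char *fixed_phone = ctx;
    if (employee_id != 1)
        return false;
    snprintf(out, out_size, "%s", fixed_phone);
    return true;
}
```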

neuro
A: 

Put the resource dependency in a lookup function. If a number of resources are related, I would create a module that has simple functions for retrieving them. I personally avoid handing around such references when I can. The code along the way has no business knowing about or using them.

Instead of:

getEmployeePhoneNbr(employeeId)
    dbName = "employeedb"
    ... SQL, logic, etc.

Or:

getEmployeePhoneNbr(employeeId, dbName)
    ... SQL, logic, etc.

I would do the following:

getEmployeePhoneNbr(employeeId)
    dbName = getEmployeeDbName()
    ... SQL, logic, etc.

This way you can change getEmployeeDbName() and every dependent function and module will benefit.
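The same idea in C, as a sketch. The default `"employeedb"` comes from the pseudocode above; `setEmployeeDbName`, `buildPhoneQuery` and the table and column names are hypothetical, and building the query text stands in for the elided "SQL, logic, etc.".

```c
#include <stdio.h>
#include <string.h>

/* Resource-lookup module: the only place that knows which database holds
   employee data. The setter is a hypothetical hook for tests or deployment
   config; the caller must keep the passed string alive. */
static const char *employee_db_name = "employeedb";

const char *getEmployeeDbName(void) { return employee_db_name; }

void setEmployeeDbName(const char *name) { employee_db_name = name; }

/* Dependent function: asks the lookup for the database instead of
   hard-coding it or taking it as a parameter. Table/column names are
   made up; real code should bind employeeId as a query parameter
   rather than formatting it into the SQL text. */
int buildPhoneQuery(int employeeId, char *out, size_t out_size)
{
    return snprintf(out, out_size,
                    "SELECT phone FROM %s.employees WHERE id = %d",
                    getEmployeeDbName(), employeeId);
}
```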

Alain O'Dea