Hello everyone. I'm a software developer interested in information retrieval. I'm currently working on my third search engine project, and I am VERY frustrated by the amount of boilerplate code that gets written again and again, with the same bugs, etc.

A basic search engine is a very simple beast that could be described in a formal language consisting of two "layers":

  1. "Layer of primitives" (or axioms, kernel language - don't know how to name them). They consist of several sets (as a set of resources - files, websites), relations on sets (as 'site A links to site B') and simple operations as 'open stream to resource A', 'read record from stream', 'merge N streams', 'index set of records by field F', etc. Also, there is a lot of data conversion, as 'save stream in YAML format', 'load stream from XML format', etc.

  2. "Application layer" - several very high-level operations that form a search engine lifecycle, as 'harvest new resources', 'crawl harvested resources', 'merge crawled resources to the database', 'index crawled resources', 'merge indexes', etc. Every one of this high-level operations could be expressed in the terms of "primitives" from 1.

Such a high-level representation could be easily tested, maybe even proved formally, and implemented (or code-generated) in the programming language of choice.
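To make the idea concrete, here is a rough sketch of the shape I have in mind, in Python (all names are made up for illustration; only heapq.merge is a real library call):

    from typing import Dict, Iterable, Iterator, List
    import heapq

    Record = Dict[str, str]  # hypothetical record type: field name -> value

    # --- Layer 1: primitives ---

    def read_records(path: str) -> Iterator[Record]:
        """Open a stream to a resource and yield its records (stub)."""
        raise NotImplementedError

    def merge_streams(streams: List[Iterable[Record]], key: str) -> Iterator[Record]:
        """Merge N record streams that are already sorted by `key`."""
        return heapq.merge(*streams, key=lambda r: r[key])

    def index_by(records: Iterable[Record], field: str) -> Dict[str, List[Record]]:
        """Index a set of records by field `field`."""
        index: Dict[str, List[Record]] = {}
        for record in records:
            index.setdefault(record[field], []).append(record)
        return index

    # --- Layer 2: application operations, written only in terms of layer 1 ---

    def merge_crawled_resources(paths: List[str], key: str) -> Iterator[Record]:
        """One lifecycle step: merge several crawled resource files."""
        return merge_streams([read_records(p) for p in paths], key=key)

Each layer-2 operation stays a short composition of layer-1 calls, which is what would make it easy to test (or code-generate) independently.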

So, the question: does anybody design systems this way, formally and rigorously (maybe even at the level of algebra/group theory), in a strict top-down approach? What can I read to learn more about it?

A: 

Hmm. Don't know if this helps, but have you looked at Z notation? I heard about it at uni but have not used it (I didn't take that module).
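From what I remember, a Z specification is built out of "schemas" that declare some state and constrain it with predicates. A very rough sketch from memory of what one might look like for your stream-merge primitive (typeset with the zed LaTeX package; don't trust the details, I haven't checked this against any tool):

    % Hypothetical Z schema for merging two record streams (zed.sty syntax).
    % RECORD is assumed to be a given set; nothing here has been type-checked.
    \begin{schema}{MergeStreams}
      in_1?, in_2? : \seq RECORD \\
      out! : \seq RECORD
    \where
      \# out! = \# in_1? + \# in_2? \\
      \ran out! = \ran in_1? \cup \ran in_2?
    \end{schema}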

RichardOD
A: 

The short answer is, "Yes, to varying degrees."

Different organizations approach software development with varying degrees of rigor, but the concept of layered design, in which each layer deals with its responsibilities in terms of a very constrained, precisely designed interface to the services provided by the next layer down, is well established. I would point to the growing acceptance of test-driven development, dependency injection, and designing to interfaces as evidence that these ideas are slowly becoming standard in software development.
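As a minimal sketch of what "a constrained interface to the layer below" plus dependency injection looks like (the names here are invented for illustration):

    from typing import Iterable, Protocol

    class DocumentStore(Protocol):
        """The narrow, precisely designed interface the layer below provides."""
        def fetch(self, url: str) -> str: ...
        def save(self, url: str, body: str) -> None: ...

    class Crawler:
        """Upper layer: depends only on the interface, injected at
        construction, so it can be unit-tested against a fake store."""
        def __init__(self, store: DocumentStore) -> None:
            self._store = store

        def crawl(self, urls: Iterable[str]) -> None:
            for url in urls:
                self._store.save(url, self._store.fetch(url))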

However, software development is pursued at a wide variety of scales and for a wide variety of purposes. Just as the level of precision engineering increases in physical fabrication as scale and complexity increase (e.g. jet engine manufacturer vs. picture framer), some software developers deal with systems whose performance and scale of use are small enough that they can tolerate lack of precision, or even long-lived defects (e.g. a typical web developer vs. a developer working on avionics or embedded medical devices).

My observation is that precision and strict layering have often been regarded as costs to be borne only when the consequences of defects are sufficiently high. But I see that slowly changing for the better, at least in the development of mission-critical systems that work at Internet scale.

joel.neely
A: 

Most of our projects use a standard 3-layer architecture:

  • UI: tested manually
  • Business: tested with mocking (see the sketch below)
  • Proxy / data access: tested with integration tests
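As a rough illustration of how the business layer is tested with mocking (hypothetical names, using Python's standard unittest.mock):

    import unittest
    from unittest.mock import Mock

    class OrderService:
        """Business layer: depends on an injected data-access object."""
        def __init__(self, repository):
            self._repository = repository

        def total_for(self, customer_id):
            orders = self._repository.orders_for(customer_id)
            return sum(order["amount"] for order in orders)

    class OrderServiceTest(unittest.TestCase):
        def test_total_sums_order_amounts(self):
            repository = Mock()
            repository.orders_for.return_value = [{"amount": 10}, {"amount": 5}]
            self.assertEqual(OrderService(repository).total_for(42), 15)
            repository.orders_for.assert_called_once_with(42)

The data-access layer itself has nothing left to mock, which is why it gets integration tests instead.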

To learn more about architectural patterns, see http://en.wikipedia.org/wiki/Architectural_pattern_(computer_science)

Shiraz Bhaiji
+2  A: 

Critical systems (nuclear power plants, airplanes, train control systems, ...) are developed in a top-down approach similar to the one you are looking for, but the upper levels are not programmatic at all. It's not about a kernel layer and an application layer; it is about a high-level design refined into components and sub-components, with precise specifications at each level.

The specifications can be formal (intended to be verified automatically once the specified component is available) or not (intended to be verified by tests, code reviews or whatever method is appropriate). To be frank, in 2009, they aren't formal most of the time, although the trend is clearly to move in that direction.

Since you mention formal approaches in your question's tags, you must be interested in the topic, but it's a niche at the moment. In particular, I don't see how these methods could be applied economically to search engine projects. Anyway, if you want to learn more about how these methods are applied in fields where they work, here are a few pointers:

Someone mentioned Z: Z is the specification language; the framework in which you refine specifications step by step until they become executable is called B. You might also be interested in Alloy. Lastly, there are formal specification languages attached to existing programming languages. The trend started with JML for Java and inspired many others. I work in a group of people who defined such a specification language for C, called ACSL.
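To give a flavor of what these contract languages express (requires/ensures clauses attached to a function), here is the same idea sketched as plain Python assertions; this is not JML or ACSL syntax, just the shape of a contract:

    def merge_sorted(xs: list, ys: list) -> list:
        # requires: both inputs are sorted ascending
        assert all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))
        assert all(ys[i] <= ys[i + 1] for i in range(len(ys) - 1))

        result = sorted(xs + ys)  # body kept trivial; the contract is the point

        # ensures: the result is sorted and is a permutation of the inputs
        assert all(result[i] <= result[i + 1] for i in range(len(result) - 1))
        assert sorted(xs + ys) == result
        return result

In JML or ACSL the clauses live in special comments next to the function and are checked by tools, statically or at run time, rather than executed as assertions.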

Pascal Cuoq
A: 

I would challenge your assumption that reusable code needs to be written in such a way.

I have seen workplaces with systems designed around a code-reuse goal that end up reusing very little and carrying extra complexity all around.

I find that sticking to the SOLID principles, doing TDD, and keeping DRY, YAGNI, and KISS in mind go a long way toward achieving a reasonable level of reuse.

The operations you mentioned are perfect examples of different responsibilities that shouldn't all end up in the same class:

'open stream to resource A', 'read record from stream', 'merge N streams', 'index set of records by field F', etc. Also, there is a lot of data conversion, as 'save stream in YAML format', 'load stream from XML format', etc.
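For instance (hypothetical names), each of those responsibilities becomes its own small, separately testable piece rather than a method on one god class:

    class StreamReader:
        """Only knows how to read records from a resource."""
        def read(self, path): ...

    class StreamMerger:
        """Only knows how to merge N record streams."""
        def merge(self, streams): ...

    class FieldIndexer:
        """Only knows how to index records by a field."""
        def index(self, records, field): ...

    class YamlWriter:
        """Only knows how to save a record stream as YAML."""
        def write(self, records, path): ...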

I recommend this ebook on SOLID.

When trying to design it top-down, be careful about dwelling on 'what if x', 'what if y', and so on, as it is too easy to end up adding plenty of stuff that you don't need in the end, or that isn't modeled in a reusable way (even if that was the reason you added it).

eglasius
A: 

I recommend looking at IEEE 1471, the recommended practice for architectural description of software-intensive systems.

Ray