Hello everyone, I'm a software developer interested in information retrieval. Currently I'm working on my 3rd search engine project and am VERY frustrated about the amount of boilerplate code that is written again and again, with the same bugs, etc.
Basic search engine is a very simple beast that could be described in a formal language consisting of two "layers":
"Layer of primitives" (or axioms, kernel language - don't know how to name them). They consist of several sets (as a set of resources - files, websites), relations on sets (as 'site A links to site B') and simple operations as 'open stream to resource A', 'read record from stream', 'merge N streams', 'index set of records by field F', etc. Also, there is a lot of data conversion, as 'save stream in YAML format', 'load stream from XML format', etc.
"Application layer" - several very high-level operations that form a search engine lifecycle, as 'harvest new resources', 'crawl harvested resources', 'merge crawled resources to the database', 'index crawled resources', 'merge indexes', etc. Every one of this high-level operations could be expressed in the terms of "primitives" from 1.
Such a high-level representation could be easily tested, maybe even proved formally, and implemented (or code-generated) in the programming language of choice.
So, the question: does anybody design systems in this way - formally, rigorously ( maybe even at the level of algebra/group theory), in the strict top-down approach? What can I read to learn about ?