We're building tools to mine information from the web. We have several pieces, such as:
- Crawl data from the web
- Extract information based on templates & business rules
- Parse results into a database
- Apply normalization & filtering rules
- Etc, etc.
The problem is troubleshooting issues & having a good "high-level picture" of what's happening at each stage.
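To make that concrete, here's a minimal sketch of the kind of per-stage visibility we're after (Python; the stage names and bodies are stand-ins, not our real code): each stage is a plain function, and a small runner logs what went in, what came out, and how long it took.

```python
# Minimal sketch: wrap each pipeline stage so every run reports what that
# stage did -- records in, records out, and elapsed time. Stage names and
# bodies below are placeholders.
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("pipeline")

def run_stage(name, stage_fn, records):
    """Run one stage and log a one-line summary of what it did."""
    start = time.time()
    result = list(stage_fn(records))
    log.info("%-12s in=%-6d out=%-6d %.2fs",
             name, len(records), len(result), time.time() - start)
    return result

# Stand-ins for the real crawl/extract/normalize code.
def crawl(seeds):
    return [{"url": s, "html": "<html>...</html>"} for s in seeds]

def extract(pages):
    return [{"url": p["url"], "fields": {"title": "..."}} for p in pages]

def normalize(rows):
    return [r for r in rows if r["fields"]]

if __name__ == "__main__":
    data = ["http://example.com/a", "http://example.com/b"]
    for name, fn in [("crawl", crawl), ("extract", extract), ("normalize", normalize)]:
        data = run_stage(name, fn, data)
```

A one-line-per-stage summary like this already helps, but we're wondering what people do beyond it.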
What techniques have helped you understand and manage complex processes?
- Use workflow tools like Windows Workflow Foundation
- Encapsulate separate functions into command-line tools & use scripting tools to link them together (first sketch after this list)
- Write a Domain-Specific Language (DSL) to specify, at a higher level, what order things should happen in (second sketch after this list)
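To make the second and third options concrete, here are two rough sketches. Both are illustrative only: the tool names, file names, and `Pipeline` helper are hypothetical, not an existing library or our real code.

First, stages as separate command-line tools linked by a small driver script, with the intermediate files doubling as checkpoints you can inspect or re-run by hand:

```python
# Sketch of option 2: each stage is its own command-line tool ("crawl",
# "extract", "load_db" are hypothetical), and a driver script links them.
import subprocess
import sys

STAGES = [
    ["crawl",   "--seeds", "seeds.txt",     "--out", "pages.jsonl"],
    ["extract", "--in",    "pages.jsonl",   "--out", "records.jsonl"],
    ["load_db", "--in",    "records.jsonl", "--db",  "mining.sqlite"],
]

def main():
    for cmd in STAGES:
        print("== running:", " ".join(cmd), file=sys.stderr)
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Stop at the first failing stage; its input/output files stay
            # on disk, which is what makes troubleshooting easier.
            sys.exit("stage failed: {} (exit {})".format(cmd[0], result.returncode))

if __name__ == "__main__":
    main()
```

Second, a tiny internal "DSL": the order of operations is declared in one place, separate from the code that implements each step, so the high-level picture can be printed (or documented) without reading the stage internals:

```python
# Sketch of option 3: a hypothetical Pipeline helper that declares step
# order in one place. Step bodies are placeholders.
class Pipeline:
    def __init__(self, name):
        self.name = name
        self.steps = []          # ordered list of (label, callable)

    def then(self, label, fn):
        self.steps.append((label, fn))
        return self              # allow chaining: .then(...).then(...)

    def describe(self):
        # The "high-level picture": just the declared order of steps.
        return " -> ".join(label for label, _ in self.steps)

    def run(self, data):
        for label, fn in self.steps:
            data = fn(data)      # each step takes and returns plain data
        return data

# Usage: the whole flow reads top to bottom, independent of step internals.
mining = (Pipeline("web-mining")
          .then("crawl",     lambda seeds: [s + "/page" for s in seeds])
          .then("extract",   lambda pages: [{"src": p} for p in pages])
          .then("normalize", lambda rows:  [r for r in rows if r["src"]]))

print(mining.describe())                    # crawl -> extract -> normalize
print(mining.run(["http://example.com"]))
```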
Just curious how you get a handle on a system with many interacting components. We'd like to document/understand how the system works at a higher level than tracing through the source code.