Suppose you have an application that consists of two layers:
- A: A data layer that stores all the data loaded from a database or from a file
- B: A layer that shows the data in a nice user interface, e.g. a graphical report
Now, data is changed in layer A. There are two approaches to make sure that the reports in layer B are updated correctly.
The first approach is the PUSH approach. Layer A notifies layer B via observers so layer B can update its reports.
There are several disadvantages in the PUSH approach:
- If data is changed multiple times (e.g. during load or in algorithms that change a lot of data), the observers are executed many times. This can be solved by introducing a kind of buffering (preventing observer calls while you are still changing data), but this can be very tricky, and it is easy to forget to make the right buffering calls.
- If a lot of data is changed, the observer calls may cause an overhead that is not acceptable in the application.
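To make the PUSH approach and its buffering concrete, here is a minimal Python sketch. All names (`DataLayer`, `Report`, `subscribe`, `batch`, `rebuild_count`) are made up for illustration and are not taken from any real application; the buffering shown is just one possible way to do it.

```python
from contextlib import contextmanager


class DataLayer:
    """Layer A (hypothetical): stores data and notifies observers on change."""

    def __init__(self):
        self._values = {}
        self._observers = []
        self._batch_depth = 0   # > 0 while notifications are being buffered
        self._pending = False   # True if a change happened while buffered

    def subscribe(self, observer):
        self._observers.append(observer)

    def set(self, key, value):
        self._values[key] = value
        if self._batch_depth > 0:
            self._pending = True   # remember the change, notify later
        else:
            self._notify()

    @contextmanager
    def batch(self):
        """Buffer notifications; send at most one when the outermost batch ends."""
        self._batch_depth += 1
        try:
            yield
        finally:
            self._batch_depth -= 1
            if self._batch_depth == 0 and self._pending:
                self._pending = False
                self._notify()

    def _notify(self):
        for observer in self._observers:
            observer()


class Report:
    """Layer B (hypothetical): rebuilds itself each time layer A notifies it."""

    def __init__(self, data):
        self.rebuild_count = 0
        data.subscribe(self._on_data_changed)

    def _on_data_changed(self):
        self.rebuild_count += 1   # stands in for an expensive report rebuild
```

Without buffering, 100 calls to `set` trigger 100 rebuilds. Wrapped in `with data.batch():`, the same 100 calls trigger one rebuild, but only if every bulk change remembers to use `batch()`, which is exactly the part that is easy to forget.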
The other approach is the PULL approach. Layer A just remembers which data was changed and sends out no notifications (layer A is flagged dirty). After the action that was executed by the user (running an algorithm, loading a file, or something else), we go through all of our user interface components and ask them to update themselves. In this case layer B is asked to update itself. First it checks whether any of its underlying layers (layer A) is dirty. If so, it gets the changes and updates itself. If layer A is not dirty, the report knows it has nothing to do.
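A minimal Python sketch of the PULL approach, for comparison. The names (`DataLayer`, `Report`, `is_dirty`, `take_changes`, `update`) are hypothetical, and clearing the dirty state inside `take_changes` assumes that layer A has exactly one consumer.

```python
class DataLayer:
    """Layer A (hypothetical): records what changed, sends no notifications."""

    def __init__(self):
        self._values = {}
        self._dirty_keys = set()

    def set(self, key, value):
        self._values[key] = value
        self._dirty_keys.add(key)   # only flag the change

    def is_dirty(self):
        return bool(self._dirty_keys)

    def take_changes(self):
        """Return the changed items and clear the dirty state."""
        changes = {key: self._values[key] for key in self._dirty_keys}
        self._dirty_keys.clear()
        return changes


class Report:
    """Layer B (hypothetical): updates itself only when asked to."""

    def __init__(self, data):
        self._data = data
        self.shown = {}
        self.refresh_count = 0

    def update(self):
        """Called once after each user action; returns True if work was done."""
        if not self._data.is_dirty():
            return False            # nothing changed, nothing to do
        self.shown.update(self._data.take_changes())
        self.refresh_count += 1
        return True
```

Here 100 changes to the same key followed by one `report.update()` cause a single refresh, and a second `update()` with no changes in between returns immediately.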
The best solution depends on the situation. In my situation, the PULL approach seems much better.
The situation becomes much more difficult if we have more than 2 layers. Suppose we have the following 4 layers:
- A: A data layer that stores all the data loaded from a database or from a file
- B: A layer that uses the data layer (layer A), e.g. to filter the data from A using a complex filter function
- C: A layer that uses layer B, e.g. to aggregate data from layer B into smaller pieces of information
- D: A report that interprets the results of layer C and presents it in a nice graphical way to the user
In this case, PUSHING the changes will almost certainly introduce a much higher overhead.
On the other hand, PULLING the changes requires that:
- layer D has to call layer C to ask if it is dirty
- layer C has to call layer B to ask if it is dirty
- layer B has to call layer A to ask if it is dirty
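The chain of dirty checks listed above can be sketched as follows in Python. The `Layer` class and its `calls` counter are purely illustrative assumptions; the counter exists only to make the cost of a "nothing changed" check visible.

```python
class Layer:
    """One layer in the chain (hypothetical); `source` is the layer below it."""

    calls = 0   # class-wide count of is_dirty() calls, for illustration only

    def __init__(self, name, source=None):
        self.name = name
        self.source = source    # None for the bottom layer (A)
        self._dirty = False

    def mark_dirty(self):
        self._dirty = True

    def is_dirty(self):
        """Dirty if this layer changed or any layer below it changed."""
        Layer.calls += 1
        if self._dirty:
            return True
        return self.source is not None and self.source.is_dirty()


# Build the chain A <- B <- C <- D
a = Layer("A")
b = Layer("B", a)
c = Layer("C", b)
d = Layer("D", c)

# Layer D asks layer C, which asks layer B, which asks layer A.
Layer.calls = 0
nothing_changed = not d.source.is_dirty()
calls_when_clean = Layer.calls
```

With nothing changed, `calls_when_clean` is 3: one call per underlying layer, for every report, after every user action. The cost of the check grows with the depth of the chain even though no data was touched.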
If nothing has changed, the number of calls to execute before you know that nothing has changed and that you don't have to do anything is rather large. It seems that the performance overhead we try to avoid by not using PUSH now comes back to us in the PULL approach, because of the many calls needed to ask whether anything is dirty.
Are there patterns that solve this kind of problem in a nice and high-performance (low overhead) way?