views:

289

answers:

12

Hi,

I have been working for two years in software industry. Some things that have puzzled me are as follows:

  1. There is lack of application of mathematics in current software industry.

    e.g.: When a mechanical engineer designs an electricity pole , he computes the stress on the foundation by using stress analysis techniques(read mathematical equations) to determine exactly what kind and what grade of steel should be used, but when a software developer deploys a web server application he just guesses on the estimated load on his server and leaves the rest on luck and god, there is nothing that he can use to simulate mathematically to answer his problem (my observation).

  2. Great softwares (wind tunnel simulators etc) and computing programs(like matlab etc) are there to simulate real world problems (because they have their mathematical equations) but we in software industry still are clueless about how much actual resources in terms of memory , computing resources, clock speed , RAM etc would be needed when our server side application would actually be deployed. we just keep on guessing about the solution and solve such problem's by more or less 'hit and trial' (my observation).

  3. Programming is done on API's, whether in c, C#, java etc. We are never able to exactly check the complexity of our code and hence efficiency because somewhere we are using an abstraction written by someone else whose source code we either don't have or we didn't have the time to check it.

    e.g. If I write a simple client server app in C# or java, I am never able to calculate beforehand how much the efficiency and complexity of this code is going to be or what would be the minimum this whole client server app will require (my observation).

  4. Load balancing and scalability analysis are just too vague and are merely solved by adding more nodes if requests on the server are increasing (my observation).

Please post answers to any of my above puzzling observations. Please post relevant references also.

I would be happy if someone proves me wrong and shows the right way.

Thanks in advance

Ashish

+1  A: 

The answer to most of these is in order to have meaningful measurements (and accepted equations, limits, tolerances etc) that you have in real-world engineering you first need a way of measuring what it is that you are looking at.

Most of these things simply can't be measured easily - Software complexity is a classic, what is "complex"? How do you look at source code and decide if it is complex or not? McCabe's Cyclomatic Complexity is the closest standard we have for this but it's still basically just counting branch instructions in methods.

Paolo
A: 

There is little math in software programs because the programs themselves are the equation. It is not possible to figure out the equation before it is actually run. Engineers use simple (and very complex) programs to simulate what happens in the real world. It is very difficult to simulate a simulator. additionally, many problems in computer science don't even have an answer mathematically: see traveling salesman.

Much of the mathematics is also built into languages and libraries. If you use a hash table to store data, you know to find any element can be done in constant time O(1), no matter how many elements are in the hash table. If you store it in a binary tree, it will take longer depending on the number of elements [0(n^2) if i remember correctly].

Byron Whitlock
The traveling salesman problem does have an answer, it's just that it's part of a very large class of problems we can't get answers for efficiently (and rather doubt we ever can). The halting problem is proved unanswerable.
David Thornley
TSP most definitely has an answer, it just cannot be computed in polynomial time by a deterministic turing machine (assuming P!=NP).Perhaps you were thinking of the halting problem? http://en.wikipedia.org/wiki/Halting_problem
jakber
+1  A: 

Think about the engineering sciences, they all have very well defined laws that are applicable to the design, and building of physical items, things like gravity, strength of materials, etc. Whereas in Computer science, there are not many well defined laws when it comes to building an application against.

I can think of many different ways to write a simple hello world program that would satisfy the requirment. However, if I have to build an electricity pole, I am severely constrained by the physical world, and the requirements of the pole.

Aaron M
A: 

The problem is that software talks with other software, written by humans. The engineering examples you describe deal with physical phenomenon, which are constant. If I develop an electrical simulator, everyone in the world can use it. If I develop a protocol X simulator for my server, it will help me, but probably won't be worth the work.

No one can design a system from scratch and people that write semi-common libraries generally have plenty of enhancements and extensions to work on rather than writing a simulator for their library.

If you want a network traffic simulator you can find one, but it will tell you little about your server load because the traffic won't be using the protocol your server understands. Every server is going to see completely different sets of traffic.

Pace
A: 

There is lack of application of mathematics in current software industry.

e.g.: When a mechanical engineer designs an electricity pole , he computes the stress on the foundation by using stress analysis techniques(read mathematical equations) to determine exactly what kind and what grade of steel should be used, but when a software developer deploys a web server application he just guesses on the estimated load on his server and leaves the rest on luck and god, there is nothing that he can use to simulate mathematically to answer his problem (my observation).

I wouldn't say that luck or god are always the basis for load estimation. Often realistic data can be had.

It's also not true that there are no mathematical techniques to answer the question. Operations research and queuing theory can be applied to good advantage.

The real problem is that mechanical engineering is based on laws of physics and a foundation of thousands of years worth of empirical and scientific investigation. Computer science is only as old as me. Computer science will be much further along by the time your children and grandchildren apply the best practices of their day.

duffymo
+1  A: 

Point by point

  1. An electricity pole has to withstand the weather, a load, corrosion etc and these can be quantified and modelled. I can't quantify my website launch success, or how my database will grow.

  2. Premature optimisation? Good enough is exactly that, fix it when needed. If you're a vendor, you've no idea what will be running your code in real life or how it's configured. Again you can't quantify it.

  3. Premature optimisation

  4. See point 1. I can add as needed.

Carrying on... even engineers bollix up. Collapsing bridges, blackout, car safety recalls, "wrong kind of snow" etc etc. Shall we change the question to "why don't engineers use more empirical observations?"

gbn
+5  A: 

I think there are a few reasons for this. One is that in many cases, simply getting the job done is more important than making it perform as well as possible. A lot of software that I write is stuff that will only be run on occasion on small data sets, or stuff where the performance implications are pretty trivial (it's a loop that does a fixed computation on each element, so it's trivially O(n)). For most of this software, it would be silly to spend time analyzing the running time in detail.

Another reason is that software is very easy to change later on. Once you've built a bridge, any fixes can be incredibly expensive, so it's good to be very sure of your design before you do it. In software, unless you've made a horrible architectural choice early on, you can generally find and optimize performance hot spots once you have some more real-world data about how it performs. In order to avoid those horrible architectural choices, you can generally do approximate, back-of-the-envelope calculations (make sure you're not using an O(2^n) algorithm on a large data set, and estimate within a factor of 10 or so how many resources you'll need for the heaviest load you expect). These do require some analysis, but usually it can be pretty quick and off the cuff.

And then there are cases in which you really, really do need to squeeze the ultimate performance out of a system. In these case, people frequently do actually sit down, work out the performance characteristics of the systems they are working with, and do very detailed analyses. See, for instance, Ulrich Drepper's very impressive paper What Every Programmer Should Know About Memory (pdf).

Brian Campbell
A: 

1) Most business logic is usually broken down into decision trees. This is the "equation" that should be proofed with unit tests. If you put in x then you should get y, I don't see any issue there.

2,3) Profiling can provide some insight as to where performance issues lie. For the most part you can't say that software will take x cycles because that will change over time (ie database becomes larger, OS starts going funky, etc). Bridges for instance require constant maintenance, you can't slap one up and expect it to last 50 years without spending time and money on it. Using libraries is like not trying to figure out pi every time you want to find the circumference of a circle. It has already been proven (and is cost effective) so there is no need to reinvent the wheel.

4) For the most part web applications scale well horizontally (multiple machines). Vertical (multithreading/multiprocess) scaling tends to be much more complex. Adding machines is usually relatively easy and cost effective and avoid some bottlenecks that become limited rather easily (disk I/O). Also load balancing can eliminate the possibility of one machine being a central point of failure.

It isn't exactly rocket science as you never know how many consumers will come to the serving line. Generally it is better to have too much capacity then to have errors, pissed of customers and someone (generally your boss) chewing your hide out.

envalid
A: 

An MIT EE grad would not have this problem ;)

Hamish Grubijan
A: 

My thoughts:

  1. Some people do actually apply math to estimate server load. The equations are very complex for many applications and many people resort to rules of thumb, guess and adjust or similar strategies. Some applications (real time applications with a high penalty for failure... weapons systems, powerplant control applications, avionics) carefully compute the required resources and ensure that they will be available at runtime.

  2. Same as 1.

  3. Engineers also use components provided by others, with a published interface. Think of electrical engineering. You don't usually care about the internals of a transistor, just it's interface and operating specifications. If you wanted to examine every component you use in all of it's complexity, you would be limited to what one single person can accomplish.

  4. I have written fairly complex algorithms that determine what to scale when based on various factors such as memory consumption, CPU load, and IO. However, the most efficient solution is sometimes to measure and adjust. This is especially true if the application is complex and evolves over time. The effort invested in modeling the application mathematically (and updating that model over time) may be more than the cost of lost efficiency by try and correct approaches. Eventually, I could envision a better understanding of the correlation between code and the environment it executes in could lead to systems that predict resource usage ahead of time. Since we don't have that today, many organizations load test code under a wide range of conditions to empirically gather that information.

Eric J.
A: 

Software engineering are very different from the typical fields of engineering. Where "normal" engineering are bound to the context of our physical universe and the laws in it we've identified, there's no such boundary in the software world.

Producing software are usually an attempt to mirror a subset of the real-life world into a virtual reality. Here we define the laws ourselves, by only picking the ones we need and by making them just as complex as we need. Because of this fundamental difference, you need to look at the problem-solving from a different perspective. We try to make abstractions to make complex parts less complex, just like we teach kids that yellow + blue = green, when it's really the wavelength of the light that bounces on the paper that changes.

Once in a while we are bound by different laws though. Stuff like Big-O, Test-coverage, complexity-measurements, UI-measurements and the likes are all models of mathematic laws. If you look into digital signal processing, realtime programming and functional programming, you'll often find that the programmers use equations to figure out a way to do what they want. - but these techniques aren't really (to some extend) useful to create a virtual domain, that can solve complex logic, branching and interact with a user.

cwap
I've often found that red+green=yellow.
outis
A: 

The reasons why wind tunnels, simulations, etc.. are needed in the engineering world is that it's much cheaper to build a scaled down prototype, than to build the full thing and then test it. Also, a failed test on a full scale bridge is destructive - you have to build a new one for each test.

In software, once you have a prototype that passes the requirements, you have the full-blown solution. there is no need to build the full-scale version. You should be running load simulations against your server apps before going live with them, but since loads are variable and often unpredictable, you're better off building the app to be able to scale to any size by adding more hardware than to target a certain load. Bridge builders have a given target load they need to handle. If they had a predicted usage of 10 cars at any given time, and then a year later the bridge's popularity soared to 1,000,000 cars per day, nobody would be surprised if it failed. But with web applications, that's the kind of scaling that has to happen.

Eclipse
It's not always possible to build a prototype or understand all the scaling issues. This is why computer simulations of physics problems are so popular. The software is widely available, and the hardware is powerful enough these days to make very complex solutions possible. One always needs to be aware of black swans, non-linear effects, and the limits of mathematics.
duffymo