
Where I work we build and distribute a library and a couple of complex programs built on that library. All code is written in C and is available on most 'standard' systems like Windows, Linux, AIX, Solaris, and Darwin.

I started in the QA department, and while running tests recently I have been reminded several times that I need to remember to set the file descriptor limits and default stack sizes higher, or bad things will happen. This is particularly the case with Solaris and now Darwin.

Now this is very strange to me because I believe a product should require zero environment fiddling to work. So I am wondering whether there are times when this sort of requirement is a necessary evil, or whether we are doing something wrong.

Edit:

Great comments that describe the problem and a little background. However, I do not believe I worded the question well enough. Currently we require customers, and hence us the testers, to set these limits before running our code; we do not do this programmatically. And this is not a situation where the programs MIGHT run out: under normal load our programs WILL run out and segfault. So, rewording the question: is requiring the customer to change these ulimit values in order to run our software to be expected on some platforms (e.g. Solaris, AIX), or are we as a company making it too difficult for these users to get going?

Bounty: I added a bounty to hopefully get a little more information on what other companies are doing to manage these limits. Can you set these programmatically? Should we? Should our programs even be hitting these limits, or could this be a sign that things are a bit messy under the covers? That is really what I want to know; as a perfectionist, a seemingly dirty program really bugs me.

+1  A: 

On Darwin, the default soft limit on the number of open files is 256; the default hard limit is unlimited.

AFAICR, on Solaris, the default soft limit on the number of open files is 16384 and the hard limit is 32768.

For stack sizes, Darwin has soft/hard limits of 8192/65536 KB. I forget what the limit is on Solaris (and my Solaris machine is unavailable - power outages in Poughkeepsie, NY mean I can't get to the VPN to access the machine in Kansas from my home in California), but it is substantial.

I would not worry about the hard limits. If I thought the library might run out of 256 file descriptors, I'd increase the soft limit on Darwin; I would probably not bother on Solaris.

Similar limits apply on Linux and AIX. I can't answer for Windows.
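
If you do decide to raise the soft limit from inside the process rather than asking the user to do it, a minimal sketch using the standard getrlimit()/setrlimit() calls could look like this (error handling abbreviated, and the cap value is up to you):

    #include <sys/resource.h>

    /* Raise the soft limit on open files towards the hard limit (sketch only). */
    static int raise_nofile_soft_limit(rlim_t wanted)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return -1;

        if (rl.rlim_cur >= wanted)
            return 0;                                   /* already high enough */

        if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
            wanted = rl.rlim_max;                       /* cannot exceed the hard limit */

        rl.rlim_cur = wanted;
        return setrlimit(RLIMIT_NOFILE, &rl);           /* soft limit only */
    }

The same pattern works for RLIMIT_STACK, although stack limits are most reliably set before the process (or at least its threads) are created.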

Sad story: a few years ago now, I removed the code that changed the maximum file size limit in a program, because the value it set had not been changed from the days when 2 MB was a big file (and some systems had a soft limit of just 0.5 MB). Once upon a decade and some ago, the code actually increased the limit; by the time it was removed, it was annoying because it reduced the limit. Tempus fugit and all that.


On SuSE Linux (SLES 10), the open files limits are 4096/4096, and the stack limits are 8192/unlimited.

Jonathan Leffler
+1  A: 

The short answer is: it's normal, but not inflexible. Of course, limits are in place to prevent rogue processes or users from starving the system of resources. Desktop systems will be less restrictive than server systems but still have certain limits (e.g. file handles).

This is not to say that limits cannot be altered in persistent/reproducible ways, either by the user at the user's discretion (e.g. by adding the relevant ulimit calls to .profile) or programmatically from within programs/libraries which know with certainty that they will require large numbers of file handles (e.g. setsysinfo(SSI_FD_NEWMAX,...)), stack space (provided at pthread creation time), etc.
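
For the stack case, a minimal sketch of sizing a thread's stack at creation time with the standard pthread_attr_setstacksize() call (the 16 MB figure and the thread_main function are made up for the example):

    #include <pthread.h>

    #define WORKER_STACK_SIZE (16 * 1024 * 1024)   /* 16 MB: example value only */

    extern void *thread_main(void *arg);            /* hypothetical thread function */

    int start_worker(pthread_t *tid, void *arg)
    {
        pthread_attr_t attr;
        int rc;

        pthread_attr_init(&attr);
        /* Ask for a larger stack than the platform default, for this thread only. */
        rc = pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE);
        if (rc == 0)
            rc = pthread_create(tid, &attr, thread_main, arg);

        pthread_attr_destroy(&attr);
        return rc;
    }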

vladr
+7  A: 

If you need to change these values in order to get your QA tests to run, then that is not too much of a problem. However, requiring a customer to do this in order for the program to run should (IMHO) be avoided. If nothing else, create a wrapper script that sets these values and launches the application so that users will still have a one-click application launch. Setting these from within the program would be the preferable method, however. At the very least, have the program check the limits when it is launched and (cleanly) error out early if the limits are too low.
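
For the "check the limits when it is launched" part, a sketch of what that early check could look like using getrlimit() (REQUIRED_FDS is a placeholder for whatever your application actually needs):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>

    #define REQUIRED_FDS 1024   /* placeholder: whatever the application really needs */

    static void check_fd_limit_or_die(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
            return;                                   /* cannot check; carry on */

        if (rl.rlim_cur < REQUIRED_FDS) {
            fprintf(stderr,
                    "error: this program needs at least %d file descriptors but the "
                    "current limit is %lu; raise it (e.g. 'ulimit -n %d') and retry\n",
                    REQUIRED_FDS, (unsigned long)rl.rlim_cur, REQUIRED_FDS);
            exit(EXIT_FAILURE);
        }
    }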

If a software developer told me that I had to mess with my stack and descriptor limits to get their program to run, it would change my perception of the software. It would make me wonder "why do they need to exceed the system limits that are apparently acceptable for every other piece of software I have?". This may or may not be a valid concern, but being asked to do something that (to many) can seem hackish doesn't have the same professional edge as a program that you just launch and go.

This problem seems even worse when you say "this is not a situation where they MIGHT run out, under normal load our programs WILL run out and seg fault". A program exceeding these limits is one thing, but a program that doesn't gracefully handle the error conditions resulting from exceeding these limits is quite another. If you hit the file handle limit and attempt to open a file, you should get an error indicating that you have too many files open. This shouldn't cause a program crash in a well-designed program. It may be more difficult to detect stack usage issues, but running out of file descriptors should never cause a crash.
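
As an illustration (not your code, just a sketch of the principle): a file-open helper that reports descriptor exhaustion via the standard EMFILE/ENFILE errno values instead of letting the caller blunder on:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Open a file, reporting descriptor exhaustion instead of crashing later. */
    FILE *open_or_report(const char *path)
    {
        FILE *fp = fopen(path, "r");

        if (fp == NULL) {
            if (errno == EMFILE || errno == ENFILE)
                fprintf(stderr, "too many open files while opening %s; "
                                "close some files or raise the limit\n", path);
            else
                fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
            /* Callers must check for NULL instead of dereferencing blindly. */
        }
        return fp;
    }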

You don't give many details about what type of program this is, but I would argue that it's not safe to assume that users of your program will necessarily have adequate permissions to change these values. In any case, it's probably also unsafe to assume that nothing else might change these values while your program is running without the user's knowledge.

While there are always exceptions, I would say that in general a program that exceeds these limits needs to have its code re-examined. The limits are there for a reason, and pretty much every other piece of software on your system works within those limits with no problems. Do you really need that many files open at the same time, or would it be cleaner to open a few files, process them, close them, and open a few more? Is your library/program trying to do too much in one big bundle, or would it be better to break it into smaller, independent parts that work together? Are you exceeding your stack limits because you are using a deeply-recursive algorithm that could be re-written in a non-recursive manner? There are likely many ways in which the library and program in question can be improved in order to ease the need to alter the system resource limits.

bta
You basically hit on every single one of my reasons for squirming each time I type out ulimit x y before running something. Glad that I am not the only one. As you say, in a well-designed program these limits should never be a problem; the engineers at my company counter this by saying that these apps need to be high performance and spending time checking return values is a waste of time. If the program segfaults, it's because the user has an incorrect setup... That is, at least, why they refuse to address the code cleanliness as I suggest.
Charles
"these apps need to be high performance and spending time checking return values is a waste of time" I find that statement patently absurd. I sometimes hear the same thing from embedded developers I work with, and even in the resource-constrained embedded world this statement is rarely true. No program is high-performing if it crashes. Error checking does add a small bit of overhead, but spending the time to optimize your code will give you a performance boost that will more than make up for it. There is no need to make the user jump through hoops because you don't want to fix your code.
bta
"If the program segfaults its because the user has an incorrect setup" Don't assume what the user's setup looks like. They may not be able to control their setup. System limits are partially there for security reasons (ensuring a process gone haywire doesn't crash the system). Forcing users to expand those limits also decreases their system reliability in that respect. Please tell me that this library/application isn't used in medical equipment or any safety-critical systems....
bta
Wow, you hit the nail on the head; these ARE mostly embedded C developers I work with. I agree completely with your points and I have tried to bring these things up many times, to no avail. Some people just do not understand that even one broken window can lead to disaster. This being my first job with other programmers since college, I am glad to hear that there are greener fields out there. P.S. Our library is a back-office system for communication between applications, and we have no potentially disastrous customers yet.
Charles
Every time I have seen this mentality, it was the result of program management wanting work done *now* instead of being done *correctly*. Since optimizing code might not have an immediately visible effect to a user, developers are pressured to leave working code alone (no matter how horrendous it might be) and instead work on new development or bug fixes. The longer you wait before you optimize it the longer it typically takes to optimize, so the problem can snowball on you (again, I don't know anything about your particular situation, that's just what I have observed).
bta
+1  A: 

As you have to support a large number of different systems, I would consider it wise to set up certain known-good values for system limits/resources, because the default values can differ wildly between systems.

The default size for pthread stacks is one such case. I recently found out that the default on HP-UX 11.31 is 256 KB(!), which isn't very reasonable, at least for our applications.

Setting up well-defined values increases the portability of an application, as you can be sure that there are X file descriptors, a stack size of Y, and so on, on every platform, and that things are not just working by luck.

I tend to set up such limits from within the program itself, as that gives the user fewer things to screw up (someone always tries to run the binary without the wrapper script). To allow for optional runtime customization, environment variables could be used to override the defaults (while still enforcing the minimum limits).
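
A sketch of that approach, assuming a hypothetical MYAPP_MAX_FDS override variable and invented default/minimum values:

    #include <stdlib.h>
    #include <sys/resource.h>

    #define DEFAULT_FDS 4096   /* known-good default: example value */
    #define MINIMUM_FDS 1024   /* never allow less than this: example value */

    static void setup_fd_limit(void)
    {
        struct rlimit rl;
        long wanted = DEFAULT_FDS;
        const char *env = getenv("MYAPP_MAX_FDS");    /* hypothetical override */

        if (env != NULL)
            wanted = strtol(env, NULL, 10);
        if (wanted < MINIMUM_FDS)
            wanted = MINIMUM_FDS;                     /* still enforce the minimum */

        if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
            if (rl.rlim_max != RLIM_INFINITY && (rlim_t)wanted > rl.rlim_max)
                rl.rlim_cur = rl.rlim_max;            /* clamp to the hard limit */
            else
                rl.rlim_cur = (rlim_t)wanted;
            setrlimit(RLIMIT_NOFILE, &rl);
        }
    }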

Frank Meerkötter
I like the idea of the app setting known-good values; that would probably be my solution of choice if it were up to me.
Charles
A: 

A small tip: if you plan to run the application on a 64-bit processor, be careful about setting the stack size to unlimited, which a 64-bit Linux system reports as -1 for the stack size.
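
In practice that means checking for the "unlimited" sentinel (RLIM_INFINITY, which often prints as -1 when treated as a signed value) before doing arithmetic on the limit; a minimal sketch:

    #include <stdio.h>
    #include <sys/resource.h>

    static void report_stack_limit(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_STACK, &rl) != 0)
            return;

        if (rl.rlim_cur == RLIM_INFINITY)     /* the "unlimited" sentinel */
            printf("stack size: unlimited\n");
        else
            printf("stack size: %lu bytes\n", (unsigned long)rl.rlim_cur);
    }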

Thanks Shyam

Shyam Sunder Verma
+1  A: 

Let's look at it this way: it is not very customer-friendly to require customers to set these limits. As detailed in the other answers, you are most likely hitting soft limits, and these can be changed. So change them automatically, if necessary in a script that starts the actual application (you can even write it so that it fails, with a nice error message instead of a segfault, if the hard limits are too low).

That's the practical part of it. Without knowing what the application does I'm guessing a bit, but in most cases you should not be anywhere close to hitting any of the default limits of (even the less progressive) operating systems. Assuming the system is not a server that is bombarded with requests (hence the large number of file/socket handles used), it is probably a sign of sloppy programming. Based on experience with programmers, I would guess that file descriptors are left open for files that are only read/written once, or that the system keeps a file descriptor open on a file that is only sporadically changed/read.

Concerning stack sizes, that can mean one of two things. The standard cause of a program running out of stack is excessive (or unbounded) recursion, which is exactly the error condition these limits are designed to catch. The second possibility is that some big (probably configuration) structures are allocated on the stack that should be allocated in heap memory. It might even be worse: those huge structures may be passed around by value (instead of by reference), which wastes a lot of the available stack space and carries a big performance penalty as well.
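
To illustrate the second point: allocating the big structure on the heap and passing a pointer avoids both the stack consumption and the copy. A small sketch, with the structure and sizes invented for illustration:

    #include <stdlib.h>

    /* Hypothetical "big" configuration structure. */
    struct config {
        char buffers[4][64 * 1024];   /* 256 KB: enough to strain an 8 MB stack if copied around */
    };

    /* Pass by pointer: no copy, and no stack frames carrying 256 KB each. */
    static void apply_config(const struct config *cfg) { (void)cfg; }

    int run(void)
    {
        struct config *cfg = calloc(1, sizeof *cfg);  /* heap, not stack */
        if (cfg == NULL)
            return -1;

        apply_config(cfg);
        free(cfg);
        return 0;
    }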

Paul de Vrieze
A: 

Perhaps you could add whatever is appropriate to the start script, like 'ulimit -n -S 4096'.

But having worked with Solaris since 2.6, I can say it's not unusual to permanently modify rlim_fd_cur and rlim_fd_max in /etc/system. In older versions of Solaris they're just too low for some workloads, like running web servers.

rama