Why does being thread safe matter in a web app? Pylons (Python web framework) uses a global application variable which is not thread safe. Does this matter? Is it only a problem if I intend to use multi-threading? Or does it mean that one user might not see updated state if another user... I'm just confusing myself. What's so important about this?

+2  A: 

Threading errors can lead to serious and subtle problems.

Say your system has 10 members. One more user signs up to your system and the application adds him to the roster and increments the count of members; "simultaneously", another user quits and the application removes him from the roster and decrements the count of members.

If you don't handle threading properly, your member count (which should be 10) could easily be 9, 10, or 11, and you'll never be able to reproduce the bug.

So be careful.
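As a minimal sketch of the sign-up/quit scenario above (the `Roster` class and the user names are hypothetical, not from any framework), a lock can keep the roster and the count in step when one thread joins while another leaves:

```python
import threading


class Roster:
    """Hypothetical in-memory roster; all names here are illustrative."""

    def __init__(self, members):
        self._members = set(members)
        self._count = len(self._members)
        self._lock = threading.Lock()

    def join(self, name):
        # Update the roster and the count as one indivisible step.
        with self._lock:
            self._members.add(name)
            self._count += 1

    def leave(self, name):
        with self._lock:
            self._members.remove(name)
            self._count -= 1

    def snapshot(self):
        # Read both values under the lock so they are mutually consistent.
        with self._lock:
            return self._count, len(self._members)


roster = Roster(f"user{i}" for i in range(10))

# One user signs up while another quits "simultaneously".
threads = [
    threading.Thread(target=roster.join, args=("newcomer",)),
    threading.Thread(target=roster.leave, args=("user0",)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

count, size = roster.snapshot()
print(count, size)
```

Without the lock, each `+= 1` and `-= 1` is a separate read followed by a write, so one update can overwrite the other.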

Malvolio
@Malvolio - Does your scenario assume that I would be storing state for multiple requesting web users in memory, vs in the database? If so, that might not be such a huge headache because I tend to lean toward the slower but easier way by handling this all through the DB.
orokusaki
@Malvolio - can you speak on Pylons regarding this matter?
orokusaki
I was only talking about threading problems in the abstract. Doing all the work in the (presumably thread-safe) database doesn't solve the problem. For example, if you increment the member count unwisely -- read the count from the database, increment it, write it back to the database -- you will very likely experience a threading problem.
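A rough sketch of the difference, using the standard-library `sqlite3` module and a hypothetical `stats` table (the table and column names are assumptions for illustration): the first function is the read-increment-write pattern described above, the second lets the database do the increment in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (name TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO stats VALUES ('member_count', 10)")
conn.commit()


def increment_unwisely(conn):
    # Read, add in Python, write back. Another request can change the row
    # between the SELECT and the UPDATE, and that change is then overwritten.
    (value,) = conn.execute(
        "SELECT value FROM stats WHERE name = 'member_count'"
    ).fetchone()
    conn.execute(
        "UPDATE stats SET value = ? WHERE name = 'member_count'", (value + 1,)
    )
    conn.commit()


def increment_in_database(conn):
    # The database reads and writes the row within one statement.
    conn.execute("UPDATE stats SET value = value + 1 WHERE name = 'member_count'")
    conn.commit()


increment_in_database(conn)
increment_in_database(conn)
(member_count,) = conn.execute(
    "SELECT value FROM stats WHERE name = 'member_count'"
).fetchone()
print(member_count)
```

With a single connection and no concurrency both functions give the same result; the difference only shows up when several requests run at once.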
Malvolio
+1  A: 

You should care about thread safety. E.g. in Java you write a servlet that provides some functionality. The container will deploy an instance of your servlet, and as HTTP requests arrive from clients over different TCP connections, each request is handled by a separate thread, which in turn calls your servlet. As a result, your servlet will be called from multiple threads. So if it is not thread-safe, erroneous results can be returned to the user, because the threads corrupt the shared data they access.

@user384706 - Do you happen to know what the equivalent of a servlet is in Python? Is a WSGI application the basic equivalent?
orokusaki
@orokusaki: I do not know Python, so I do not know. But usually, all frameworks specify whether they are thread-safe. E.g. Struts 2 specifies that it is thread-safe, i.e. for each client request, the thread assigned to the new connection uses a new instance of the implementation class. For servlets, instances are shared among connections, i.e. threads. Isn't that specified in Python? If it is not, then I think it would be better to assume it is thread-unsafe and synchronize access to shared data. Just analyse the code to make sure the performance is ok.
+1  A: 

It really depends on the application framework (which I know nothing about in this case) and how the web server handles it. Obviously, any good webserver is going to be responding to multiple requests simultaneously, so it will be operating with multiple threads. That web server may dispatch to a single instance of your application code for all of these requests, or it may spawn multiple instances of your web application and never use a given instance concurrently.

Even if the app server does use separate instances, your application will probably have some shared state--say, a database with a list of users. In that case, you need to make sure that state can be accessed safely from multiple threads/instances of your web app.

Then, of course, there is the case where you use threading explicitly in your application. In that case, the answer is obvious.

Tim Yates
+1  A: 

Your web application is almost always multithreaded, even though you might not use threads explicitly. So, to answer your question: it's very important.

How can this happen? Usually, Apache (or IIS) will serve several requests simultaneously, calling your Python programs multiple times from multiple threads. So you need to assume that your programs run in multiple threads concurrently and act accordingly.

Pablo Santa Cruz
@Pablo - But, I don't understand. How does this affect me? State is stored in my DB, so why does it matter? What change would occur in one thread that could mess up another thread's run-time? That's what I don't get. Also, can you speak on Pylons at all?
orokusaki
@orokusaki you seem to be confusing persistent state and request-specific state. Your db can handle persistent state but cannot handle request-specific state. Say you have a variable called zipcode that gets initialized to the zipcode entered by a user in a form. If this is thread unsafe (i.e. if only one instance of this variable is shared across multiple threads) then the zip code of a second user may overwrite this variable, in which case your whole logic would get messed up. You might deliver one user's stuff to another user!
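A small sketch of keeping such a value per thread with the standard library's `threading.local` (the `handle_request` function, user names and zip codes are made up for illustration; this is not how any particular framework does it):

```python
import threading

# Each thread sees its own independent set of attributes on this object.
request_state = threading.local()

results = {}
both_written = threading.Barrier(2)


def handle_request(user, zipcode):
    request_state.zipcode = zipcode
    # Wait until the other thread has also stored its zipcode, so any
    # overwriting of a shared variable would have happened by now.
    both_written.wait(timeout=5)
    results[user] = request_state.zipcode


threads = [
    threading.Thread(target=handle_request, args=("alice", "10001")),
    threading.Thread(target=handle_request, args=("bob", "94105")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

If `request_state` were an ordinary module-level object instead, both threads would write to the same `zipcode` attribute and one user would end up with the other's value.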
raja kolluru
@raja - ok. So basically, I might have a web framework that takes a settings variable and applies it to the framework's state, instead of binding some sort of thread safe state to the single request. Then, one user comes and sees the effects of another user's settings?
orokusaki
@orokusaki: a trivial example: your web app returns the number of client requests, so you have a variable count that is incremented by 1 per request. If there is no synchronization, 2 threads can read the same value X, each increment it by 1 and write back X+1, and your web app returns a number of requests that is off by 1. In a more serious case, if you do some processing on data that is not synchronized, the results could be undefined. E.g. you could have some kind of calculation before storing in the DB, and because the calculation is wrong due to data corruption by threads, you end up with wrong entries in the DB.
+1  A: 

(This was too long to add a comment to the other fine answers.)

Concurrency problems (read: multiple access to shared state) are a superset of threading problems. They can easily exist at an "above thread" level such as a process/server level (the global variable in the case you mention above is a process-unique value, which in turn can lead to an inconsistent view/state if there are multiple processes).

Care must be taken to analyze the data consistency requirements and then implement the software to fulfill those requirements. I would always err on the side of safety, and only relax that in carefully analyzed areas where it is acceptable.

However, note that CPython runs only one thread context for Python code execution at a time (to get truly concurrent threads you need to write/use C extensions), so, while you can still get a form of race condition on the data you expect to see, you won't get (all) the same kinds of partial-write scenarios and such that may plague C/C++ programs. But, once again, err on the side of a consistent view.

There are a number of existing methods of making access to a global atomic -- across threads or processes. Use them.

pst