views:

222

answers:

4

I have a program that does a limited form of multithreading. It is written in Delphi, and uses libmysql.dll (the C API) to access a MySQL server. The program must process a long list of records, taking ~0.1s per record. Think of it as one big loop. All database access is done by worker threads which either prefetch the next records or write results, so the main thread doesn't have to wait.

At the top of this loop, we first wait for the prefetch thread, get the results, then have the prefetch thread execute the query for the next record. The idea being that the prefetch thread will send the query immediately, and wait for results while the main thread completes the loop.

It often does work that way. But note there's nothing to ensure that the prefetch thread runs right away. I found that often the query was not sent until the main thread looped around and started waiting for the prefetch.

I sort-of fixed that by calling sleep(0) right after launching the prefetch thread. This way the main thread surrenders the remainder of it's time slice, hoping that the prefetch thread will now run, sending the query. Then that thread will sleep while waiting, which allows the main thread to run again.
Of course, there's plenty more threads running in the OS, but this did actually work to some extent.

What I really want to happen is for the main thread to send the query, and then have the worker thread wait for the results. Using libmysql.dll I call

result := mysql_query(p.SqlCon,pChar(p.query));

in the worker thread. Instead, I'd like to have the main thread call something like

mysql_threadedquery(p.SqlCon,pChar(p.query),thread);

which would hand off the task as soon as the data went out.

Anybody know of anything like that?

This is really a scheduling problem, so I could try is lauching the prefetch thread at a higher priority, then have it reduce its priority after the query is sent. But again, I don't have any mysql call that separates sending the query from receiving the results.

Maybe it's in there and I just don't know about it. Enlighten me, please.

Added Question:

Does anyone think this problem would be solved by running the prefetch thread at a higher priority than the main thread? The idea is that the prefetch would immediately preempt the main thread and send the query. Then it would sleep waiting for the server reply. Meanwhile the main thread would run.

Added: Details of current implementation

This program performs calculations on data contained in a MySQL DB. There are 33M items with more added every second. The program runs continuously, processing new items, and sometimes re-analyzing old items. It gets a list of items to analyze from a table, so at the beginning of a pass (current item) it knows the next item ID it will need.

As each item is independent, this is a perfect target for multiprocessing. The easiest way to do this is to run multiple instances of the program on multiple machines. The program is highly optimized via profiling, rewrites, and algorithm redesign. Still, a single instance utilizes 100% of a CPU core when not data-starved. I run 4-8 copies on two quad-core workstations. But at this rate they must spend time waiting on the MySQL server. (Optimization of the Server/DB schema is another topic.)

I implemented multi-threading in the process solely to avoid blocking on the SQL calls. That's why I called this "limited multi-threading". A worker thread has one task: send a command and wait for results. (OK, two tasks.)

It turns out there are 6 blocking tasks associated with 6 tables. Two of these read data and the other 4 write results. These are similar enough to be defined by a common Task structure. A pointer to this Task is passed to a threadpool manager which assigns a thread to do the work. The main thread can check the task status through the Task structure.

This makes the main thread code very simple. When it needs to perform Task1, it waits for Task1 to be not busy, puts the SQL command in Task1 and hands it off. When Task1 is no longer busy, it contains the results (if any).

The 4 tasks that write results are trivial. The main thread has a Task write records while it goes on to the next item. When done with that item it makes sure the previous write finished before starting another.

The 2 reading threads are less trivial. Nothing would be gained by passing the read to a thread and then waiting for the results. Instead, these tasks prefetch data for the next item. So the main thread, coming to this blocking tasks, checks if the prefetch is done; Waits if necessary for the prefetch to finish, then takes the data from the Task. Finally, it reissues the Task with the NEXT Item ID.

The idea is for the prefetch task to immediately issue the query and wait for the MySQL server. Then the main thread can process the current Item and by the time it starts on the next Item the data it needs is in the prefetch Task.

So the threading, a thread pool, the synchronization, data structures, etc. are all done. And that all works. What I'm left with is a Scheduling Problem.

The Scheduling Problem is this: All the speed gain is in processing the current Item while the server is fetching the next Item. We issue the prefetch task before processing the current item, but how do we guarantee that it starts? The OS scheduler does not know that it's important for the prefetch task to issue the query right away, and then it will do nothing but wait.

The OS scheduler is trying to be "fair" and allow each task to run for an assigned time slice. My worst case is this: The main thread receives its slice and issues a prefetch, then finishes the current item and must wait for the next item. Waiting releases the rest of its time slice, so the scheduler starts the prefetch thread, which issues the query and then waits. Now both threads are waiting. When the server signals the query is done the prefetch thread restarts, and requests the Results (dataset) then sleeps. When the server provides the results the prefetch thread awakes, marks the Task Done and terminates. Finally, the main thread restarts and takes the data from the finished Task.

To avoid this worst-case scheduling I need some way to ensure that the prefetch query is issued before the main thread goes on with the current item. So far I've thought of three ways to do that:

  1. Right after issuing the prefetch task, the main thread calls Sleep(0). This should relinquish the rest of its time slice. I then hope that the scheduler runs the prefetch thread, which will issue the query and then wait. Then the scheduler should restart the main thread (I hope.) As bad as it sounds, this actually works better than nothing.

  2. I could possibly issue the prefetch thread at a higher priority than the main thread. That should cause the scheduler to run it right away, even if it must preempt the main thread. It may also have undesirable effects. It seems unnatural for a background worker thread to get a higher priority.

  3. I could possibly issue the query asynchronously. That is, separate sending the query from receiving the results. That way I could have the main thread send the prefetch using mysql_send_query (non blocking) and go on with the current item. Then when it needed the next item it would call mysql_read_query, which would block until the data is available.

Note that solution 3 does not even use a worker thread. This looks like the best answer, but requires a rewrite of some low-level code. I'm currently looking for examples of such asynchronous client-server access.

I'd also like any experienced opinions on these approaches. Have I missed anything, or am I doing anything wrong? Please note that this is all working code. I'm not asking how to do it, but how to do it better/faster.

A: 

You just have to use the standard Thread synchronization mechanism of the Delphi threading.

Check your IDE help for TEvent class and its associated methods.

A. Bouchez
+1  A: 

I don't know any database access layer that permits this.

The reason is that each thread has its own "thread local storage" (The threadvar keyword in Delphi, other languages have equivalents, it is used in a lot of frameworks).
When you start things on one thread, and continue it on another, then you get these local storages mixed up causing all sorts of havoc.

The best you can do is this:

  1. pass the query and parameters to the thread that will handle this (use the standard Delphi thread synchronization mechanisms for this)
  2. have the actual query thread perform the query
  3. return the results to the main thread (use the standard Delphi thread synchronization mechanisms for this)

The answers to this question explains thread synchronization in more detail.

Edit: (on presumed slowness of starting something in an other thread)

"Right away" is a relative term: it depends in how you do your thread synchronization and can be very very fast (i.e. less than a millisecond).
Creating a new thread might take some time.
The solution is to have a threadpool of worker threads that is big enough to service a reasonable amount of requests in an efficient manner.
That way, if the system is not yet too busy, you will have a worker thread ready to start servicing your request almost immediately.

I have done this (even cross process) in a big audio application that required low latency response, and it works like a charm.
The audio server process runs at high priority waiting for requests. When it is idle, it doesn't consume CPU, but when it receives a request it responds really fast.

The answers to this question on changes with big improvements and this question on cross thread communication provide some interesting tips on how to get this asynchronous behaviour working.
Look for the words AsyncCalls, OmniThread and thread.

--jeroen

Jeroen Pluimers
That's how I did it several versions ago. It was too slow. In any case, it doesn't solve the problem -- which is to guarantee that the thread sends the query right away.
Guy Gordon
@Guy: see my edit.
Jeroen Pluimers
Thanks for the clarification. I'm going to expand my explanation of how I do it now by editing the question above. (These comment boxes are small.) Comments and/or criticisms are welcome.
Guy Gordon
+1  A: 

I'm putting in a second answer, for your second part of the question: your Scheduling Problem This makes it easier to distinguish both answers.

First of all, you should read Consequences of the scheduling algorithm: Sleeping doesn't always help which is part of Raymond Chen's blog "The Old New Thing".
Sleeping versus polling is also good reading.
Basically all these make good reading.

If I understand your Scheduling Problem correctly, you have 3 kinds of threads:

  1. Main Thread: makes sure the Fetch Threads always have work to do
  2. Fetch Threads: (database bound) fetch data for the Processing Threads
  3. Processing Threads: (CPU bound) process fetched data

The only way to keep 3 running is to have 2 fetch as much data as they can.
The only way to keep 2 fetching, is to have 1 provide them enough entries to fetch.

You can use queues to communicate data between 1 and 2 and between 2 and 3.

Your problem now is two-fold:

  • finding the balance between the number of threads in category 2 and 3
  • making sure that 2 always have work to do

I think you have solved the former.
The latter comes down to making sure the queue between 1 and 2 is never empty.

A few tricks:

  • You can use Sleep(1) (see the blog article) as a simple way to "force" 2 to run
  • Never let the treads exit their execute: creating and destroying threads is expensive
  • choose your synchronization objects (often called IPC objects) carefully (Kudzu has a nice article on them)

--jeroen

Jeroen Pluimers
Guy Gordon
Guy Gordon
@Guy: I'm glad I could be of help. Pity so few people took the time to read your excellent question, as it should have been upvoted more. Maybe you should change the title of your question to focus more on the "scheduling" part of it.
Jeroen Pluimers
+2  A: 

Still, a single instance utilizes 100% of a CPU core when not data-starved. I run 4-8 copies on two quad-core workstations.

I have a conceptual problem here. In your situation I would either create a multi-process solution, with each process doing everything in its single thread, or I would create a multi-threaded solution that is limited to a single instance on any particular machine. Once you decide to work with multiple threads and accept the added complexity and probability of hard-to-fix bugs, then you should make maximum use of them. Using a single process with multiple threads allows you to employ varying numbers of threads for reading from and writing to the database and to process your data. The number of threads may even change during the runtime of your program, and the ratio of database and processing threads may too. This kind of dynamic partitioning of the work is only possible if you can control all threads from a single point in the program, which isn't possible with multiple processes.

I implemented multi-threading in the process solely to avoid blocking on the SQL calls.

With multiple processes there wouldn't be a real need to do so. If your processes are I/O-bound some of the time they don't consume CPU resources, so you probably simply need to run more of them than your machine has cores. But then you have the problem to know how many processes to spawn, and that may again change over time if the machine does other work too. A threaded solution in a single process can be made adaptable to a changing environment in a relatively simple way.

So the threading, a thread pool, the synchronization, data structures, etc. are all done. And that all works. What I'm left with is a Scheduling Problem.

Which you should leave to the OS. Simply have a single process with the necessary pooled threads. Something like the following:

  • A number of threads reads records from the database and adds them to a producer-consumer queue with an upper bound, which is somewhere between N and 2*N where N is the number of processor cores in the system. These threads will block on the full queue, and they can have increased priority, so that they will be scheduled to run as soon as the queue has more room and they become unblocked. Since they will be blocked on I/O most of the time their higher priority shouldn't be a problem.
    I don't know what that number of threads is, you would need to measure.

  • A number of processing threads, probably one per processor core in the system. They will take work items from the queue mentioned in the previous point, on block on that queue if it's empty. Processed work items should go to another queue.

  • A number of threads that take processed work items from the second queue and write data back to the database. There should probably an upper bound for the second queue as well, to make it so that a failure to write processed data back to the database will not cause processed data to pile up and fill all your process memory space.

The number of threads needs to be determined, but all scheduling will be performed by the OS scheduler. The key is to have enough threads to utilise all CPU cores, and the necessary number of auxiliary threads to keep them busy and deal with their outputs. If these threads come from pools you are free to adjust their numbers at runtime too.

The Omni Thread Library has a solution for tasks, task pools, producer consumer queues and everything else you would need to implement this. Otherwise you can write your own queues using mutexes.

The Scheduling Problem is this: All the speed gain is in processing the current Item while the server is fetching the next Item. We issue the prefetch task before processing the current item, but how do we guarantee that it starts?

By giving it a higher priority.

The OS scheduler does not know that it's important for the prefetch task to issue the query right away

It will know if the thread has a higher priority.

The OS scheduler is trying to be "fair" and allow each task to run for an assigned time slice.

Only for threads of the same priority. No lower priority thread will get any slice of CPU while a higher priority thread in the same process is runnable.
[Edit: That's not completely true, more information at the end. However, it is close enough to the truth to ensure that your higher priority network threads send and receive data as soon as possible.]

  1. Right after issuing the prefetch task, the main thread calls Sleep(0).

Calling Sleep() is a bad way to force threads to execute in a certain order. Set the thread priority according to the priority of the work they perform, and use OS primitives to block higher priority threads if they should not run.

I could possibly issue the prefetch thread at a higher priority than the main thread. That should cause the scheduler to run it right away, even if it must preempt the main thread. It may also have undesirable effects. It seems unnatural for a background worker thread to get a higher priority.

There is nothing unnatural about this. It is the intended way to use threads. You only must make sure that higher priority threads block sooner or later, and any thread that goes to the OS for I/O (file or network) does block. In the scheme I sketched above the high priority threads will also block on the queues.

I could possibly issue the query asynchronously.

I wouldn't go there. This technique may be necessary when you write a server for many simultaneous connections and a thread per connection is prohibitively expensive, but otherwise blocking network access in a threaded solution should work fine.

Edit:

Thanks to Jeroen Pluimers for the poke to look closer into this. As the information in the links he gave in his comment shows my statement

No lower priority thread will get any slice of CPU while a higher priority thread in the same process is runnable.

is not true. Lower priority threads that haven't been running for a long time get a random priority boost and will indeed sooner or later get a share of CPU, even though higher priority threads are runnable. For more information about this see in particular "Priority Inversion and Windows NT Scheduler".

To test this out I created a simple demo with Delphi:

type
  TForm1 = class(TForm)
    Label1: TLabel;
    Label2: TLabel;
    Label3: TLabel;
    Label4: TLabel;
    Label5: TLabel;
    Label6: TLabel;
    Timer1: TTimer;
    procedure FormCreate(Sender: TObject);
    procedure FormDestroy(Sender: TObject);
    procedure Timer1Timer(Sender: TObject);
  private
    fLoopCounters: array[0..5] of LongWord;
    fThreads: array[0..5] of TThread;
  end;

var
  Form1: TForm1;

implementation

{$R *.DFM}

// TTestThread

type
  TTestThread = class(TThread)
  private
    fLoopCounterPtr: PLongWord;
  protected
    procedure Execute; override;
  public
    constructor Create(ALowerPriority: boolean; ALoopCounterPtr: PLongWord);
  end;

constructor TTestThread.Create(ALowerPriority: boolean;
  ALoopCounterPtr: PLongWord);
begin
  inherited Create(True);
  if ALowerPriority then
    Priority := tpLower;
  fLoopCounterPtr := ALoopCounterPtr;
  Resume;
end;

procedure TTestThread.Execute;
begin
  while not Terminated do
    InterlockedIncrement(PInteger(fLoopCounterPtr)^);
end;

// TForm1

procedure TForm1.FormCreate(Sender: TObject);
var
  i: integer;
begin
  for i := Low(fThreads) to High(fThreads) do
//    fThreads[i] := TTestThread.Create(True, @fLoopCounters[i]);
    fThreads[i] := TTestThread.Create(i >= 4, @fLoopCounters[i]);
end;

procedure TForm1.FormDestroy(Sender: TObject);
var
  i: integer;
begin
  for i := Low(fThreads) to High(fThreads) do begin
    if fThreads[i] <> nil then
      fThreads[i].Terminate;
  end;
  for i := Low(fThreads) to High(fThreads) do
    fThreads[i].Free;
end;

procedure TForm1.Timer1Timer(Sender: TObject);
begin
  Label1.Caption := IntToStr(fLoopCounters[0]);
  Label2.Caption := IntToStr(fLoopCounters[1]);
  Label3.Caption := IntToStr(fLoopCounters[2]);
  Label4.Caption := IntToStr(fLoopCounters[3]);
  Label5.Caption := IntToStr(fLoopCounters[4]);
  Label6.Caption := IntToStr(fLoopCounters[5]);
end;

This creates 6 threads (on my 4 core machine), either all with lower priority, or 4 with normal and 2 with lower priority. In the first case all 6 threads run, but with wildly different shares of CPU time:

6 threads with lower priority

In the second case 4 threads run with roughly equal share of CPU time, but the other two threads get a little share of the CPU as well:

4 threads with normal, 2 threads with lower priority

But the share of CPU time is very very small, way below a percent of what the other threads receive.

And to get back to your question: A program using multiple threads with custom priority, coupled via producer-consumer queues, should be a viable solution. In the normal case the database threads will block most of the time, either on the network operations or on the queues. And the Windows scheduler will make sure that even a lower priority thread will not completely starve to death.

mghie
`No lower priority thread will get any slice of CPU while a higher priority thread in the same process is runnable.`; I'm not sure that is true, see: http://blogs.msdn.com/b/oldnewthing/archive/2005/10/03/476413.aspx +1 for the rest of your well thought answer.
Jeroen Pluimers
@Jeroen: A thread being runnable means it's not running currently. So a free core will only run the lower priority thread if all higher priority threads are either running or blocked. Should I edit my answer accordingly?
mghie
@mghie: Please do; please also research if that is indeed what is going to happen. I seem to recollect that low-priority threads can get some CPU time, even though there are high-priority threads running. There also is something like "dynamic priority" and "critical sections". The Windows Thread scheduler is a complex beast; it makes for a lot of interesting reading, for instance: http://stackoverflow.com/questions/656959/win32-thread-scheduling, http://support.microsoft.com/kb/96418 and http://msdn.microsoft.com/en-us/library/ms684831(VS.85).aspx
Jeroen Pluimers
@Jeroen: Thanks for the links, very interesting. Lower priority threads will indeed run, but with a very small percentage of CPU time. I have added the code of the test program I used, the OP will surely need to make his own tests if he chooses to go that route. Thanks again, there's always something interesting to learn.
mghie
@mghie: thanks for the compliment. Threading is difficult (even for seasoned programmers); it is one of the reason I quoted Danny Thorpe: https://wiert.wordpress.com/2010/10/21/twitter-danny-thorpe-new-programmers-are-drawn/
Jeroen Pluimers
You're correct about prioritizing the threads. I was stuck thinking of them as low-priority background threads. Instead, they should be like Windows DCPs -- higher priority and very short. What I was trying to do was to trick the OS into running the thread instead of simply telling it to.
Guy Gordon