Vacation! and Work-Stealing Schedulers

For the first time in years, I'm taking a vacation for something other than a holiday. It already feels great. Also, I decided to shave my head in preparations for summer.

I've been busy working on a new prototype that will eventually become the future of libgtask. Currently, it's prototype name is iris. I haven't decided if I'll continue using the name gtask since many of the new data structures would cause obvious collisions within GLib.

At the core of iris is message passing and basic concurrent constructs. There many reusable lock-free data structures such as:

These types of data structures are always interesting when you don't have a garbage collector. Typically, you end up using free-lists to prevent dereferencing freed memory in the case of a poorly timed pre-emption.

However, where things get really fun are when you run into the ABA problem. You end up doing fun things like using the lower 2 bits of properly aligned pointers to store version information. You can see my hack'ish attempt at that with gstamppointer.

I just started working on this a few weeks ago in my time away from work and it's been incredibly fun. The work-stealing scheduler landed today and it's quite fast for my very few test cases. It performs especially well in recursive work-loads, which happens when work-items yield new work-items. This use-case tries to hit the fast path for the work-stealing queue which pushes the work item onto the local threads queue. The placement is also important so the active thread can try to process it while the item is still hot in the cpu's cache.

There are currently three scheduler implementations. The default, IrisScheduler, uses a GAsyncQueue at its heart for dispatching. I know what you're thinking, and you're right, it has a lot of contention between threads. In addition is a lock-free scheduler, IrisLFScheduler, and the aforementioned IrisWSScheduler. The lock-free is not ideal, but it does have a specific task set its good for (lots of work-items created from outside of a scheduler thread and no regard for power consumption).

So lets quickly look at a simple use-case for the recursive work-loads and compare schedulers. The test consists of 1,000 messages being created, dispatched, scheduled, and executed. However, each message of these 1,000 generates in turn 1,000 more messages to be created, dispatched, scheduled, and executed. So 1 million in all.

Here is the result of the test with the default scheduler. Not fast enough if you ask me. My goal is millions of messages per second per core. (Well to be more specific, its millions of callbacks per second per core, but it all starts from a message, so yeah.)

chergert@chergert-desktop:~/Projects/iris/examples$ time ./recursive ** (process:6911): DEBUG: Done pushing items ** (process:6911): DEBUG: Waiting for items to complete ** (process:6911): DEBUG: Signal received, all done real 0m3.062s user 0m2.664s sys 0m3.176s

So now we will change just one line of our code to use a different scheduler, the work-stealing scheduler.

//scheduler = iris_scheduler_new ();
scheduler = iris_wsscheduler_new ();

And again, run our use-case.

chergert@chergert-desktop:~/Projects/iris/examples$ time ./recursive ** (process:7491): DEBUG: Done pushing items ** (process:7491): DEBUG: Waiting for items to complete ** (process:7491): DEBUG: Signal received, all done real 0m0.336s user 0m1.180s sys 0m0.048s

Ah, the beauty of a multi-core box that actually uses its cores in a nice, evenly manner. So we went from 3.062 seconds to 0.336 seconds. A 9.1x speedup, not bad for a single line of code!

For those that are curious, the numbers for this use-case are on a quad-core Intel Q6600 @ 2.40GHz. I've run the test on a low-voltage dual-core, with similar results if you take into account there are only 2 cores instead of four. The 8-core box I tested with had similar results to the quad-core, but just a hair slower. That could be from either a bit more contention for me to solve, or the fact that its a slower clock-speed.

So if you had any doubts, there you go. A bit of proof on how killer lock-contention can be. And if you are wondering what it is I'm going to be writing that requires all this, you'll have to wait a bit :-)

-- Christian Hergert 2009-04-14

Back to Index