I’ve been reading Oliver Sinnen’s book Task Scheduling for Parallel Systems. The idea of task scheduling is to make a program faster by running independent statements simultaneously. Here’s a simple program:

x = foo()
y = bar()
foobar(x, y)

The computer can potentially run the first two lines at the same time and then communicate to make x and y available for the third line. This is an example of a schedule: a mapping of expressions to processors and start times. Task scheduling generalizes this idea to arbitrary directed acyclic graphs (DAGs) of expressions. Finding an optimal schedule is NP-hard.
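
To make this concrete, here’s a minimal sketch of that schedule in base R using the parallel package. foo, bar, and foobar are stand-ins for real work, and mcparallel forking only works on Unix-alikes:

library(parallel)

foo <- function() { Sys.sleep(1); 1 }    # stand-ins for expensive work
bar <- function() { Sys.sleep(1); 2 }
foobar <- function(x, y) x + y

# Fork the two independent expressions so they run at the same time.
job_x <- mcparallel(foo())
job_y <- mcparallel(bar())

# Collect both results, then run the expression that depends on them.
results <- mccollect(list(job_x, job_y))
foobar(results[[1]], results[[2]])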

Types of Algorithms

Sinnen discusses three general heuristics for scheduling algorithms.

  • List scheduling is a greedy algorithm that assigns each ready expression to the first available worker (a simplified sketch follows this list).
  • Clustering groups communicating computations onto the same worker to avoid the expense of transferring data between workers.
  • Genetic algorithms start with random schedules and evolve them until they’re good enough.
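
Here is that simplified sketch of list scheduling, reduced to the bare idea: tasks are assumed independent with known run times, and communication costs are ignored. list_schedule is a made-up helper for illustration, not part of any package:

# Greedy list scheduling over independent tasks with known run times.
list_schedule <- function(task_times, n_workers) {
  free_at <- numeric(n_workers)        # when each worker next becomes free
  assignment <- integer(length(task_times))
  for (i in seq_along(task_times)) {
    w <- which.min(free_at)            # the first ready worker
    assignment[i] <- w
    free_at[w] <- free_at[w] + task_times[i]
  }
  list(assignment = assignment, makespan = max(free_at))
}

list_schedule(c(4, 2, 3, 1, 5), n_workers = 2)
# assignment 1 2 2 1 1, makespan 10

Real list scheduling algorithms differ mainly in how they prioritize the ready tasks and in accounting for data transfer times when picking a worker.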

Future Directions

While timing data analysis programs, I’ve noticed that typically only a few expressions take significant time. This suggests a different kind of heuristic: find an optimal schedule for the few long-running expressions and schedule everything else around it. I’ll probably implement this in the R makeParallel package.
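
A hedged sketch of what the measurement step might look like. The 10% threshold and the way expressions are pulled apart are assumptions for illustration, not makeParallel code:

# Time each top-level expression of a script, reusing the toy
# functions from the opening example, and flag the slow ones.
foo <- function() { Sys.sleep(1); 1 }
bar <- function() { Sys.sleep(1); 2 }
foobar <- function(x, y) x + y

code <- quote({
  x <- foo()
  y <- bar()
  foobar(x, y)
})
exprs <- as.list(code)[-1]             # drop the enclosing `{`

env <- globalenv()
times <- vapply(exprs,
                function(e) system.time(eval(e, env))[["elapsed"]],
                numeric(1))

# Assumed rule of thumb: an expression is "long running" if it takes
# more than 10% of the total time; only these few would get the
# expensive optimal scheduling treatment.
long_running <- which(times > 0.1 * sum(times))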