I’ve been reading Oliver Sinnen’s book Task Scheduling for Parallel Systems. The idea of task scheduling is to make a program faster by running independent statements simultaneously. Here’s a simple program:

x = foo()
y = bar()
foobar(x, y)

The computer can potentially run the first two lines at the same time and then communicate to make x and y available for the third line. This is an example of a schedule: a mapping of expressions to processors and start times. Task scheduling generalizes this idea to arbitrary directed acyclic graphs (DAGs) of expressions. Finding an optimal schedule is NP-hard.
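
To make this concrete, here’s a minimal sketch of that schedule in base R using the parallel package. foo, bar, and foobar are stand-ins for real work, and mcparallel forking only works on Unix-alikes:

library(parallel)

foo <- function() { Sys.sleep(1); 1 }    # stand-ins for expensive work
bar <- function() { Sys.sleep(1); 2 }
foobar <- function(x, y) x + y

# Fork the two independent expressions so they run at the same time.
job_x <- mcparallel(foo())
job_y <- mcparallel(bar())

# Collect both results, then run the expression that depends on them.
results <- mccollect(list(job_x, job_y))
foobar(results[[1]], results[[2]])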

Types of Algorithms

Sinnen discusses three general heuristics for scheduling algorithms.

  • List scheduling is a greedy algorithm that assigns each ready expression to the first available worker (a simplified sketch follows this list).
  • Clustering groups communicating computations onto the same worker to avoid the expense of transferring data between workers.
  • Genetic algorithms start with random schedules and evolve them until they’re good enough.
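
Here is that simplified sketch of list scheduling, reduced to the bare idea: tasks are assumed independent with known run times, and communication costs are ignored. list_schedule is a made-up helper for illustration, not part of any package:

# Greedy list scheduling over independent tasks with known run times.
list_schedule <- function(task_times, n_workers) {
  free_at <- numeric(n_workers)        # when each worker next becomes free
  assignment <- integer(length(task_times))
  for (i in seq_along(task_times)) {
    w <- which.min(free_at)            # the first ready worker
    assignment[i] <- w
    free_at[w] <- free_at[w] + task_times[i]
  }
  list(assignment = assignment, makespan = max(free_at))
}

list_schedule(c(4, 2, 3, 1, 5), n_workers = 2)
# assignment 1 2 2 1 1, makespan 10

Real list scheduling algorithms differ mainly in how they prioritize the ready tasks and in accounting for data transfer times when picking a worker.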

Future Directions

While timing data analysis programs, I’ve noticed that typically only a few expressions take significant time. This suggests a different kind of heuristic: find an optimal schedule for the few long-running expressions and schedule everything else around it. I’ll probably implement this in the R makeParallel package.
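
A hedged sketch of what the measurement step might look like. The 10% threshold and the way expressions are pulled apart are assumptions for illustration, not makeParallel code:

# Time each top-level expression of a script, reusing the toy
# functions from the opening example, and flag the slow ones.
foo <- function() { Sys.sleep(1); 1 }
bar <- function() { Sys.sleep(1); 2 }
foobar <- function(x, y) x + y

code <- quote({
  x <- foo()
  y <- bar()
  foobar(x, y)
})
exprs <- as.list(code)[-1]             # drop the enclosing `{`

env <- globalenv()
times <- vapply(exprs,
                function(e) system.time(eval(e, env))[["elapsed"]],
                numeric(1))

# Assumed rule of thumb: an expression is "long running" if it takes
# more than 10% of the total time; only these few would get the
# expensive optimal scheduling treatment.
long_running <- which(times > 0.1 * sum(times))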