Heuristics for Task Scheduling Algorithms
I’ve been reading Oliver Sinnen’s book Task Scheduling for Parallel Systems. The idea of task scheduling is to make a program faster by running several statements simultaneously. Here’s a simple program:
x = foo()
y = bar()
foobar(x, y)
The computer can potentially run the first two lines at the same time and then communicate to make x and y available to run the third line. This is an example of a schedule, a mapping of expressions to processors. Task scheduling generalizes this idea to arbitrary graphs of expressions. It’s NP-hard.
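The schedule above can be sketched in Python with the standard library’s concurrent.futures module. This is my own illustration, not code from the book; foo, bar, and foobar are stand-ins for arbitrary expressions.

```python
# Run foo() and bar() on separate workers, then join for foobar(x, y).
from concurrent.futures import ThreadPoolExecutor

def foo():
    return 2  # stand-in for some expensive computation

def bar():
    return 3  # stand-in for another independent computation

def foobar(x, y):
    return x + y  # depends on both results

with ThreadPoolExecutor(max_workers=2) as pool:
    fx = pool.submit(foo)   # schedule foo on one worker
    fy = pool.submit(bar)   # schedule bar on another worker
    result = foobar(fx.result(), fy.result())  # wait for both, then combine

print(result)  # 5
```

The two submit calls express the schedule: foo and bar are mapped to different workers, and result() performs the communication that makes x and y available for the third expression.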
Types of Algorithms
Sinnen discusses three general heuristics for scheduling algorithms.
- List scheduling is a greedy algorithm: it orders the expressions by priority, then assigns each one to the processor that can start it earliest.
- Clustering groups communicating computations on the same worker to avoid the expense of transferring data between workers.
- Genetic algorithms start with random schedules and evolve them until they’re good enough.
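To make the first heuristic concrete, here is a minimal list-scheduling sketch. It is my own illustration, not code from Sinnen’s book; the task names, costs, communication cost, and two-processor setup are all invented for the example.

```python
# Greedy list scheduling: walk the tasks in priority order and place each
# on the processor where it can start earliest, charging a communication
# cost when a predecessor ran on a different processor.
tasks = ["foo", "bar", "foobar"]           # priority order (topological)
cost = {"foo": 3, "bar": 2, "foobar": 1}   # computation time per task
deps = {"foo": [], "bar": [], "foobar": ["foo", "bar"]}
comm = 1                                   # time to move a result between processors

n_procs = 2
proc_free = [0] * n_procs   # time at which each processor becomes free
finish = {}                 # task -> (finish time, processor)

for t in tasks:
    best = None
    for p in range(n_procs):
        # Earliest start on p: the processor must be free and every input
        # available, with a comm delay for predecessors placed elsewhere.
        ready = max([proc_free[p]] +
                    [finish[d][0] + (comm if finish[d][1] != p else 0)
                     for d in deps[t]])
        if best is None or ready < best[0]:
            best = (ready, p)
    start, p = best
    finish[t] = (start + cost[t], p)
    proc_free[p] = start + cost[t]

makespan = max(f for f, _ in finish.values())
print(finish, makespan)
```

With these numbers, foo and bar land on different processors and foobar starts once both results are available, so the schedule finishes at time 4 rather than the 6 units a single processor would need.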
Future Directions
While timing data analysis programs I’ve noticed that typically only a few expressions take significant time. This suggests a different kind of heuristic: find an optimal schedule for the few long-running expressions and schedule everything else around it. I’ll probably implement this in the R makeParallel package.
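One way to sketch that idea, heavily simplified: split tasks on a runtime threshold, search exhaustively over placements of the few expensive tasks, and fill in the cheap ones greedily. This is speculative and my own, not makeParallel code; the task names, runtimes, and threshold are invented, and it ignores dependencies, which the real problem has.

```python
# Speculative sketch of the proposed heuristic: optimal placement for the
# few long-running tasks, greedy placement for everything else.
from itertools import product

runtimes = {"load": 0.1, "clean": 0.2, "model_a": 30.0,
            "model_b": 25.0, "plot": 0.1}
threshold = 1.0  # seconds; tasks above this dominate total time
n_procs = 2

big = [t for t, r in runtimes.items() if r >= threshold]
small = [t for t, r in runtimes.items() if r < threshold]

# Exhaustive search over processor assignments for the big tasks only.
# Feasible because there are few of them: n_procs ** len(big) options.
best_assign, best_makespan = None, float("inf")
for assign in product(range(n_procs), repeat=len(big)):
    loads = [0.0] * n_procs
    for t, p in zip(big, assign):
        loads[p] += runtimes[t]
    if max(loads) < best_makespan:
        best_makespan, best_assign = max(loads), dict(zip(big, assign))

# Greedily drop each small task onto the currently least-loaded processor.
loads = [0.0] * n_procs
for t, p in best_assign.items():
    loads[p] += runtimes[t]
for t in sorted(small, key=runtimes.get, reverse=True):
    p = loads.index(min(loads))
    loads[p] += runtimes[t]

print(max(loads))
```

Here the exhaustive step puts the two slow models on separate processors, and the sub-second tasks tuck in around them without changing the makespan, which is the point: only the long-running expressions need careful placement.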