Scheduler + Supervisor/worker design pattern

Parallel and Concurrent Programming

Moderator: diblas

Scheduler + Supervisor/worker design pattern

Postby bfouts » 04/14/2009 01:44 pm

Objectives:
  • Encapsulation: Should be general purpose enough to be useful for a broad range of problems (even enough to be useful for problems fundamentally different in nature when compared to game engines)
  • Scalability: Although there are practical limits to the number of CPUs available on hardware, the module should not expect a particular number of CPUs or have some implicit upper/lower bound (other than of course, 1). As long as there are jobs that can be run given the associated conditions, no processor should starve due to poor disbatching algorithms/heuristics
  • Platform independence: This one is a bit tricky, but essentially I would like to be able to isolate API calls to the operating system so that this would be useful for the PS3, upcoming multicore iPhones, PC, Mac, Unix, etc. although we are likely to pick one or 2 and follow through with it for the sake of this project
  • Tasks are distributed and executed in such a way that optimizes the deadlines indicated by the jobs on the list (jobs should not complete late, and if such an event is unavoidable, high priority tasks and short tasks should be executed first)
  • Tasks should be dispatched in a way that optimizes cache hits (repeat tasks prefer to run on a core whose last completed task is determined to be related)
  • Synchronization is a huge issue, especially with games. Each job should be able to wait for certain non-time related conditions to be met before executing (including completion of of other specific jobs). This should be achievable through good associated conditions - memory locks should NOT be necessitated in most cases even though memory is shared.
  • System should be able to re-prioritize and re-distribute tasks as execution time estimates are exceeded - no subsystem should come to a halt due to an infinite loop in a particular function.
  • System should be able to throw away untaken tasks that are no longer relevant (conditions that can never be met, jobs that cannot be executed on time). This should be communicated to whatever is spawning the tasks so that a new task can be created to pick up the slack.

Considerations:

  • Some console architecture does not immediately support STL, and in the case of portable devices, the memory overhead of STL can be too much of a burden. Some template data structures may need to be created to keep it as platform independent as possible
    Although STL is a convenient implementation of a lot of commonly used items, many of the container classes are designed to be efficient in time in the asymptotic sense. For example, the STL set class stores the elements of the set in ascending order, which allows a logarithmic-time binary search to determine existence of an element. The ordering also supports logarithmic-time insertion and deletion. Unfortunately, real-time engines sometimes have performance penalties because of the overhead of STL, both in time and memory.
  • As with any system that has a core dependency on estimations that cannot always be determined before run time
  • In order to effectively ensure that all jobs are completed on time, each job should have an estimate of how long it will take to estimate. While user input can be a good head start, estimated run time should be LEARNED empirically, noting average, median, and range for prior executions. This is important for LOAD BALANCING.


Overview of Architecture:
  • There will be a single resource that contains a list of jobs that have not yet been assigned a core to execute on.
  • Each core will have a queue of tasks that it intends to execute (roughly) in order that are pulled from the global job list
  • The supervisor should be responsible for maintaining both the global task list and the individual task lists for the different cores (moving jobs around and re prioritizing as needed)

Class Diagrams:

Supervisor:
int numcores
queue<Job*> global_job_list

Job:
int preferred_core
int priority
time deadline
int claimed_by_core
time start_time
(bool) function* start_condition
function* execute_function
list<params> execute_params
int unique_id
function* deletion_condition


Core:
Job* CurrentJob
queue <Job*> joblist

Job Spawner??
(should be able to learn how long jobs take so that they can be prioritized more effectively
bfouts
User
 
Posts: 20
Joined: 10/01/2007 02:56 pm

Return to CMPE 113: Parallel and Concurrent Programming

Who is online

Users browsing this forum: No registered users and 1 guest