Grid scheduling is a process of mapping Grid jobs to resources over multiple administrative domains.
A Grid job can be split into many small tasks.
The scheduler has the responsibility of selecting resources and scheduling jobs in such a way that the user and application requirements are met,in terms of overall execution time (throughput) and cost of the resources utilized.
3. Introduction
Grid scheduling is a process of mapping Grid jobs to
resources over multiple administrative domains.
A Grid job can be split into many small tasks.
The scheduler has the responsibility of selecting
resources and scheduling jobs in such a way that the
user and application requirements are met, in terms of
overall execution time (throughput) and cost of the
resources utilized.
4. Introduction
Jobs, via Globus, can be submitted to systems managed
by Condor, the Sun Grid Engine (SGE), thr Portable Batch
System (PBS) and the Load Sharing Facility (LSF)
6. Centralized Scheduling
In a centralized scheduling environment, a central
machine (node) acts as a resource manager to
schedule jobs to all the surrounding nodes that are
part of the environment.
This scheduling paradigm is often used in situations
like a computing centre where resources have
similar characteristics and usage policies.
7. Centralized Scheduling
Here, jobs are first submitted to the central scheduler, which then
dispatches the jobs to the appropriate nodes. Those jobs that
cannot be started on a node are normally stored in a central job
queue for a later start.
8. Centralized Scheduling: Advantage & Disadvantage
Centralized scheduling system may produce better scheduling
decisions because it has all necessary, and up-to-date,
information about the available resources.
Centralized scheduling does not scale well with the increasing
size of the environment that it manages.
The scheduler itself may well become a bottleneck, and if
there is a problem with the hardware or software of the
scheduler’s server, i.e. a failure,
it presents a single point of failure in the environment.
9. Distributed Scheduling
No central scheduler responsible for managing all the
jobs.
It involves multiple localized schedulers, which interact
with each other in order to dispatch jobs to the
participating nodes.
There are two mechanisms for
communicate with other schedulers
Direct Communication
Indirect Communication.
a
scheduler to
10. Distributed Scheduling: Direct Communication
Each local scheduler can directly communicate with
other schedulers for job dispatching.
11. Distributed Scheduling: Direct Communication
Each scheduler has a list of remote schedulers that they can
interact with, or there may exist a central directory that
maintains all the information related to each scheduler.
If a job cannot be dispatched to its local resources, its
scheduler will communicate with other remote schedulers
to find resources appropriate and available for executing its
job.
Each scheduler may maintain a local job queue(s) for job
management.
12. Distributed Scheduling: Indirect Communication
Communication via a central job pool
In this scenario, jobs that cannot be executed immediately
are sent to a central job pool.
13. Distributed Scheduling: Indirect Communication
Communication via a central job pool
Compared with direct communication, the local
schedulers can potentially choose suitable jobs to
schedule on their resources.
Policies are required so that all the jobs in the pool are
executed at some time.
This method can be modified, so that all jobs are
pushed directly in the job-pool after submission.
This way all small jobs requiring few resources can
be used for utilizing free resources on all
machines.
14. Hierarchical scheduling
In hierarchical scheduling, a centralized scheduler interacts with
local schedulers for job submission. The centralized scheduler is a
kind of a meta-scheduler that dispatches submitted jobs to local
schedulers.
15. Hierarchical scheduling
Similar to the centralized scheduling paradigm,
hierarchical scheduling can have scalability and
communication bottlenecks.
However, compared with centralized scheduling,
one advantage of hierarchical scheduling is that
the global scheduler and local scheduler can have
different policies in scheduling jobs.
16. HOW SCHEDULING WORKS
Grid scheduling involves four main stages:
resource discovery,
resource selection,
schedule generation and
job execution
17. Resource discovery
Goal: identify a list of authenticated resources that are
available for job submission.
In order to cope with the dynamic nature of the Grid,
a scheduler needs to have some way of incorporating
dynamic state information about the available
resources into its decision-making process.
A Grid environment typically uses
a pull model,
a push model or
a push–pull model
for resource discovery.
18. Resource discovery : The pull model
A single daemon associated with the scheduler can query
Grid resources and collect state information such as CPU
loads or the available memory.
19. Resource discovery : The pull model
The pull model for gathering resource information
incurs relatively small communication overhead,
but unless it requests resource information
frequently, it tends to provide fairly stale
information which is likely to be constantly out-ofdate, and potentially misleading.
In
centralized
scheduling,
the
resource
discovery/query process could be rather intrusive
and begin to take significant amounts of time as
the environment being monitored gets larger and
larger.
21. Resource discovery
Each resource in the environment has a daemon
for gathering local state information,
which will be sent to a centralized scheduler that
maintains a database to record each resource’s
activity.
If the updates are frequent, an accurate view of
the system state can be maintained over time;
obviously, frequent updates to the database are
intrusive and consume network bandwidth.
22. Resource discovery : The push–pull model
The push–pull model lies somewhere between the pull model and
the push model.
23. Resource discovery : The push–pull model
Each resource in the environment runs a daemon
that collects state information.
Instead of directly sending this information to a
central scheduler, there exist some intermediate
nodes running daemons that aggregate state
information from different sub-resources that
respond to queries from the scheduler.
A challenge of this model is to find out what
information is most useful, how often it should be
collected and how long this information should be
kept around.
24. Resource Selection
The second phase of the scheduling process :
Select those resources that best suit the constraints
and conditions imposed by the user, such as CPU
usage, RAM available or disk storage.
The result of resource selection is to identify a
resource list Rselected in which all resources can meet
the minimum requirements for a submitted job or a
job list.
The relationship between resources available
Ravailable and resources selected Rselected is:
Rselected ⊆ Ravailable
26. Resource Generation : Job Selection
The resource selection process is used to choose
resource(s) from the resource list Rselected for a given
job.
Since all resources in the list Rselected could meet the
minimum requirements imposed by the job, an
algorithm is needed to choose the best resource(s) to
execute the job.
Although random selection is a choice, it is not an
ideal resource selection policy.
The resource selection algorithm should take into
account the current state of resources and choose the
best one based on a quantitative evaluation.
27. Resource Generation : Job Selection
A resource selection algorithm that only takes CPU and RAM into
account could be designed as follows:
where :
WCPU – the weight allocated to
CPU speed;
CPUload – the current CPU load;
CPUspeed – real CPU speed;
CPUmin – minimum CPU speed;
WRAM – the weight allocated to
RAM;
RAMusage – the current RAM
usage;
RAMsize – original RAM size; and
RAMmin – minimum RAM size.
28. Resource Generation : Job Selection
Example: Suppose that the total weighting used in the
algorithm is 10, where the CPU weight is 6 and the RAM weight
is 4. The minimum CPU speed is 1 GHz and minimum RAM size
is 256 MB. Resource information matrix is as follow:
Find the best resource for submitted job.
29. Resource Generation : Job Selection
Then, evaluation values for resources can be
calculated using the three formulas:
From the results we know Resource3 is the best choice
for the submitted job.
30. Resource Generation : Resource Selection
The goal of job selection is to select a job from a
job queue for execution. Four strategies that can
be used to select a job are given below.
First come first serve
Random Selection
Priority-based Selection
Backfilling Selection
31. Resource Generation : Resource Selection
First come first serve:
The scheduler selects jobs for execution in the order of
their submissions.
If there is no resource available for the selected job, the
scheduler will wait until the job can be started.
The other jobs in the job queue have to wait.
There are two main drawbacks with this type of job selection.
1. It may waste resources when, for example, the job
selected needs more resources to be available before
it can start, which results in a long waiting time.
2. jobs with high priorities cannot get dispatched
immediately if a job with a low priority needs more
time to complete.
32. Resource Generation : Resource Selection
Random selection:
The next job to be scheduled is randomly
selected from the job queue.
Apart from the two drawbacks with the firstcome-first-serve strategy, jobs selection is not
fair and job submitted earlier may not be
scheduled until much later.
33. Resource Generation : Resource Selection
Priority-based selection:
Jobs submitted to the scheduler have different
priorities.
The next job to be scheduled is the job with the
highest priority in the job queue.
A job priority can be set when the job is submitted.
One drawback of this strategy is that it is hard to set
an optimal criterion for a job priority.
A job with the highest priority may need more
resources than available and may also result in a long
waiting time and inability to make good use of the
available resources.
34. Resource Generation : Resource Selection
Backfilling selection:
The backfilling strategy requires knowledge of
the expected execution time of a job to be
scheduled.
If the next job in the job queue cannot be
started due to a lack of available resources,
backfilling tries to find another job in the queue
that can use the idle resources.
35. Job execution
Once a job and a resource are selected, the next
step is to submit the job to the resource for
execution.
Job execution may be as easy as running a single
command or as complicated as running a series
of scripts that may, or may not, include set up or
staging.