Load Balancing
Load balancing lets you get the most out of your Job Servers by distributing the load evenly across systems. By default, RunMyJobs balances the load based on the number of Jobs running on each Job Server. However, this is inefficient when Jobs use resources differently, because a resource-intensive Job counts as much as a lightweight process. Fortunately, RunMyJobs offers several other load balancing strategies that can be used on their own or in combination.
Generic Load Balancing Using Queues
Generic Queue-based load balancing can be used for any type of workload. This approach is typically used to keep a less powerful system from being slowed down by too much work, or to avoid offloading work onto a server that has a more important role in another Queue.
When you create a Queue or Queue Provider, you can specify an Execution Size that determines the number of concurrent Jobs allowed in the Queue. If the Queue serves several Job Servers, RunMyJobs will send Jobs to the Job Server with the smallest number of Jobs in status Running. You can influence this by setting an Execution Size on the Queue Provider associated with a Job Server. RunMyJobs will ignore the Job Server while the number of its running Jobs equals the Queue Provider's Execution Size.
Note: You can include Jobs in status Waiting in your Execution Size, but these typically do not consume any resources on the remote system.
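The dispatch rule described above can be sketched in Python. This is a simplified model under stated assumptions, not RunMyJobs code: `JobServerSlot` and `pick_job_server` are hypothetical names, `running_jobs` counts only this Queue's Jobs in status Running (Waiting Jobs are not counted), and ties go to the first Job Server listed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobServerSlot:
    """Hypothetical model of one Job Server serving a Queue."""
    name: str
    running_jobs: int
    # Execution Size on the Queue Provider; None means no per-server limit.
    provider_execution_size: Optional[int] = None


def pick_job_server(servers: list[JobServerSlot],
                    queue_execution_size: Optional[int] = None) -> Optional[str]:
    """Return the Job Server that should get the next Job, or None if it must wait."""
    total_running = sum(s.running_jobs for s in servers)
    # The Queue's own Execution Size caps concurrent Jobs across all servers.
    if queue_execution_size is not None and total_running >= queue_execution_size:
        return None
    # Ignore any Job Server whose Queue Provider Execution Size has been reached.
    eligible = [
        s for s in servers
        if s.provider_execution_size is None
        or s.running_jobs < s.provider_execution_size
    ]
    if not eligible:
        return None
    # Send the Job to the eligible server with the fewest Running Jobs.
    return min(eligible, key=lambda s: s.running_jobs).name


servers = [
    JobServerSlot("weak", running_jobs=2, provider_execution_size=2),
    JobServerSlot("strong", running_jobs=5),
]
print(pick_job_server(servers))                          # strong
print(pick_job_server(servers, queue_execution_size=7))  # None
```

Although "weak" runs fewer Jobs, it is skipped because its Queue Provider limit of 2 has been reached; once the Queue-level Execution Size of 7 is hit, no server receives the Job.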
Using Load Factors
You can use load factors to specify custom metrics for evaluating the load of a Job Server.
- Multiplier: The relative weight of a specific load factor.
- Threshold: The maximum value allowed for this load factor. When this value is reached, the Job Server goes into status Overloaded.
- Monitor Value: The metric to measure (CPU time, page rate, a Job Server check value, or a jmonitor value).
- Load Threshold: The maximum allowed value of the sum of all load factors (Multiplier * Monitor Value). When this value is reached, the Job Server goes into status Overloaded.
Note: Load Threshold applies to the Job Server as a whole, not to an individual load factor; each Job Server has exactly one Load Threshold. On Job Servers with only one load factor, set Load Threshold to a value equal to or higher than Threshold, because Load Threshold is only needed when you want to take the combined effect of multiple load factors into account.
Note: Threshold and Load Threshold are compared against weighted values (Multiplier * Monitor Value), so choose them with the Multiplier values in mind.
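Put together, the Overloaded condition described above can be written as follows, where m_i, v_i and T_i are the Multiplier, Monitor Value and Threshold of load factor i, and T_load is the Job Server's Load Threshold (these symbols are introduced here only for illustration):

$$
L = \sum_i m_i v_i, \qquad
\text{Overloaded} \iff \bigl(\exists\, i : m_i v_i \ge T_i\bigr) \;\lor\; L \ge T_{\text{load}}
$$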
OS Metric Load Balancing with Monitor Values
There are several options for load-balancing OS processes across Platform Job Servers.
- Job count load balancing: The default option if no load factors are defined for the Job Servers. The system sends new Jobs to the Job Server with the smallest number of running Jobs and Workflows.
- OS metric load balancing: Uses near real-time monitoring data from the Platform Agents to decide where to run the Job or Workflow.
- jmonitor load balancing: Uses near real-time monitoring data generated by jmonitor.
Job Count
Job count load balancing is the default technique. It uses the number of concurrent Jobs as the metric for balancing the load.
The Load and LoadThreshold monitor values for the Job Server are defined as follows:
- /System/Job Server/${PSName}/Performance/Load: The number of Jobs the Job Server is currently processing.
- /System/Job Server/${PSName}/Performance/LoadThreshold: By default, the maximum number of Jobs allowed to run simultaneously.
OS Metric
This type of load balancing requires a Platform Agent on each server and is typically used for Platform Agent workload. It uses load factors and a threshold for each Job Server. Two commonly used load factors are CPU usage (CPUBusy) and page rate (PageRate, the rate at which pages are written to and read from the swap area); however, you can also create Job Server checks to define your own criteria.
The Load and LoadThreshold monitor values for a Job Server are defined as follows:
- /System/ProcessServer/${process_server}/Performance/Load: The load calculated from the load factors as configured.
- /System/ProcessServer/${process_server}/Performance/LoadThreshold: The maximum load specified on the Job Server's Load Factors tab.
Example 1
Assume that two Job Servers accept Jobs that require a specific resource. Job Server prd5.example.com (MSLN_UNIXS5) is more powerful than Job Server prd7.example.com (MSLN_UNIXS7), so you want to run 1.5 times as many Jobs on prd5.example.com. Consequently, prd7.example.com should run a maximum of 50 concurrent Jobs, and prd5.example.com should run a maximum of 75 concurrent Jobs.
To accomplish this, specify the following load factors on the Job Servers:
Job Server | Multiplier | Threshold | Monitor Value | Load Threshold |
---|---|---|---|---|
prd5.example.com | 2 | 150 | /System/Job Server/MSLN_UNIXS5/Performance/Load | 150 |
prd7.example.com | 3 | 150 | /System/Job Server/MSLN_UNIXS7/Performance/Load | 150 |
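One way to check the Example 1 numbers is to simulate dispatching Jobs one at a time. This sketch assumes that each running Job adds 1 to the Load monitor value and that a Job is dispatched only while the weighted load (Multiplier * Load) is still below the Threshold; `max_concurrent_jobs` is a hypothetical helper, not part of RunMyJobs.

```python
def max_concurrent_jobs(multiplier: int, threshold: int) -> int:
    """Count the Jobs a Job Server accepts before its weighted load reaches Threshold."""
    jobs = 0
    # Assumption: a new Job is only dispatched while multiplier * Load < threshold.
    while multiplier * jobs < threshold:
        jobs += 1
    return jobs


for name, multiplier in [("prd5.example.com", 2), ("prd7.example.com", 3)]:
    print(name, max_concurrent_jobs(multiplier, threshold=150))
# prd5.example.com 75
# prd7.example.com 50
```

With the same Threshold on both servers, the Multipliers alone set the 75:50 (1.5:1) ratio.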
Example 2
Assume the same situation as in Example 1, except CPU utilization should not be allowed to go above 90%. Assume also that a Job uses a maximum of 5% CPU time.
Job Server | Multiplier | Threshold | Monitor Value | Load Threshold |
---|---|---|---|---|
prd5.example.com | 2 | 150 | /System/ProcessServer/MSLN_UNIXS5/Performance/Load | 235 |
prd5.example.com | 1 | 90 | /System/ProcessServer/MSLN_UNIXS5/Performance/CPUBusy | 235 |
prd7.example.com | 3 | 150 | /System/ProcessServer/MSLN_UNIXS7/Performance/Load | 235 |
prd7.example.com | 1 | 90 | /System/ProcessServer/MSLN_UNIXS7/Performance/CPUBusy | 235 |
Note: The Load Threshold is only reached when the sum of the weighted Load and CPUBusy values (each multiplied by its Multiplier) for one Job Server reaches 235.
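To see how the per-factor Thresholds and the Load Threshold interact in Example 2, here is a minimal Python sketch for prd5.example.com using the values from the table. It assumes that reaching a limit counts as Overloaded and that each running Job adds 1 to the Load monitor value; `overload_reason` is a hypothetical helper, and the readings passed to it are made-up inputs, not measured data.

```python
from typing import Optional

# (Multiplier, Threshold) per load factor on prd5.example.com, from the table.
PRD5_FACTORS = {
    "Load": (2, 150),
    "CPUBusy": (1, 90),
}
PRD5_LOAD_THRESHOLD = 235


def overload_reason(readings: dict[str, float]) -> Optional[str]:
    """Return why the Job Server is Overloaded, or None if it can take Jobs.

    readings maps each monitor value name to its current (unweighted) value.
    """
    weighted = {
        name: multiplier * readings[name]
        for name, (multiplier, _) in PRD5_FACTORS.items()
    }
    # A single load factor reaching its own Threshold is enough.
    for name, (_, threshold) in PRD5_FACTORS.items():
        if weighted[name] >= threshold:
            return f"{name} threshold"
    # Otherwise, compare the combined weighted load with the Load Threshold.
    if sum(weighted.values()) >= PRD5_LOAD_THRESHOLD:
        return "Load Threshold"
    return None


print(overload_reason({"Load": 70, "CPUBusy": 80}))  # None (140 + 80 = 220)
print(overload_reason({"Load": 74, "CPUBusy": 88}))  # Load Threshold (148 + 88 = 236)
print(overload_reason({"Load": 40, "CPUBusy": 90}))  # CPUBusy threshold (1 * 90 = 90)
```

The second call is the combined case: neither factor has reached its own Threshold, but together their weighted values reach 235.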
Example 3
Assume you have two Job Servers, one of which is also used for work other than RunMyJobs Jobs. Server prd1.example.com is a powerful system shared by multiple applications, while prd3.example.com has been added to the pool to relieve prd1.example.com. Assume you do not want to assign Jobs to Job Server prd1.example.com when its CPU usage reaches 80%, and you want a ratio of 1.5:1 between the two.
To accomplish this, configure the following load factors:
Job Server | Multiplier | Threshold | Monitor Value | Load Threshold |
---|---|---|---|---|
prd1.example.com | 3 | 240 | /System/ProcessServer/MSLN_UNIXS1/Performance/CPUBusy | |
prd3.example.com | 2 | 200 | /System/ProcessServer/MSLN_UNIXS3/Performance/CPUBusy | |
This means that 1% CPU usage is worth 3 units on prd1.example.com and only 2 units on prd3.example.com. In theory, with Jobs using the same amount of resources, prd3.example.com will peak sooner than prd1.example.com. As soon as the CPU usage on prd1.example.com reaches 80% (3 * 80 = 240, its Threshold), no new Jobs will be dispatched to it.
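The figures in Example 3 follow from dividing each Threshold by its Multiplier. The last part of this sketch rests on an assumption not stated in the text: that new Jobs go to the Job Server with the lower weighted load, so the two weighted loads stay roughly equal. `cpu_at_threshold` is a hypothetical helper.

```python
def cpu_at_threshold(multiplier: float, threshold: float) -> float:
    """CPU percentage at which multiplier * CPUBusy reaches the Threshold."""
    return threshold / multiplier


prd1 = {"multiplier": 3, "threshold": 240}
prd3 = {"multiplier": 2, "threshold": 200}

print(cpu_at_threshold(**prd1))  # 80.0
print(cpu_at_threshold(**prd3))  # 100.0

# Assumption: equal weighted loads, so 3 * cpu_prd1 == 2 * cpu_prd3,
# i.e. prd3 runs at 1.5 times the CPU usage of prd1.
print(prd1["multiplier"] / prd3["multiplier"])  # 1.5

# When prd3 reaches its Threshold (200 units), prd1 carries the same
# 200 units, which is only about 66.7% CPU on prd1.
print(round(prd3["threshold"] / prd1["multiplier"], 1))  # 66.7
```

Under that assumption, prd3.example.com reaches its limit while prd1.example.com is still below its 80% cut-off, which matches the statement that prd3.example.com peaks sooner.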
jmonitor
You can use the jmonitor command line tool to store monitoring values and display them in the Monitor Nodes screen. Although you are free to use any path, it is highly recommended to store the values under /System/ProcessServer/${process_server}/Custom/. You can create child nodes there to group specific values.