Load Balancing on Platform Agents
If a Job can run on more than one Job Server, you probably want to spread the load as evenly as possible. By default, RunMyJobs runs a new Job on whichever server has the lowest number of running processes.
You can optionally use OS metric load balancing instead, where the decision is based on a metric rather than on the number of running processes. The metric is user-defined. All metrics are first stored in the monitoring tree, which is a tree of nodes and values.
Built-in Platform Agent metrics
The Platform Agents can send CPU and memory usage data to the server. The frequency at which this data is sent is controlled by the MonitorInterval Job Server parameter. If you do not set this parameter, the default value is 60 (seconds). Set the value to zero if you do not plan to use load balancing, or if you plan to use values other than the system metrics.
Once you set MonitorInterval to a non-zero value, the data for load balancing is stored at the following locations.
/System/ProcessServer/<process_server>/Performance/CPUBusy
/System/ProcessServer/<process_server>/Performance/PageRate
Custom Metrics
The load balancing system can also use other metrics. These can be written to the monitoring tree using the Java API, or using the jmonitor tool within a (possibly long-running) process.
For example, if you want to base your decision on the number of free printers available at a particular system, you should call the following in your job(s) whenever the number of available printers changes:
jmonitor -j /System/ProcessServer/<process_server>/Custom/FreePrinters=$FREE
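As a minimal sketch, a job written in Python could assemble that same command line before handing it to the shell. The helper name `jmonitor_command` and the Job Server name `PS_PRINT01` are hypothetical and used only for illustration; the `-j` option and the `path=value` form are copied from the command shown above, and nothing else about jmonitor is assumed.

```python
import shlex


def jmonitor_command(process_server: str, leaf: str, value: int) -> list[str]:
    """Build the jmonitor argument list for one custom metric.

    Hypothetical helper: it only reproduces the command form shown in
    this section and does not run anything itself.
    """
    path = f"/System/ProcessServer/{process_server}/Custom/{leaf}"
    return ["jmonitor", "-j", f"{path}={value}"]


# "PS_PRINT01" is an illustrative Job Server name, not a real default.
cmd = jmonitor_command("PS_PRINT01", "FreePrinters", 3)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a system where jmonitor is on the path, each time the number of free printers changes.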
Setting the Load Factor
You can influence the computed load on a particular Job Server by adding one or more load factors to its definition. Each load factor has three attributes:
- Threshold: The maximum allowed value. If this is reached, the Job Server is put into Overloaded status until the threshold is no longer met.
- Multiplier: The multiplier used to compare the current value to other Job Servers.
- Monitor Value: The monitoring leaf value that is used.
The following fields are at the Job Server level.
- Load Threshold: The maximum allowed load, counting values from all load factors; once reached, the Job Server goes into Overloaded status until the load threshold is no longer met.
- Execution Size: The maximum number of concurrent processes the Job Server can run.
Once you have set the load factors, the system will choose a new process to run on the system that has the lowest value for the following equation:
sum(Multiplier * Monitor Value)
The result of the above equation is also used to determine if the Load Threshold has been reached.
If a Job Server has at least one load factor where the current monitor value exceeds the threshold value, or the result of the above equation exceeds the Load Threshold value, that Job Server goes into status Overloaded and is not chosen.
If two or more Job Servers have an identical load, the Job Server that was created first is used.
Note: If any of the eligible Job Servers has no load factors at all, OS metric load balancing is not used and the system reverts to counting the number of already executing Jobs. This means that for custom load factor-based load balancing to be applied to a Queue, all Job Servers serving the Queue must have at least one load factor. This allows you to control which Queues of a Job Server take load balancing into account.
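The selection rules in this section can be sketched in a few lines of Python. This is an illustrative model only, not the product's implementation: the class and function names are hypothetical, a threshold is treated as met when the value is greater than or equal to it, Execution Size is ignored, and the tie-break in the fallback case is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class LoadFactor:
    multiplier: float
    threshold: float
    monitor_value: float  # current reading of the monitored leaf


@dataclass
class JobServer:
    name: str
    created_order: int  # lower means created earlier
    running_processes: int = 0
    load_threshold: float = float("inf")
    load_factors: list = field(default_factory=list)

    def load(self) -> float:
        # sum(Multiplier * Monitor Value) over all load factors
        return sum(f.multiplier * f.monitor_value for f in self.load_factors)

    def overloaded(self) -> bool:
        # Assumption: a threshold counts as reached at >=.
        if any(f.monitor_value >= f.threshold for f in self.load_factors):
            return True
        return self.load() >= self.load_threshold


def choose_server(servers):
    """Return the Job Server a new process would go to, or None."""
    if any(not s.load_factors for s in servers):
        # A server without load factors disables metric-based balancing,
        # so fall back to the running-process count. Breaking ties by
        # creation order here is an assumption of this sketch.
        return min(servers, key=lambda s: (s.running_processes, s.created_order))
    candidates = [s for s in servers if not s.overloaded()]
    if not candidates:
        return None
    # Lowest computed load wins; on a tie, the server created first.
    return min(candidates, key=lambda s: (s.load(), s.created_order))
```

For example, a server with multiplier 2 and a reading of 30 has a load of 60 and loses to a server with multiplier 1 and a reading of 50, whose load is 50.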
Setting Up Load Balancing
To set up load balancing between two Platform Agents (represented by slow Job Server A and fast Job Server B), follow this procedure.
- Navigate to Configure > Control > Job Servers.
- Right-click Job Server A and choose Edit from the context menu.
- On the Load Factors tab, add a new load factor and enter System load into the Description field.
- In the Multiplier field, enter 2.
- In the Threshold field, enter 90.
- In the Monitor Value field, choose the CPUBusy monitor for the path /System/ProcessServer/<agent>/Performance, where <agent> is the Platform Agent that you are editing.
- Click Save & Close.
- Right-click Job Server B and choose Edit from the context menu.
- On the Load Factors tab, add a new load factor and enter System load into the Description field.
- In the Multiplier field, enter 1.
- In the Threshold field, enter 75.
- In the Monitor Value field, choose the CPUBusy monitor for the path /System/ProcessServer/<agent>/Performance, where <agent> is the Platform Agent that you are editing.
- Click Save & Close.
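To see what these multipliers do, the short calculation below applies the equation sum(Multiplier * Monitor Value) to the two Job Servers. The CPUBusy readings of 30 and 50 are made-up sample values, and the tie rule (the Job Server created first wins) is not modeled.

```python
# Multipliers from the procedure above: Job Server A = 2, Job Server B = 1.
multipliers = {"A": 2, "B": 1}

# Hypothetical CPUBusy readings (percent), chosen only for illustration.
cpu_busy = {"A": 30, "B": 50}

# Each server has a single load factor, so the sum has one term.
loads = {name: multipliers[name] * cpu_busy[name] for name in multipliers}

# The server with the lowest computed load receives the next process.
preferred = min(loads, key=loads.get)
print(loads["A"], loads["B"], preferred)  # prints "60 50 B"
```

Although Job Server A reports the lower CPU usage, its multiplier of 2 makes its computed load higher, so the faster Job Server B is preferred until its CPUBusy reading climbs past twice that of A or reaches its threshold of 75.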
Example
Assume you want to balance the workload between two Platform Agents running on two different machines. The first machine has two slow CPUs, and the other one has four fast CPUs. You want to maximize the throughput of the system. You can do this by allocating more Jobs to the faster server, but you also need to reserve capacity on that machine for online users. To do so, you could implement the following settings:
Job Server A (with two slow CPUs) suggested load factor:
Field | Value |
---|---|
Multiplier | 1 |
Threshold | 100 |
Monitor Value | CPUBusy (for that Job Server) |
Job Server B (with four fast CPUs) suggested load factor:
Field | Value |
---|---|
Multiplier | 1 |
Threshold | 75 |
Monitor Value | CPUBusy (for that Job Server) |
The multiplier value can remain 1, because a process running on the faster system B will have less of an impact on CPU load than the same process will have on system A. Multipliers are most often used when you combine multiple load factors.
The threshold is set to 75 on Job Server B so that we reserve one CPU (one out of four = 25%) for non-batch work.