Configuring Load Balancing on Platform Agents

If your processes can run on more than one Process Server, you probably want to spread the load as evenly as possible. By default, RunMyJobs runs a new process on whichever server has the lowest number of running processes.

You can optionally use OS metric load balancing instead, where the decision is based on a user-defined metric rather than on the number of running processes. All metrics are first stored in the Monitor Tree, which is a tree of nodes and values.

Built-in Platform Agent metrics

The Platform Agents can send CPU and memory usage data to the server. The frequency at which this data is sent is controlled by the MonitorInterval Process Server parameter. If you do not set this parameter, the default value is 60 (seconds). Set the value to zero if you do not plan to use load balancing, or if you plan to use values other than the system metrics.

Once you set MonitorInterval to a non-zero value, the data for load balancing is stored at the following locations.

/System/ProcessServer/<process_server>/Performance/CPUBusy
/System/ProcessServer/<process_server>/Performance/PageRate
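To picture how these paths address leaf values, the following sketch models the Monitor Tree as nested Python dictionaries. It is illustrative only: the `lookup` and `performance_path` helpers, the Process Server name `PS_A`, and the sample values are assumptions, not part of the product API.

```python
# Illustrative model only: the real Monitor Tree lives in the server, not in a dict.
monitor_tree = {
    "System": {
        "ProcessServer": {
            "PS_A": {  # hypothetical Process Server name
                "Performance": {"CPUBusy": 42.0, "PageRate": 3.5},  # sample values
            }
        }
    }
}

def performance_path(process_server, metric):
    """Build the path of a built-in performance metric for one Process Server."""
    return f"/System/ProcessServer/{process_server}/Performance/{metric}"

def lookup(tree, path):
    """Walk a slash-separated path from the root and return the leaf value."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]
    return node

cpu_busy = lookup(monitor_tree, performance_path("PS_A", "CPUBusy"))
```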

Custom Metrics

The load balancing system can also use other metrics. These can be written to the monitoring tree using the Java API, or using the jmonitor tool within a (possibly long-running) process.

For example, if you want to base your decision on the number of free printers available on a particular system, call the following in your jobs whenever the number of available printers changes:

jmonitor -j /System/ProcessServer/<process_server>/Custom/FreePrinters=$FREE
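A long-running job written in Python could wrap that call roughly as follows. This is a sketch under assumptions: it reuses only the `jmonitor -j <path>=<value>` form shown above, assumes `jmonitor` is on the job's PATH, and the function names and the Process Server name passed in are hypothetical.

```python
import subprocess

def jmonitor_args(process_server, free_printers):
    """Build the argument list for the jmonitor call shown above."""
    path = f"/System/ProcessServer/{process_server}/Custom/FreePrinters"
    return ["jmonitor", "-j", f"{path}={free_printers}"]

def report_free_printers(process_server, free_printers, last_reported=None):
    """Call jmonitor only when the count changed; return the value now on record."""
    if free_printers != last_reported:
        # Assumption: jmonitor is available on PATH inside the job environment.
        subprocess.run(jmonitor_args(process_server, free_printers), check=True)
    return free_printers
```

Passing the previously reported value as `last_reported` avoids writing to the monitoring tree when nothing has changed.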

Setting the Load Factor

You can influence the computed load on a particular Process Server by adding one or more load factors to its definition. Each load factor has three attributes:

  • Threshold: The maximum allowed value. If this is reached, the Process Server is put into Overloaded status until the threshold is no longer met.
  • Multiplier: The multiplier used to compare the current value to other Process Servers.
  • Monitor Value: The monitoring leaf value that is used.

The following fields are at the Process Server level.

  • Load Threshold: The maximum allowed load, counting values from all load factors. Once it is reached, the Process Server is put into Overloaded status until the load threshold is no longer met.
  • Execution Size: The maximum number of concurrent processes the Process Server can run.

Once you have set the load factors, the system runs each new process on the Process Server that has the lowest value for the following equation:

sum(Multiplier * Monitor Value)

The result of the above equation is also used to determine if the Load Threshold has been reached.
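The calculation can be sketched in a few lines of Python. This is a model of the rule as described here, not the product's implementation; the `LoadFactor` class, the server names, and the sample readings are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class LoadFactor:
    multiplier: float
    threshold: float
    monitor_value: float  # current value of the monitored leaf

def computed_load(load_factors):
    """sum(Multiplier * Monitor Value) over all load factors of one server."""
    return sum(f.multiplier * f.monitor_value for f in load_factors)

# Hypothetical Process Servers with sample readings.
servers = {
    "PS_A": [LoadFactor(2, 90, 30.0)],                            # load 60.0
    "PS_B": [LoadFactor(1, 75, 50.0), LoadFactor(0.5, 20, 8.0)],  # load 54.0
}

loads = {name: computed_load(factors) for name, factors in servers.items()}
chosen = min(loads, key=loads.get)  # server with the lowest computed load
```

With these sample values, PS_B has the lower computed load even though its CPU reading is higher, because the multiplier of 2 doubles the weight of the reading on PS_A.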

If a Process Server has at least one load factor whose current monitor value exceeds its threshold, or if its computed load (the result of the equation above) exceeds the Load Threshold, that Process Server goes into Overloaded status and is not chosen.
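The Overloaded check can be sketched as follows. Two assumptions are built in: the Load Threshold is compared against the weighted sum from the equation above, and a value counts as overloading only when it is strictly greater than the threshold. The function name and tuple layout are hypothetical.

```python
def is_overloaded(load_factors, load_threshold):
    """load_factors: list of (multiplier, threshold, monitor_value) tuples."""
    # Rule 1: any single load factor above its own threshold.
    any_factor_exceeded = any(value > threshold for _, threshold, value in load_factors)
    # Rule 2: the combined load above the Process Server's Load Threshold.
    total_load = sum(mult * value for mult, _, value in load_factors)
    return any_factor_exceeded or total_load > load_threshold
```

For example, a server with factors `(1, 75, 50.0)` and `(2, 90, 40.0)` stays below both individual thresholds, but its combined load of 130 would overload it if the Load Threshold were 100.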

If two or more Process Servers have an identical load, the Process Server that was created first is used.
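One way to model this tie-break is to sort on the load first and the creation order second. The tuple layout and the numeric `creation_order` field are assumptions for illustration; how the product records creation order is not described here.

```python
def choose_server(candidates):
    """candidates: list of (name, creation_order, load); lower order = created earlier."""
    # Compare by load first, then by creation order to break ties.
    return min(candidates, key=lambda c: (c[2], c[1]))[0]

winner = choose_server([
    ("PS_B", 2, 40.0),
    ("PS_A", 1, 40.0),  # same load as PS_B, but created first
    ("PS_C", 3, 55.0),
])
```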

Note: If any of the eligible Process Servers has no load factors at all, OS metric load balancing is not used and the system reverts to counting the number of already executing processes. This means that for custom load factor-based load balancing to be applied to a Queue, all Process Servers serving the Queue must have at least one load factor. This allows you to control which Queues of a Process Server take load balancing into account.
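The fallback described in the note can be modeled like this. It is a sketch of the stated rule only; the dictionary layout and the names are hypothetical, and thresholds are left out to keep the example short.

```python
def pick_server(servers):
    """servers: name -> {"running": int, "factors": [(multiplier, monitor_value), ...]}"""
    if any(not s["factors"] for s in servers.values()):
        # At least one eligible server has no load factors:
        # fall back to counting running processes.
        return min(servers, key=lambda n: servers[n]["running"])
    # Every server has load factors: use sum(Multiplier * Monitor Value).
    return min(servers, key=lambda n: sum(m * v for m, v in servers[n]["factors"]))

mixed = {
    "PS_A": {"running": 5, "factors": [(1, 10.0)]},
    "PS_B": {"running": 2, "factors": []},  # no load factors defined
}
all_with_factors = {
    "PS_A": {"running": 5, "factors": [(1, 10.0)]},
    "PS_B": {"running": 2, "factors": [(1, 60.0)]},
}
```

In the `mixed` case PS_B is chosen because it runs fewer processes; in the `all_with_factors` case PS_A is chosen because its computed load is lower.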

Setting Up Load Balancing

To set up load balancing between two Platform Agents (slow Process Server A and fast Process Server B), follow this procedure.

  1. Navigate to Environment > Process Servers.
  2. Right-click Process Server A and choose Edit from the context menu.
  3. On the Load Factors tab, click and enter System load into the Description field.
  4. In the Multiplier field, enter 2.
  5. In the Threshold field, enter 90.
  6. In the Monitor Value field, choose the CPUBusy monitor for the path /System/ProcessServer/<agent>/Performance, where <agent> is the Platform Agent that you are editing.
  7. Click Save & Close.
  8. Right-click Process Server B and choose Edit from the context menu.
  9. On the Load Factors tab, click and enter System load into the Description field.
  10. In the Multiplier field, enter 1.
  11. In the Threshold field, enter 75.
  12. In the Monitor Value field, choose the CPUBusy monitor for the path /System/ProcessServer/<agent>/Performance, where <agent> is the Platform Agent that you are editing.
  13. Click Save & Close.
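To see what these settings do in practice, the following sketch applies them to a few sample CPUBusy readings. It assumes that no Process Server-level Load Threshold is set and that only readings above a threshold cause Overloaded status; the `choose` function and the readings are hypothetical.

```python
# Load factor settings from the procedure above.
SETTINGS = {
    "A": {"multiplier": 2, "threshold": 90},
    "B": {"multiplier": 1, "threshold": 75},
}

def choose(cpu_busy):
    """cpu_busy: server -> CPUBusy percentage. Returns the chosen server or None."""
    eligible = {
        name: SETTINGS[name]["multiplier"] * value
        for name, value in cpu_busy.items()
        if value <= SETTINGS[name]["threshold"]  # skip Overloaded servers
    }
    return min(eligible, key=eligible.get) if eligible else None

balanced = choose({"A": 30, "B": 50})      # loads 60 vs 50
a_mostly_idle = choose({"A": 20, "B": 50})  # loads 40 vs 50
b_overloaded = choose({"A": 30, "B": 80})   # B is above its threshold of 75
```

Because of the multiplier of 2, the slower Process Server A is preferred only when its CPU usage is less than half that of Process Server B, or when B is Overloaded.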

Example

Assume you want to balance the workload between two Platform Agents running on two different machines. The first machine has two slow CPUs, and the other one has four fast CPUs. You want to maximize throughput of the system. You can do this by using the bigger server to good effect by allocating more processes to it, but you also need to reserve capacity on that machine for on-line users. To do so, you could implement the following settings:

Process Server A (with two slow CPUs) suggested load factor:

Field          Value
Multiplier     1
Threshold      100
Monitor Value  CPUBusy (for that Process Server)

Process Server B (with four fast CPUs) suggested load factor:

Field          Value
Multiplier     1
Threshold      75
Monitor Value  CPUBusy (for that Process Server)

The multiplier value can remain 1, because a process running on the faster system B will have less of an impact on CPU load than the same process will have on system A. Multipliers are most often used when you combine multiple load factors.

The threshold is set to 75 on Process Server B, so that we reserve one CPU (one out of four = 25%) for non-batch work.
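The arithmetic behind the threshold can be written out as a small sketch. It assumes that CPUBusy is an average over all CPUs of the machine, so that a percentage maps directly to a number of CPUs; the function names are hypothetical.

```python
def reserved_cpus(cpu_count, threshold_percent):
    """CPUs left free for non-batch work when batch load stops at the threshold."""
    return cpu_count * (100 - threshold_percent) / 100

def threshold_for(cpu_count, cpus_to_reserve):
    """Threshold (percent) that keeps the given number of CPUs free."""
    return 100 * (cpu_count - cpus_to_reserve) / cpu_count

reserved_on_b = reserved_cpus(4, 75)   # Process Server B: four CPUs, threshold 75
reserved_on_a = reserved_cpus(2, 100)  # Process Server A: two CPUs, threshold 100
```

The same helper can be used in reverse: `threshold_for(4, 1)` gives the threshold of 75 used for Process Server B.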