Monitoring External Systems with Platform Agents
Licensing
Process-based Contracts
All monitoring is included in the license.
License-based Contracts
A UNIX or Microsoft Windows Platform Agent can be configured for monitoring only by not assigning any job Definition Types that run on Platform Agents and not assigning any file events. Such Process Servers do not consume any licenses. On OpenVMS, however, you must assign the DCL job Definition Type, so there is no free monitoring.
Prerequisites
- Monitoring must be enabled.
- A Process Server must be configured for monitoring.
Configuration
Process Server and Queue monitoring is disabled by default for performance reasons. To enable monitoring:
- Set the /configuration/jcs/monitoring/enabled configuration entry to true.
- For each Process Server you want to monitor, set the MonitorInterval Process Server parameter.
Default Monitor Nodes
A Platform Agent will report CPU busy
, IO page rate
and disk capacity
by default. You can tune how often it does this by changing the MonitorInterval
Process Server parameter. The data is stored in the monitor tree at the following paths:
- System.ProcessServer.${PSName}.Performance.Load: By default, the number of processes the Process Server is currently processing, or a representation of the load factors as configured.
- System.ProcessServer.${PSName}.Performance.LoadThreshold: By default, the maximum number of processes allowed to run simultaneously, or the maximum load specified on the Load Factors tab.
- System.ProcessServer.${PSName}.Performance.CPUCount: The number of CPUs the system has.
- System.ProcessServer.${PSName}.Performance.CPUBusy: The CPU usage on the server.
- System.ProcessServer.${PSName}.Performance.PageRate: The amount of memory paging that is taking place.
- System.ProcessServer.${PSName}.Performance.NetworkResponseAverage: The average communication overhead with the Platform Agent per transfer, in seconds.
- System.ProcessServer.${PSName}.Performance.NetworkResponseMaximum: The maximum communication overhead with the Platform Agent per transfer, in seconds.
- System.ProcessServer.${PSName}.Performance.NetworkResponseMinimum: The minimum communication overhead with the Platform Agent per transfer, in seconds.
- System.ProcessServer.${PSName}.Performance.NetworkTransferCount: The number of transfers exchanged with the Platform Agent.
- System.ProcessServer.${PSName}.Performance.NetworkTransferRate: The volume of network traffic sent and received by the Platform Agent, in bytes per second.
- System.ProcessServer.${PSName}.Performance.NetworkUptime: The amount of time since the last network error or startup, in seconds.
- System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.Free: The free space on the file system.
- System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.Used: The used space on the file system.
- System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.Total: The total size of the file system.
- System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.UsedPercentage: The percentage of used space on the file system.
- System.ProcessServer.${PSName}.Checks.${Check_Name}.${Monitored value}: Custom Process Server checks.

The paths use the following placeholders:
- ${PSName}: The Process Server name. For example, System.
- ${FileSystemPath}: The path to the local file system. For example, C:\\ or /home.
- ${Check_Name}: The name of the Process Server check (or its description, if it is set).
- ${Monitored value}: The name of the Process Server check that is performed. This depends on the type of check.
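To make the placeholder substitution concrete, the following Python sketch expands two of the path patterns listed above for a given Process Server. It is plain string handling and does not call any product API. The ${Metric} placeholder, the helper names, and the Process Server name PS_TEST are hypothetical, and how the product encodes a file system path inside a monitor path is not specified here, so the second result is only an illustration of the pattern.

```python
from string import Template

# Path patterns taken from the list above. ${Metric} is a hypothetical
# placeholder standing in for the final element (CPUBusy, Free, ...).
PERFORMANCE_PATH = Template(
    "System.ProcessServer.${PSName}.Performance.${Metric}"
)
FILESYSTEM_PATH = Template(
    "System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.${Metric}"
)


def performance_path(ps_name, metric):
    """Expand a Performance monitor path for one Process Server."""
    return PERFORMANCE_PATH.substitute(PSName=ps_name, Metric=metric)


def filesystem_path(ps_name, fs_path, metric):
    """Expand a FileSystem monitor path for one local file system."""
    return FILESYSTEM_PATH.substitute(
        PSName=ps_name, FileSystemPath=fs_path, Metric=metric
    )


# PS_TEST is an assumed Process Server name used only for illustration.
print(performance_path("PS_TEST", "CPUBusy"))
print(filesystem_path("PS_TEST", "/home", "UsedPercentage"))
```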
The Load and LoadThreshold are calculated for all Process Servers, not just for Process Servers that include the PlatformAgentService. The LoadFactors for a Process Server point to a MonitorCheck such as CPUBusy or PageRate. All load factors are added up into a single load value. If the summed load is higher than the maximum allowed by the Process Server's LoadThreshold attribute, the Process Server goes into status Overloaded. In addition to displaying this status, you can also create programmatic actions by defining a condition that checks the summed load and raises the appropriate events.
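The summing and threshold comparison described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the product's implementation. It assumes each load factor carries a multiplier that scales the value of the check it points to; the function names, the multipliers, and the sample readings are all hypothetical.

```python
def summed_load(load_factors, readings):
    """Add up every monitored value scaled by its load factor.

    load_factors maps a check name to an assumed multiplier;
    readings maps the same check name to its current value.
    """
    return sum(
        multiplier * readings[check]
        for check, multiplier in load_factors.items()
    )


def is_overloaded(load, load_threshold):
    """A Process Server is Overloaded when the load exceeds the threshold."""
    return load > load_threshold


# Hypothetical configuration and readings, for illustration only.
factors = {"CPUBusy": 1, "PageRate": 2}
current = {"CPUBusy": 80, "PageRate": 15}

load = summed_load(factors, current)   # 80 * 1 + 15 * 2
print(load, is_overloaded(load, 100))
```

With these sample numbers the summed load is 110, which is above the assumed threshold of 100, so the sketch reports the Process Server as overloaded.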
Note: The file system statistics are reported for all local disks. Network shares are not taken into account.
Network Statistics Logging
Network statistics are logged at least every 24 hours, but usually every hour if there is anything to report. The entries take the following form in the Platform Agent log files:
INFO 2023-07-27 16:34:48,663 CES common.statistics - The agent started 0 job processors in the last 359 minutes, with at most 0 in parallel
INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 1 HTTP requests in the last 359 minutes, average 0.124s, max 0.124s, min 0.124s
INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 1087 HTTP requests (scheduler) in the last 359 minutes, average 0.052s, max 0.204s, min 0.030s
INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 19 file reads in the last 359 minutes, total 25024 bytes
INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 173947 file writes in the last 359 minutes, total 24063781 bytes
INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 8 network connections in the last 359 minutes, average 0.010s, max 0.029s, min 0.001s
INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 12 network name lookups in the last 359 minutes, average 0.013s, max 0.126s, min 0.000s
INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 7565 network reads in the last 359 minutes, total 890417 bytes
INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 2948 network writes in the last 359 minutes, total 475673 bytes
The "network connections" statistics (average, max, min) are usually less than one second. In the above, the average response is 10 milliseconds, with a worst case of 29 milliseconds. Note that this includes both the pure network latency and the time the network takes to do data transfers. The latter factor is usually negligible, but be careful in cases where large files are sent over the network.
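If you want to read these timings programmatically, a regular expression is enough for the line format shown in the sample log. The sketch below is not part of the product; the pattern is derived only from the sample lines above and may need adjusting if other agent versions use a different layout. The function name parse_timing is hypothetical.

```python
import re

# Matches statistics lines that report average/max/min timings, based
# solely on the sample log lines shown above.
TIMING_LINE = re.compile(
    r"Performed (?P<count>\d+) (?P<kind>.+?) in the last (?P<minutes>\d+) minutes, "
    r"average (?P<avg>[\d.]+)s, max (?P<max>[\d.]+)s, min (?P<min>[\d.]+)s"
)


def parse_timing(line):
    """Return the timings of one statistics line in milliseconds.

    Returns None for lines without timings, such as the byte totals
    reported for file and network reads and writes.
    """
    match = TIMING_LINE.search(line)
    if match is None:
        return None
    return {
        "kind": match.group("kind"),
        "count": int(match.group("count")),
        "minutes": int(match.group("minutes")),
        "average_ms": round(float(match.group("avg")) * 1000),
        "max_ms": round(float(match.group("max")) * 1000),
        "min_ms": round(float(match.group("min")) * 1000),
    }


sample = (
    "INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 8 network "
    "connections in the last 359 minutes, average 0.010s, max 0.029s, min 0.001s"
)
print(parse_timing(sample))
```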
The "network name lookup" statistics show how the customer DNS service is performing.
HTTP requests not marked as HTTP requests (scheduler) are requests that were sent to an HTTP service other than the one used for the pure Platform Agent to server communication.
Note that no HTTP request failures happened in the above log. Such failures would show up like this:
INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 1 HTTP requests (failed) in the last 359 minutes, average 30.03s, max 30.03s, min 30.03s
Note that only failed HTTP requests are logged separately, not failed DNS requests.
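Because failed HTTP requests get their own log line, a simple scan of the Platform Agent log can surface them. The following Python sketch counts them under the assumption that every failure line follows the format of the example above; the function name is hypothetical and the sketch does not read any real log file location.

```python
import re

# Pattern based on the single failure example shown above.
FAILED_HTTP = re.compile(
    r"Performed (\d+) HTTP requests \(failed\) in the last (\d+) minutes"
)


def count_failed_http_requests(log_lines):
    """Sum the failed HTTP request counts over all matching log lines."""
    total = 0
    for line in log_lines:
        match = FAILED_HTTP.search(line)
        if match:
            total += int(match.group(1))
    return total


lines = [
    "INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 1 HTTP requests "
    "in the last 359 minutes, average 0.124s, max 0.124s, min 0.124s",
    "INFO 2023-07-27 16:34:48,663 CES common.statistics - Performed 1 HTTP requests "
    "(failed) in the last 359 minutes, average 30.03s, max 30.03s, min 30.03s",
]
print(count_failed_http_requests(lines))
```

Since failed DNS requests are not logged separately, a count of zero here says nothing about name lookup failures.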