Creating Job Server Checks

A Job Server Check lets a Job Server with an attached Platform Agent service monitor logs and Jobs on UNIX, Windows, or OpenVMS. You can add a Job Server Check on the Checks tab of the Job Server, and then view the status of the Check on the Monitor Nodes screen.

The monitoring system has three general severity grades (green, yellow and red). You can use Severity and Condition Expression to create a default condition in the Monitor Nodes screen. Normally, a condition named Default is created on the Monitor Check that results from the Job Server Check. Unless you set other values in the Job Server Check, this condition sets severity 50 (yellow) with the Condition Expression =Count < 1.

When you implement a Job Server Check, set severities and conditions so that operators can immediately assess the situation and react. You should create at least two Job Server Checks for everything you want to monitor: one to match green grades and one to match red grades. You can do this with the Severity and Condition Expression fields.

Note: Do not edit the Default condition, because your changes will be overwritten with the values from the Job Server Check.

If you need conditions more complex than the single condition that the Severity and Condition Expression fields allow, add your own conditions, with a name other than Default, to the Monitor Check. As soon as you create such a condition, the Default condition is no longer updated or recreated.

If a Check does not find the expected value, you can submit a reaction Job Definition and/or create an ad hoc Alert. (Reaction Job Definitions can attempt to fix the issue by, for example, restarting the affected database or application.)

Note: You can also use monitor Alert Sources mapped to the monitor created in the Monitor Nodes screen. This allows more control over when to create an alert.

The following table shows some example Job Server Checks.

OS Family  Style    Object Name          Attribute 2  Explanation
UNIX       Process  *ora_dbwr_*                       Process matching on UNIX is on the output of ps -ef, so wildcards are needed.
VMS        Process  NETACP                            Process matching on VMS is purely on the process name, so no wildcards are needed.
UNIX       Logfile  /var/log/system.log  dhcp:        Log messages written by the DHCP service.
UNIX       Socket   21                                Check that the FTP service is running.
Windows    Service  W32Time                           Check that the Windows Time Service is running (by its service name).
Windows    Service  Windows Time                      Check that the Windows Time Service is running (by its display name).
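The difference between the UNIX and VMS rows comes down to what the pattern is compared against. The following Python sketch illustrates this with the standard fnmatch module. It is an illustration only: the sample ps -ef lines are invented, count_matching is a hypothetical helper, and the Platform Agent's own glob dialect may differ from Python's.

```python
from fnmatch import fnmatchcase

# Invented sample of `ps -ef` output lines, for illustration only.
PS_LINES = [
    "oracle    2101     1  0 09:12 ?        00:00:03 ora_pmon_orcl",
    "oracle    2115     1  0 09:12 ?        00:00:07 ora_dbwr_orcl",
    "root      3310     1  0 09:15 ?        00:00:00 /usr/sbin/sshd",
]


def count_matching(values, pattern):
    """Count the values that the glob pattern matches in full."""
    return sum(1 for value in values if fnmatchcase(value, pattern))


# Without wildcards the pattern has to equal the entire ps line, so nothing matches.
print(count_matching(PS_LINES, "ora_dbwr_"))    # 0
# With wildcards the pattern can match text inside the line.
print(count_matching(PS_LINES, "*ora_dbwr_*"))  # 1
# A bare process name matches when the compared value is only the name.
print(count_matching(["NETACP"], "NETACP"))     # 1
```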

The fields you can add per Job Server Check are as follows.

Field Description
Name

The Name is used as an identifier to distinguish checks of the same Job Server in the log files. Names also determine the path of checks in the Monitor Nodes screen. Depending on the Style, the path will be like the following.

  • System.ProcessServer.${PSName}.Check.${CheckName}.Count
  • System.ProcessServer.${PSName}.Check.${CheckName}.Message
Enabled Check this box to enable the Job Server Check.
Style

The type of Job Server Check.

  • Eventlog: Windows only.
  • Logfile: UNIX, OpenVMS and Windows.
  • Process: UNIX and OpenVMS.
  • Service: Windows only.
  • Socket: UNIX, OpenVMS and Windows.
Object Name

Object Name is always required. What it determines depends on the selected Style. For the Process (UNIX, OpenVMS) and the Service (Windows) styles, Object Name can use glob matching.

  • On OpenVMS, the matching value is the process name. On UNIX, the matching value is a line of the output of ps -ef or its equivalent. For Windows services, the matching value is Displayname (Servicename), which means that you can match on either name of the Service.
  • For Logfile, Object Name contains the filename of the log file that is to be checked.
  • For Eventlog (Windows), Object Name contains the name of the log. Typical values are System and Application, but other Windows logs are allowed.
  • For Socket, Object Name is the service port to be checked. You can specify a port number as a decimal number or a reference to be resolved by the Platform Agent on the target system.
Attribute 2 The second attribute of the Job Server Check (required for the Logfile and Eventlog styles).
Poll interval

The Poll Interval is the maximum time between two runs of the Job Server Check. It is not an exact interval: a Platform Agent may run multiple checks of the same style in a single pass, in which case the check may run more often than this.

Severity

The severity to be reported if the expression in the Condition Expression evaluates to true.

The monitoring system has three general severity grades (green, yellow and red). These relate to the severity value as follows.

  • -1: Disabled.
  • 0: Everything is as it should be.
  • 1 - 49: Green.
  • 50 - 74: Yellow (warning).
  • 75 - 100: Red (critical problem).

Condition Expression An expression that describes a state. For example: =Count > 0.
Delay Amount Number of Delay Units to wait before firing the ad hoc alert or submitting the reaction Job Definition.
Delay Units The unit of time in which Delay Amount is expressed.
Ad Hoc Alert Source Ad hoc Alert Source to fire.
Job Definition The Job Definition to submit.
Address The address to be used for the ad hoc Alert Source or Parameter.
Message The message to be used for the ad hoc Alert Source or Parameter.
Data The data to be used for the ad hoc Alert Source or Parameter.
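Two of the fields above are mechanical enough to sketch in code: the monitor paths derived from Name, and the mapping from a severity value to a grade. The Python below is a minimal illustration, not product code. The function names, the sample Job Server and Check names, and the labels "disabled" and "ok" are assumptions made for this example.

```python
def monitor_paths(ps_name, check_name):
    """Return both path patterns listed under Name.

    Which one applies depends on the Style of the Check.
    """
    base = f"System.ProcessServer.{ps_name}.Check.{check_name}"
    return [f"{base}.Count", f"{base}.Message"]


def severity_grade(severity):
    """Map a severity value to the grade listed under Severity."""
    if severity == -1:
        return "disabled"
    if severity == 0:
        return "ok"        # everything is as it should be
    if 1 <= severity <= 49:
        return "green"
    if 50 <= severity <= 74:
        return "yellow"    # warning
    if 75 <= severity <= 100:
        return "red"       # critical problem
    raise ValueError(f"severity out of range: {severity}")


# Hypothetical Job Server and Check names.
print(monitor_paths("PS_UNIX", "OracleRunning"))
print(severity_grade(50))  # yellow
print(severity_grade(75))  # red
```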

Example

Assume you want to make sure that an Oracle database is running.

Name Value
Description Check Oracle running.
Documentation Check that Oracle is running.
Style Process
Object Name *ora*_orcl
Attribute 2

Poll interval 3
Severity 0
Condition Expression =Count > 10

Add another check, so that the severity is raised to red (75) when fewer than two Oracle processes are running.

Name Value
Description Check Oracle Not running.
Documentation Check that Oracle is not running.
Style Process
Object Name *ora*_orcl
Attribute 2

Poll interval 3
Severity 75
Condition Expression =Count < 2
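To see how the two checks divide the range of process counts, the following Python sketch evaluates both Condition Expressions for a given Count. It is a simplified model: condition_holds only understands expressions of the form =Count, a comparison operator, and an integer, and both it and reported_severities are hypothetical helpers, not part of the product.

```python
import operator

# Comparison operators supported by this simplified evaluator.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# (description, severity, condition expression) taken from the two tables above.
CHECKS = [
    ("Check Oracle running.", 0, "=Count > 10"),
    ("Check Oracle Not running.", 75, "=Count < 2"),
]


def condition_holds(expression, count):
    """Evaluate an expression of the form '=Count <op> <integer>'."""
    name, op, number = expression.lstrip("=").split()
    if name != "Count" or op not in OPS:
        raise ValueError(f"unsupported expression: {expression}")
    return OPS[op](count, int(number))


def reported_severities(count):
    """Return the severity of every check whose condition is true."""
    return [sev for _, sev, expr in CHECKS if condition_holds(expr, count)]


print(reported_severities(12))  # [0]
print(reported_severities(1))   # [75]
print(reported_severities(5))   # []
```

In this model neither condition is true for counts from 2 to 10, so these two checks on their own report nothing for that range.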

Here is a check to see if an Oracle Listener is working.

Name Value
Description Check Oracle Listener is running.
Documentation Check that Oracle Listener is running.
Style Socket
Service 1521
Poll interval 5
Severity 75
Condition Expression
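For a quick manual test of the same port from outside the product, a TCP connect probe is one option. This is a different technique from the Job Server Check itself, whose internal mechanism is not described here. The function name, the host, and the timeout in the sketch below are assumptions.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# 1521 is the port used in the example above; the host is an assumption.
print(port_accepts_connections("127.0.0.1", 1521))
```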

Attribute 2 is only used for some styles.

  • It is not used for the Process and Service styles.
  • For the Logfile and Eventlog styles, Attribute 2 can contain a glob pattern. The Logfile records are the lines in the file. The Windows Eventlog records are the complete message expanded using the locale defined for the Platform Agent.
  • For the Socket style, Attribute 2 contains the network address that the socket should be bound to. The default is 0.0.0.0 (all IP addresses of the server).
