Creating Job Server Checks
A Job Server Check lets a Job Server with an attached Platform Agent service monitor logs and Jobs on UNIX, Windows, or OpenVMS. You can add a Job Server Check on the Checks tab of the Job Server, and then view the status of the Check on the Monitor Nodes screen.
The monitoring system has three general severity grades (green, yellow, and red). You can use the Severity and Condition Expression fields to create a default condition in the Monitor Nodes screen. Normally, a condition named Default is created on the Monitor Check that results from the Job Server Check. Unless you set other values in the Job Server Check, this condition sets severity 50 (yellow) and Condition Expression = Count < 1.
When you implement a Job Server Check, set levels and grades so that operators can immediately analyze the situation and react accordingly. You should create at least two Job Server Checks for everything you want to monitor: one to match green grades and one to match red grades. You can do this with the Severity and Condition Expression fields.
Note: Do not edit the Default condition, because your changes will be overwritten with the values from the Job Server Check.
If you want to use more complicated conditions than the single condition allowed by the Severity and Condition Expression fields, you can add your own Conditions on the Monitor Check with a name other than Default. As soon as you create such a condition, the Default condition is no longer updated or recreated.
If a Check does not find the expected value, you can submit a reaction Job Definition and/or create an ad hoc Alert. (Reaction Job Definitions can attempt to fix the issue by, for example, restarting the affected database or application.)
Note: You can also use monitor Alert Sources mapped to the monitor created in the Monitor Nodes screen. This allows more control over when to create an alert.
The following table shows some example Job Server Checks.
OS Family | Style | Object Name | Attribute 2 | Explanation |
---|---|---|---|---|
UNIX | Process | ora_dbwr_ | | Process matching on UNIX is on the output of ps -ef, so wildcards are needed. |
VMS | Process | NETACP | | Process matching on VMS is purely on the process name, so no wildcards are needed. |
UNIX | Logfile | /var/log/system.log | dhcp: | Log messages written by the DHCP service. |
UNIX | Socket | 21 | | Check that the FTP service is running. |
Windows | Service | W32Time | | Check that the Windows Time Service is running (by its service name). |
Windows | Service | Windows Time | | Check that the Windows Time Service is running (by its display name). |
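The difference between matching on a full ps -ef line and matching on a bare process name can be sketched in Python. This is only an illustration that uses the standard fnmatch module as a stand-in for glob matching; the Platform Agent's actual matching rules may differ, and the function name and sample lines below are invented for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical `ps -ef` output lines, invented for illustration.
SAMPLE_PS_LINES = [
    "oracle    2211     1  0 09:01 ?        00:00:03 ora_dbwr_orcl",
    "oracle    2215     1  0 09:01 ?        00:00:01 ora_lgwr_orcl",
    "root       812     1  0 08:55 ?        00:00:00 /usr/sbin/sshd",
]


def count_matches(pattern: str, records: list[str]) -> int:
    """Count the records that match a glob pattern as a whole string."""
    return sum(1 for record in records if fnmatchcase(record, pattern))


# The pattern must cover the entire line, so a bare name matches nothing.
without_wildcards = count_matches("ora_dbwr_", SAMPLE_PS_LINES)
# Wildcards on both sides let the pattern match anywhere in the line.
with_wildcards = count_matches("*ora_dbwr_*", SAMPLE_PS_LINES)
# When records are bare process names (as on VMS), an exact name suffices.
bare_name = count_matches("NETACP", ["NETACP", "SWAPPER"])
```

In this sketch, a pattern is compared against the whole record, which is why a name embedded in a longer ps -ef line needs surrounding wildcards while a bare process name does not.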
The fields you can add per Job Server Check are as follows.
Field | Description |
---|---|
Name | The Name is used as an identifier to distinguish checks of the same Job Server in the log files. Names also determine the path of checks in the Monitor Nodes screen. Depending on the Style, the path will be like the following. |
Enabled | Check this box to enable the Job Server Check. |
Style | The type of Job Server Check. |
Object Name | Object Name is always required. What it determines depends on the selected Style. For the Process (UNIX, OpenVMS) and the Service (Windows) styles, Object Name can use glob matching. |
Attribute 2 | The second attribute of the Job Server Check (required for the Logfile and EventLog styles). |
Poll interval | The poll interval is the upper bound for how often the Job Server Check is performed. This is not a pure interval, because a Platform Agent may run multiple checks of the same style in a single pass, in which case the check may be performed more often than this. |
Severity | The severity to be reported if the expression in the Condition Expression evaluates to true. The monitoring system has three general severity grades (green, yellow and red). These relate to the severity value as follows. |
Condition Expression | An expression that describes a state. For example: =Count > 0. |
Delay Amount | Number of Delay Units to wait before firing the ad hoc alert or submitting the reaction Job Definition. |
Delay Units | The delay units. |
Ad Hoc Alert Source | Ad hoc Alert Source to fire. |
Job Definition | The Job Definition to submit. |
Address | The address to be used for the ad hoc Alert Source or Parameter. |
Message | The message to be used for the ad hoc Alert Source or Parameter. |
Data | The data to be used for the ad hoc Alert Source or Parameter. |
Example
Assume you want to make sure that an Oracle database is running.
Name | Value |
---|---|
Description | Check Oracle running. |
Documentation | Check that Oracle is running. |
Style | Process |
Object Name | *ora*_orcl |
Attribute 2 | |
Poll interval | 3 |
Severity | 0 |
Condition Expression | =Count > 10 |
Add another check, so that the severity is set to high when fewer than two Oracle processes are running.
Name | Value |
---|---|
Description | Check Oracle Not running. |
Documentation | Check that Oracle is not running. |
Style | Process |
Object Name | *ora*_orcl |
Attribute 2 | |
Poll interval | 3 |
Severity | 75 |
Condition Expression | =Count < 2 |
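To see how the two checks above work together, the following Python sketch evaluates both condition expressions for a given process count. It is not the product's evaluation logic: the Check class, the reported_severity function, and the rule that the highest matching severity wins are all assumptions made for illustration. Only the severities (0 and 75) and the expressions (Count > 10 and Count < 2) come from the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Check:
    """Hypothetical model of a Job Server Check: a severity plus a condition on Count."""
    name: str
    severity: int
    condition: Callable[[int], bool]


# The two example checks, using the values from the tables above.
EXAMPLE_CHECKS = [
    Check("Check Oracle running", 0, lambda count: count > 10),
    Check("Check Oracle Not running", 75, lambda count: count < 2),
]


def reported_severity(count: int, checks=EXAMPLE_CHECKS) -> Optional[int]:
    """Return the highest severity among checks whose condition holds.

    Returns None when no condition matches. Picking the maximum is an
    assumption of this sketch, not documented product behavior.
    """
    matching = [check.severity for check in checks if check.condition(count)]
    return max(matching) if matching else None


healthy = reported_severity(12)    # more than 10 processes: severity 0
failing = reported_severity(1)     # fewer than 2 processes: severity 75
uncovered = reported_severity(5)   # neither condition holds: None
```

Under these assumptions, a count between 2 and 10 matches neither example check, so you may want a further check or condition if that range matters for your system.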
Here is a check to see if an Oracle Listener is working.
Name | Value |
---|---|
Description | Check Oracle Listener is running. |
Documentation | Check that Oracle Listener is running. |
Style | Socket |
Service | 1521 |
Poll interval | 5 |
Severity | 75 |
Condition Expression | |
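If you want to verify by hand that something is listening on the port before you configure a Socket check, a small connection probe can help. The Python sketch below is a different technique from whatever the Platform Agent does internally (it opens a TCP connection rather than inspecting bound sockets); the function name, the default host, and the timeout are assumptions chosen for the example.

```python
import socket


def port_accepts_connections(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the address, connects, and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 1521 is the port used in the Oracle Listener example above.
    print(port_accepts_connections(1521))
```

A successful connection only shows that some process accepts connections on that port; it does not confirm that the process is the Oracle Listener.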
Attribute 2 is only used for some styles.
- It is not used for the Process and Service styles.
- For the Logfile and EventLog styles, Attribute 2 can contain a glob pattern. A Logfile record is a line in the file. A Windows EventLog record is the complete message, expanded using the locale defined for the Platform Agent.
- For the Socket style, Attribute 2 contains the network address that the socket should be bound to. The default is 0.0.0.0 (all IP addresses of the server).