Tkwatcher is a tcl based program for monitoring a system.
ANNOUNCING release 1.4 of tkwatcher. (See the end for a Changelog.)
- Purpose
- Benefits over Watcher
- It can Monitor
- Types of tests it can perform
- Files List
- Installation
- CHANGELOG: Changes since 1.3:
- Obtaining tkwatcher and more info/examples
Purpose
Tkwatcher is a tcl program that allows monitoring and analysis of program output. It can use any tcl based shell including tclsh, expect, wish, or scotty. It runs with tcl-7.3 and tcl-8.0 and should work with future versions.
By default it reports problems by email, but it can also send problem reports to a file or to standard output. The reports can be human or program readable. In addition to these reporting modes, it can log errors using external programs to permit syslogging, paging or other real-time notification of errors.
Benefits over Watcher
It was inspired by the program watcher by Kenneth Ingham (not Inghman as mistyped in the docs these many years. Hope to fix this typo in version 1.5, sorry Mr. Ingham), but adds features that I found lacking in the original watcher. Among those features are the ability to:
- select portions of the controlfile
- print command headers in the error messages
- select individual lines from a command output stream using absolute positions, or a regular expression
- perform and test calculations based on the input data
- specify multiple tests on a value that are anded together to determine if a warning should be issued.
- set thresholds for reports when all other tests are positive. This allows the user to set thresholds low enough for problems to be caught early, but excessive noise is eliminated by waiting until enough low threshold tests are passed.
It can Monitor
This tool has been used to:
- monitor disk space
- look for stalled jobs in the print queue
- look for runaway processes chewing up cpu time
- monitor swap space changes
- look for problems with network interfaces:
- excessive collision rates
- bad rpc calls
- excessive bad xid's indicating that an nfs server is overloaded
- monitor for swapping and paging activity
- verify operation of the X windows font server
- monitor the ntp network for hosts that deviate excessively from established norms
- monitor for excessive ping round trip time
- verify that required daemons are running on the system
- look for excessive copies of certain programs running on the system
- monitor users who keep software licenses for more than a preset amount of time
- watch for changing ip to ethernet mappings in the arp cache indicating an arp attack.
It lets you parse any line of command output, extract values from the line, and perform tests on those values.
Types of tests it can perform
The list of tests includes:
- is the value in a given numeric range?
- is a required value present?
- is the value equal/not equal to X?
- is the value equal/not equal to X or Y or Z or A ...?
- has the value increased/decreased by a certain amount from the previous run?
- has the value increased/decreased by a certain percentage from the previous run?
- has the value changed or not changed since the last run?
You can perform calculations on multiple values to apply tests to a calculated value. You can also write arbitrary tcl procedures that can be used for special purpose tests.
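As a rough illustration of how the range, delta, percentage and change tests in the list above behave, here is a minimal sketch in plain Tcl. The proc names (in_range, delta_up, percent_up, changed) are hypothetical helpers invented for this example; they are not tkwatcher's internals and not its tcl proc interface. The sketch assumes an inclusive range and a strict "more than" comparison for increases, which matches how the sample entry further down is described.

```tcl
# Hypothetical helpers for illustration only; NOT part of tkwatcher.

# 1 when value lies inside the inclusive range low..high.
proc in_range {value low high} {
    expr {$value >= $low && $value <= $high}
}

# 1 when value rose by more than amount since the previous run.
proc delta_up {value previous amount} {
    expr {($value - $previous) > $amount}
}

# 1 when value rose by more than percent percent of the previous value.
# Assumption: a previous value of 0 never triggers this test.
proc percent_up {value previous percent} {
    if {$previous == 0} { return 0 }
    expr {100.0 * ($value - $previous) / $previous > $percent}
}

# 1 when the value differs from the previous run (compared as strings).
proc changed {value previous} {
    expr {[string compare $value $previous] != 0}
}

# Two tests anded together: warn only when the value is outside 0..80
# AND it went up by more than 10 since the last run.
set now 85
set last 70
set warn [expr {![in_range $now 0 80] && [delta_up $now $last 10]}]
puts "warn=$warn"   ;# prints warn=1
```

Anding the tests is what keeps the noise down: a disk sitting steadily at 85% does not trigger this particular check, only one that is both high and still climbing.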
To give a bit better idea of what it can do, a sample entry is shown below:
set watch(6,disk) {"df -k" "disk space"
    {
        {"=1" {header 0-end %H}}
        {"^/mnt" {ignore 0 %s} }
        {"^/var" {filesystem 0 %k} {varcapacity 4 %d}}
        {"^/" {filesystem 0 %k} {capacity 4 %d}}
    }
    {
        {capacity delta_up 10 range 0 80}
        {capacity delta_up 30 range 0 60 severity alert}
        {capacity range 0 99 severity emerg orgroup a}
        {capacity range 0 98 delta_up 0 severity alert orgroup a}
        {capacity range 0 90 delta_up 0 orgroup a}
        {varcapacity range 0 75 change}
        {varcapacity delta_up 10 range 0 50 severity warning}
    }
}

df -k produces output that looks like:
Filesystem  1k-blocks     Used  Available  Use%  Mounted on
/dev/hda5     1763113  1001124     670868   60%  /
/dev/hda6      763113   125344     637769   16%  /var
/dev/hda1     2044208  1244848     799360   61%  /disk/hda1
/dev/hdc       634944   634944          0  100%  /mnt/cdrom
/dev/sda4       98078    23740      74338   24%  /mnt/dos/zip

This checks for:
- capacity of the /var filesystem that has changed and is above 75% full, which generates a warning
- capacity of the /var filesystem that has increased by 10 and is above 50% full, which generates a warning
- capacity increasing by more than 10% when the disk is more than 80% full
- capacity increasing by more than 30% when the disk is more than 60% full
- capacity at more than 99% gets a severity emerg warning
- capacity at more than 98% that has increased gets a severity alert warning
- capacity at more than 90% that is increasing gets a warning
Only one of the previous three will generate a report even if the disk has just gone up to 100% full.
- Capacity measurements for any removable disks mounted under the /mnt filesystem are ignored.
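The field extraction in the sample entry, for example {varcapacity 4 %d}, can be pictured with the following sketch. It is not tkwatcher's parser: the proc name field_as_int is hypothetical, and the sketch assumes that fields are whitespace separated and numbered from zero, and that a reasonably recent tclsh (8.3 or later) is available for regexp -all -inline.

```tcl
# Illustrative only: mimics picking field 4 as an integer from one line
# of the df output shown above. NOT tkwatcher's own parsing code.
proc field_as_int {line index} {
    # Collect the runs of non-whitespace characters as a list of fields.
    set fields [regexp -all -inline {\S+} $line]
    # scan with %d reads the leading integer and stops at the "%" sign.
    if {[scan [lindex $fields $index] %d value] != 1} {
        error "field $index of \"$line\" does not start with an integer"
    }
    return $value
}

set line "/dev/hda6 763113 125344 637769 16% /var"
puts [field_as_int $line 4]   ;# prints 16
```

Once the number is extracted this way, tests such as range 0 75 can be applied to it.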
Files List
The distribution consists of:
- Readme - this file
- tkwatcher.1 - man page for tkwatcher
- Watcherfile - a sample tkwatcher controlfile
- tkwatcher - the tcl script
- ed.scr - an ed script to strip comments and extra debug stuff from tkwatcher
- tests - directory of various tests. Some examples may be useful.
Installation
Tkwatcher should run provided tclsh is in your path. Some external commands such as mail, logger ... are used to report problems. If the defaults in the tkwatcher script don't work for you, change the defaults in the command array located near the top of the tkwatcher file. E.g., the default (which works under linux) command for reporting via syslog is:
set command(syslog) "/usr/bin/logger -t tkwatcher -i -p daemon."
If it's in /usr/ucb/logger, simply change the line to read:
set command(syslog) "/usr/ucb/logger -t tkwatcher -i -p daemon."
Those interested in increasing its performance may want to hard code the path to the tcl interpreter in the file.
Other than that, chmod +x the file (if needed), put it on your path, create a Watcherfile, and you should be all set.
For security reasons, the location of the watcherfile and history file should not be writable by anybody except the user running tkwatcher.
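One way to check the write-permission advice above before running tkwatcher is sketched below. This is a hypothetical helper script, not something shipped in the distribution, and tkwatcher is not documented to perform this check itself. It only looks at the group and other write bits on a Unix-like system; it does not verify who owns the files.

```tcl
# Hypothetical pre-flight check; NOT part of the tkwatcher distribution.
# Returns 1 when group or others have write permission on path.
proc writable_by_others {path} {
    file stat $path st
    # 18 decimal is octal 022: the group-write and other-write bits.
    expr {($st(mode) & 18) != 0}
}

# Check each file named on the command line together with its directory,
# since the advice concerns the location of the files.
foreach path $argv {
    foreach target [list $path [file dirname $path]] {
        if {[writable_by_others $target]} {
            puts stderr "warning: $target is writable by group or others"
        }
    }
}
```

Saved under a name of your choosing, the script would be run with tclsh and given the Watcherfile and history file paths as arguments. It prints nothing when no group or other write bits are set.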
To reduce tkwatcher's size (and thus increase its startup speed) and to remove debugging calls (also increasing its speed), execute the following commands:
grep -v "^#" ed.scr | ed tkwatcher
This will create the file tkwatcher.strip that can be used with the tracing subsystems: F for function trace, t for watcherfile trace, or e for viewing handled exceptions. The -v flag will work since it uses the "t" watcherfile tracing subsystem.
Tkwatcher is copyright 1995, 2000 John P. Rouillard (rouilj@ieeeSTOPSPAM.org). Tkwatcher may not be sold. Tkwatcher may be redistributed freely. There are no other restrictions on how this program may be used. There is no warranty with this program. It works for me, your mileage may vary.
Written by: John Rouillard (rouilj@ieeeSTOPSPAM.org) with thanks to
Mark Lamourine
CHANGELOG: Changes since 1.3:
Many cosmetic fixes, including code reorg to put configurable bits closer to the top of the file.
Tcl procedure interface now better defined. Multiple field arguments are now properly passed to tcl proc. Added an example tcl proc to Watcherfile and updated the manual page.
The history file name is now derived from the control filename. Previously the file ~/.watcher.hist was always used. A control filename that is > 9 characters will result in a history file name > 14 characters and may be a problem on some systems. But I consider those systems archaic and I really don't want to support them in the mainline code.
Added the ability to specify a regular expression for parsing data from a selected field. This is a more generic version of the prefix elimination function that was present in earlier and current versions of tkwatcher.
Added ability to designate fields counting from right to left so that you can select the rightmost space separated field. From the Ideas file in the original watcher.
Added ability to specify column ranges in reverse order to grab columns by counting from the end. E.g., a range of 39-32 would grab the data in the line that was 39 to 32 columns from the end. Note it doesn't reverse the data in the line, it merely specifies columns counting from the right.
If the command to be parsed is not started properly (not found, can't exec, etc.), tkwatcher no longer bails out with a tcl error. It reports the error through the output reporting mechanism.
Fixed a couple of tracing subsystem typos.
Fixed a bug that resulted in tkwatcher not reporting a "required" error when there were no matching parse lines.
Fixed a bug where a prefix elimination parse string such as "iddpr2-%d" caused the history file to save bogus information. This happened when the prefix string failed to parse the information properly, which occurs a lot in column based parsing when a command rearranges the output columns as numbers increase. The historyfile data is now blanked out for the failed parse. Since tkwatcher reports the failed parse, the owner will be notified of the problem, so I don't consider loss of history info an issue.
Fixed a bug that prevented execution of other actions after a value, novalue, enum or noenum action.
Fixed a bug that made the action 'value "a value with spaces"' or novalue not work correctly. Enum and noenum were not affected by this bug.
Previous history value is displayed (if it exists) properly when a tcl proc is called.
A tcl proc can now stop evaluation of other actions by setting the variable stop_eval to 1.
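The right-to-left field and column selection mentioned in the changelog can be pictured with the sketch below. It does not show Watcherfile syntax, which is documented in the man page. Both procs are hypothetical, and the numbering is an assumption made for this example only: field 0 is taken to be the rightmost field and column 0 the last character of the line. As the changelog notes, the selected text keeps its original left-to-right order.

```tcl
# Illustrative helpers; NOT tkwatcher code. Numbering from the right
# starting at 0 is an assumption made for this example only.

# Field counted from the right: 0 is the rightmost field.
proc field_from_right {line n} {
    set fields [regexp -all -inline {\S+} $line]
    lindex $fields end-$n
}

# Columns counted from the right: far and near are offsets from the last
# character (far >= near). The text is returned in its original order.
proc columns_from_right {line far near} {
    string range $line end-$far end-$near
}

set line "/dev/hda6 763113 125344 637769 16% /var"
puts [field_from_right $line 0]       ;# prints /var
puts [columns_from_right $line 7 5]   ;# prints 16%
```

Counting from the right is useful when a command pads or adds leading columns as numbers grow, since the trailing fields stay in the same place relative to the end of the line.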
Obtaining tkwatcher and more info/examples
TkWatcher can be obtained from
ftp://ftp.cerias.purdue.edu/pub/tools/unix/sysutils/tkwatcher/tkwatcher1_4.tar.gz
More info is available in the man page, which also has annotated examples. The tarball has a full working Watcherfile example which is also available here. I have edited the Watcherfile to obscure/eliminate host names and IP addresses. Hopefully I did not break anything in the process.
-- John Rouillard