- My recent and current projects.
- Project proposals for the CS682 Graduate Software Laboratory Engineering class
- Talk on "Laying waste to waste (LEAN Six Sigma for IT)" at BBLISA June 2014.
- Issue tracker for system administrators based on Roundup
- The DACS/Config method for managing changes to computers.
- SEC: The simple event correlator - paper on using it for real-time log analysis
- Nagios system monitoring software.
- tkwatcher
- My Prior projects.
- A little bit about me.
Recent and Current Projects
These projects are the ones people seem to have the most interest in or that are being actively used.
Laying waste to waste
This is a talk I gave to Back Bay LISA about process improvement and reducing waste in IT.
Waste is part of every process. As with entropy, waste increases unless steps are taken to reduce it.
The talk covers the basic ideas and some background behind Lean and Six Sigma (LSS), discusses their origins, and then walks through some examples from IT while introducing the tools and techniques.
For further details on the talk, see the Back Bay LISA website announcement for June 11, 2014.
Also available is the slide deck, in odp (Libre/Open Office Impress) format and in pdf format. A recording of the presentation can be seen on YouTube.
A System Administration Template for the Roundup Issue Tracker
This is a tracker template for the Roundup Issue Tracker. It requires version 1.6.0 or newer of Roundup. You can:
- Try out the demo tracker
- Browse source code repository (download tar of newest development version) (hosted using Fossil SCM)
- Read current documentation
If you are not familiar with Roundup, it is a simple-to-use and install issue-tracking system with command-line, web, REST, XMLRPC, and e-mail interfaces. It is based on the winning design from Ka-Ping Yee in the Software Carpentry "Track" design competition.
Roundup's homepage is at https://roundup-tracker.org/. Roundup has been deployed for:
- bug tracking and TODO list management (the classic installation)
- customer help desk support (with a wizard for the phone answerers, linking to networking, system and development issue trackers)
- issue management for IETF working groups
- sales lead tracking
- conference paper submission and double-blind referee management
- weblogging (well, almost :)
Sysadmin tracker intro
The sysadmin tracker for Roundup is designed to handle issues generated in a system administration or help desk environment.
It has more features than the classic tracker. Some of the functions were inspired by equivalent functionality in Request Tracker (RT), while others are things I have wanted in various trackers (req, reqng, queuemh, RT1, Clearquest, Remedy, Jira, Cherwell) through the years.
Features
Compared to the classic tracker, this tracker has:
- Relationships between issues including:
  - dependson - shows dependencies and prevents issues from being resolved if they depend on any unresolved issues.
  - grouping (merging) - used to update multiple similar issues as a group.
  - seealso - used to link issues that are related in some way.
- Two notification (nosy) lists for issues, modeled on Request Tracker:
  - Watcher list (aka nosy) - This list receives replies to the issue and is intended for non-implementation level conversations with the requester. It is also used to notify the requester of milestone achievements.
  - Technical list (aka verynosy) - This list is meant for technical discussion of how to implement solutions, and coordination of implementation as well as detailed implementation notes.
- Extra attributes on an issue for:
  - scheduling:
    - startdate - date on which work should start
    - leadtime - amount of time the work should take
    - duedate - date on which work needs to be done
    - workingorder - sets a numeric order for the issue. Useful to determine which issue gets worked on first when you have multiple issues at the same priority. Must be an integer > 0. The system doesn't care if you use 1000 for the first item to be worked, or 1 for the first item. That is up to local policy.
  - summary info:
    - fyi - string used to keep critical info (e.g. server serial numbers, external call ticket numbers) quickly available without having to search through all the messages on an issue.
  - automatic actions (e.g. close after three days if the customer doesn't respond):
    - actiondate - date on which action will occur
    - actionstatus - the [optional] new status for the issue after the action triggers.
    - actioncomment - the [optional] comment message that will be added to the issue when the action triggers.
- the ability to assign issues to queues. Each queue can have its own list of email addresses for announcing a new unassigned item in the queue.
- time tracking capability
- a replyto link for each message that will reload the issue with the change note text area filled with a quoted attributed version of the message you are replying to.
- the ability to configure allowed status changes. E.g. you can prevent somebody from changing a ticket from new to closed, so that new can transition only to open, stalled or hold.
- preloaded searches for displaying (in a limited way) relations between issues, scheduling showing due dates, showing watcher and technical lists.
- An auto refresh mode that automatically refreshes the screen at a user-selectable interval. Great for keeping up to date on new and changed issues.
- An RSS newsfeed to keep you up to date on changes in the tracker.
History of the tracker
As mentioned above, this was originally developed many years ago but never deployed. The original full documentation (in html) is located here. The tarball of the original code, developed for version 0.7.0 of Roundup, is located here. Note that this is probably only of historic interest, as it will not work with a current version of Roundup. You would be better off downloading the current copy from the Fossil repository.
Configuration management with Config/DACS
From 2005 to 2008, "Config" was extensively modified and updated. It is now named DACS, the Distribution and Configuration System. In DACS, Subversion replaces CVS as the source code control system, and it has greatly enhanced features compared to the original Config. You should visit its home page at http://www.cs.umb.edu/~rouilj/DACS.
The Config/DACS system started from a paper I wrote for LISA in 1994 with the assistance of Rick Martin. It was titled
It integrated:
- A database mechanism to record information about computer configurations
- A version control system to allow:
- tracking/auditing and rollback of changes
- automatic validation of changes
- access control to parts of the configuration tree
- A file generation mechanism driven from the database to create configuration files from standard formats
- A file distribution mechanism
While the original software implementing Config can still be downloaded from http://www.cs.umb.edu/~rouilj/config/config_tools-1.0.tgz, you should be using the newer DACS system.
Real time log analysis using SEC - the simple event correlator
In November 2004 I presented a paper at the USENIX LISA conference titled:
I also created a coursebook for a class on SEC that I taught for the LISA 2009 conference. The coursebook is based on TiddlyWiki and provides:
- Textbook
- Quizzes
- Presentations
- Student notebook
You can get a stripped down version that I used for a BBLISA presentation from http://www.cs.umb.edu/~rouilj/classes/SEC_bblisa1/SEC_tw.html.
Nagios system monitoring software.
I have two published changes for the Nagios network/application monitoring system:
- Addition of advanced correlation capability to Nagios 2 using the Simple Event Correlator
- Patches to the Webinject http testing tool to make it interact better with nagios
Combining SEC and Nagios.
I wrote a patch for Nagios to allow it to use SEC, the simple event correlator, for finer-grained correlation. Nagios has dependencies, but they are limited to using exit codes for decisions and don't make decisions based on the type of error generated. Also, Nagios flapping service detection never quite worked for me. I gave a work-in-progress presentation at LISA 2006. A PDF of the slides and notes is available.
Some use cases of this integration are:
- Control over the definition of a flapping service
- Require 4 ok states in a row before rearming/clearing service.
- Different thresholds for a single service. E.G. one between 7AM and 6PM allows two processes to run while outside that range only one process can run.
- Recognize a known error condition and report a different message to make resolving the problem easier.
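As a rough illustration of the "4 ok states in a row" use case, here is a minimal sketch in plain bash. It is not SEC rule syntax and not the Nagios patch; the function name `rearm_after` and the one-state-per-line input format are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Illustration only: plain bash, not SEC rules or the Nagios patch.
# Reads one service state per line (OK, WARNING, CRITICAL, ...) and
# prints "CLEARED" only after NEEDED consecutive OK lines follow a problem.
rearm_after() {
    local needed=$1 state streak=0 problem=0
    while read -r state; do
        if [ "$state" = OK ]; then
            streak=$(( streak + 1 ))
            if (( problem && streak >= needed )); then
                echo "CLEARED"
                problem=0
            fi
        else
            # Report only the first failure of a problem period.
            [ "$problem" -eq 0 ] && echo "PROBLEM: $state"
            problem=1
            streak=0              # any non-OK state restarts the count
        fi
    done
}

# A service that recovers briefly, fails again, then stays OK.
printf '%s\n' CRITICAL OK OK CRITICAL OK OK OK OK | rearm_after 4
```

The short recovery in the middle of the sample input is not reported, so a flapping service produces one problem report and one clear instead of several.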
In addition I did a presentation for Back Bay LISA on January 10, 2006. This covered Nagios, SEC and their integration. The PDF of my slides and notes is available for reading. The other documentation and the unpacked distribution for browsing, including the manual/release notes, are also available.
Nagios patch for webinject.
I made two sets of patches to webinject (http://www.webinject.org/) to better support nagios. My patches and descriptive text are located here.
TkWatcher
If you are looking for TkWatcher, you have come to the right place. Just click here for its homepage. Tkwatcher is a tcl program that allows monitoring and analysis of program output. It can use any tcl based shell including tclsh, expect, wish, or scotty. It runs with tcl-7.3 and tcl-8.0 and should work with future versions.
By default it reports problems by email, but it can also send problem reports to a file or to standard output. The reports can be human or program readable. In addition to these reporting modes, it can log errors using external programs to permit syslogging, paging or other real-time notification of errors.
Benefits over Watcher
It was inspired by the program watcher by Kenneth Ingham (not Inghman as mistyped in the docs these many years. Hope to fix this typo in version 1.5, sorry Mr. Ingham), but adds features that I found lacking in the original watcher. Among those features are the ability to:
- select portions of the controlfile
- print command headers in the error messages
- select individual lines from a command output stream using absolute positions, or a regular expression
- perform and test calculations based on the input data
- specify multiple tests on a value that are ANDed together to determine if a warning should be issued.
- set thresholds for reports when all other tests are positive. This allows the user to set thresholds low enough for problems to be caught early, but excessive noise is eliminated by waiting until enough low threshold tests are passed.
It can Monitor
This tool has been used to:
- monitor disk space
- look for stalled jobs in the print queue
- look for runaway processes chewing up cpu time
- monitor swap space changes
- look for problems with network interfaces:
- excessive collision rates
- bad rpc calls
- excessive bad xid's indicating that an nfs server is overloaded
- monitor for swapping and paging activity
- verify operation of the X windows font server
- monitor the ntp network for hosts with excessive deviance from established norms
- monitor for excessive ping round trip time
- verify that required daemons are running on the system
- look for excessive copies of certain programs running on the system
- monitor users who keep software licenses for more than a preset amount of time
- watch for changing ip to ethernet mappings in the arp cache indicating an arp attack.
It permits you to parse any command output line extracting values from the line, and performing tests on these values.
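To make that idea concrete, the sketch below does the same kind of thing in plain shell and awk: it pulls one value out of each line of `df -P`-style output and tests it against a threshold. This is not TkWatcher's control-file syntax; the function name `check_usage` and the 90% limit are made up for the example.

```bash
#!/usr/bin/env bash
# Illustration only: plain shell/awk, not TkWatcher control-file syntax.
# check_usage LIMIT reads "df -P"-style lines on stdin and prints a
# warning for every filesystem whose use% is at or above LIMIT.
check_usage() {
    awk -v limit="$1" '
        NR == 1 { next }                 # skip the header line
        {
            pct = $5                     # fifth column looks like "93%"
            sub(/%/, "", pct)            # strip the percent sign
            if (pct + 0 >= limit + 0)
                printf "WARNING: %s is %d%% full\n", $6, pct
        }'
}

# Canned sample input so the example runs anywhere. In practice you
# would pipe in real output, e.g.:  df -P | check_usage 90
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 1000 930 70 93% /
/dev/sda2 1000 100 900 10% /home'

printf '%s\n' "$sample" | check_usage 90
```

With the sample input only the root filesystem crosses the limit, so a single warning line is printed.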
Prior Projects
These are projects that are complete and available for people to use.
Personal LOgging Device modifications.
This tool was written by Hal Pomeranz and presented at LISA 93. The abstract reads:
PLOD (the Personal LOgging Device) is a simple text interface which allows System Administrators (and others) to keep a record of the work they do from day to day. The program was developed in Perl with device independence, flexibility, extensibility, and ease of use in mind. The user-interface is reminiscent of Berkeley mail, complete with many pre-defined tilde-escapes which perform various useful functions. Users may easily extend the program by defining their own personal escape sequences.
Plod is a tool for logging your daily tasks. I have modified the version available from Hal's web site. My modified version allows time tracking and assigning time and plod notes to particular tasks. I use it for recording the amount of time I spend on reading email, working on a particular trouble ticket, responding to critical incidents etc. The files are:
- a gzipped tar file with the changes applied.
- The master source of plod in shar format in case Hal's site disappears or a newer version of plod is released.
- a patch file to apply to the master source.
- The changelog file for my patch.
To display the number of minutes spent in each category for today, run:
plod -T -d `date +%m/%d/%Y`
The shell function:
function timecard () { plod -T -d $1 -D `date +%m/%d/%Y` $2; plod -d $1 -D `date +%m/%d/%Y` -g : $2 ; }
displays the timecard (i.e. the number of minutes per category) and all the log entries between the start date and the current day. It is useful for filling out timecards. If the start date and current date are in different months, it has to be run twice. E.g. December 1, 2005 falls on a Thursday, so I would run:
timecard 11/30/2005 200511
timecard 12/1/2005
on December 2nd to get the time for the week. Sorry about that, I didn't change the code to allow this to work in a single command.
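One way to avoid typing both commands by hand is a small wrapper that works out the two invocations. The sketch below only prints the commands so it can be tried without plod installed. The name `timecard_cmds` is hypothetical, and it assumes, going only by the example above, that a previous month's log is named YYYYMM and that the range crosses at most one month boundary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the timecard command(s) needed to cover
# START (MM/DD/YYYY) through TODAY (MM/DD/YYYY, defaults to today).
# Assumption taken from the example: an older month's log is named YYYYMM.
timecard_cmds() {
    local start=$1 today=${2:-$(date +%m/%d/%Y)}
    local smon syear tmon tyear
    IFS=/ read -r smon _ syear <<< "$start"
    IFS=/ read -r tmon _ tyear <<< "$today"
    # 10# forces base 10 so months like "08" are not parsed as octal.
    if (( 10#$smon == 10#$tmon && syear == tyear )); then
        echo "timecard $start"
    else
        # First the tail of the start month, read from that month's log...
        printf 'timecard %s %s%02d\n' "$start" "$syear" "$((10#$smon))"
        # ...then the current month from its first day.
        printf 'timecard %d/1/%s\n' "$((10#$tmon))" "$tyear"
    fi
}

timecard_cmds 11/30/2005 12/02/2005
```

For the dates in the example this prints the same two commands shown above. Piping the output to a shell that has the timecard function defined would run them.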
Majordomo
I was a significant contributor to the majordomo mailing list manager in the early 1990s. I was responsible for the current email based configuration as well as the 1.90 through 1.93 (or was it 1.94) releases.
Software management with Depot Lite
At the USENIX LISA conference in 1994 I was fortunate enough to have a second paper accepted. It was titled:
Depot-Lite is a software management, packaging and deployment method. It extends the Depot concept of software management with some additions that are useful in an academic environment where students are allowed to install and publish software for others to use.
Cygwin Involvement
I have been using cygwin for many years. It is the one thing that makes windows bearable. I am hosting the latest copy I have of Michael A Chase's clean_setup.zip version 1.0700 from July 02, 2003 because it is a useful tool and there doesn't seem to be another downloadable version on the internet. It used to be at:
http://home.ix.netcom.com/~mchase/zip/clean_setup.zip
Thanks to Angelo Graziosi for providing me with the copy. If you download it from here, please consider posting it on your own web page so that this tool will not once again face extinction. Thanks.
Using ssh and screen together
I use ssh with screen all the time because I work remotely and have my ssh session drop on a regular basis. I also use ssh-agent on my laptop and forward the agent via my ssh session. Now within my screen session, I often ssh to other hosts where I want my ssh-agent to be accessible.
When I initially log in it works fine. The SSH_AUTH_SOCK variable is in screen's environment and is inherited by the sessions under screen. When I ssh to other systems, again the SSH_AUTH_SOCK is forwarded along. However after the ssh disconnects and is reconnected (using autossh or manually), the SSH_AUTH_SOCK variable in my screen sessions is pointing to a dead socket. Fixing this for screen sessions on the ssh target host is easy: save the SSH_AUTH_SOCK variable before invoking screen -Dr, and source the new SSH_AUTH_SOCK into the shell running under screen.
However for remote ssh sessions, it is more difficult as we have to redirect their remote SSH_AUTH_SOCK to the newly created socket. To do this, I created ssh_auth_shuffle which is a bash script that combs the environments of open ssh sessions and symbolically links their SSH_AUTH_SOCK's to the newly created socket allowing access to the ssh-agent.
So to recap, the sequence:
- desktop -> access_host
- access_host -> host1
- host1 -> host2
This works fine if you don't use something like screen(1) on access_host. If the network link between your laptop/desktop and access_host breaks, it also tears down all of the other ssh processes and they have to be re-established. When you reestablish the links, the ssh-auth tunnel is automatically connected through all the segments and things work fine.
However if your access_host session runs screen(1) and your desktop disconnects, the rest of the ssh connections survive, but the access_host end of the access_host->host1 ssh link doesn't have a link to the desktop anymore. When you ssh back in, ssh establishes a new endpoint for the ssh-agent tunnel.
The shells that run within screen, however, still have the old endpoint, which they propagate to new ssh connections. So we need to propagate the new ssh-agent endpoint to the local shell. http://www.deadman.org/sshscreen.html addresses this issue but is not able to fix the loss of agent access for established ssh sessions to other hosts.
A solution is, when you ssh from your desktop to access_host, to use:
ssh access_host 'ssh_auth_shuffle && screen -d -r'
The ssh_auth_shuffle script locates all your established ssh commands and figures out where their end of the ssh-auth tunnel is located. It then links that end to the new ssh-auth tunnel created by the ssh from your desktop to access_host. You can install ssh_auth_shuffle anywhere in your path.
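The core of that approach can be sketched in a few lines of bash. This is not the real ssh_auth_shuffle script, only an outline of the technique as described. It assumes a Linux-style /proc/PID/environ file (NUL-separated VAR=value pairs), and the function names are made up for the example.

```bash
#!/usr/bin/env bash
# Rough sketch of the idea only; NOT the real ssh_auth_shuffle script.
# Assumes Linux-style /proc/PID/environ files (NUL-separated VAR=value).

# Print the SSH_AUTH_SOCK value stored in an environ-style file.
sock_from_environ() {
    tr '\0' '\n' < "$1" | sed -n 's/^SSH_AUTH_SOCK=//p'
}

# Point a stale socket path at the live one with a symbolic link.
relink_sock() {
    local old=$1 new=$2
    [ "$old" = "$new" ] && return 0        # already the live socket
    mkdir -p -m 700 "$(dirname "$old")"    # the old directory may be gone
    ln -sfn "$new" "$old"
}

# Hypothetical driver: walk the ssh clients owned by the current user
# and relink each one's recorded socket path to NEW.
shuffle_all() {
    local new=$1 pid old
    for pid in $(pgrep -u "$(id -u)" -x ssh); do
        old=$(sock_from_environ "/proc/$pid/environ") || continue
        [ -n "$old" ] && relink_sock "$old" "$new"
    done
    return 0
}
```

On login this would be called as `shuffle_all "$SSH_AUTH_SOCK"`, so that the socket paths remembered by the long-running ssh clients resolve to the freshly forwarded agent socket.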
Some useful aliases are:
- auth '. ~/.auth_ssh' - manually run auth to update the ssh endpoint information in your shell.
- ssh '. ~/.auth_ssh; ssh' - makes sure ssh works by pulling the newest ssh endpoint info before executing ssh.
- scp '. ~/.auth_ssh; scp' - same as above except for scp.
The file ~/.auth_ssh is a link to the file that stores the current ssh connection parameters. A sample file is:
export SSH_AUTH_SOCK=/tmp/ssh-xNQZo23620/agent.23620
export SSH_CLIENT='::ffff:65.33.255.162 1100 22'
export SSH_CONNECTION='::ffff:63.33.222.162 1100 ::ffff:192.168.7.14 22'
export SSH_TTY=/dev/pts/7
export DISPLAY=localhost:11.0
The values of these environment variables (except SSH_CLIENT) are described in the ssh man page.
In addition the current DISPLAY that is in use by ssh is exported as well so you can use X11 forwarding after reconnecting to access_host although this does not work for host1 or host2.
Note that this script is very Bourne shell centric, since the ~/.auth_ssh file uses "export var=value" syntax to set variables. However, modifying it to support csh style shells should be pretty straightforward.
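A file in that format could be produced with something like the sketch below. The function name `save_auth_env` is an assumption, as is writing the file directly (the text above describes ~/.auth_ssh as a link to the real file). Values containing single quotes are not handled.

```bash
#!/usr/bin/env bash
# Sketch only: the output format follows the sample file above, but the
# function name and the way the file gets written are assumptions.

# Write the current connection variables to FILE as "export VAR='value'"
# lines, skipping any variable that is unset or empty.
save_auth_env() {
    local file=$1 var
    : > "$file"                          # start with an empty file
    for var in SSH_AUTH_SOCK SSH_CLIENT SSH_CONNECTION SSH_TTY DISPLAY; do
        # ${!var} is bash indirect expansion: the value of the named variable.
        if [ -n "${!var:-}" ]; then
            printf "export %s='%s'\n" "$var" "${!var}" >> "$file"
        fi
    done
    return 0
}
```

Run in the fresh login shell before reattaching (for example `save_auth_env ~/.auth_ssh && screen -d -r`), it leaves a file that any shell under screen can pick up with `. ~/.auth_ssh`.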
Korn shell semaphore implementation
This was originally from "Implementing Semaphores in the Shell" (10/6/2004, by Ed Schaefer and John Spurgeon for UNIX Review) and implements a fair queuing semaphore in ksh.
Summary: The authors present a Korn shell implementation of a counting semaphore.
I had a couple of problems with it when I used it for controlling resource usage by BackupPC, so I patched it. The original source plus the patch are available here. It is released under GPL V2, as are my patches to it.
I am putting the shell semaphore code here because the original link to the article (and in theory the source code) at: http://unix.ittoolbox.com/documents/implementing-semaphores-in-the-shell-15726 is dead.
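For readers who have not met the idea, a counting semaphore lets at most N jobs hold a resource at once. The sketch below is not the Schaefer and Spurgeon code or the patch to it, and it has no fair queuing. It is a much simpler bash variant that relies on mkdir being atomic; the function names and the slot directory layout are assumptions.

```bash
#!/usr/bin/env bash
# Minimal counting semaphore sketch. NOT the ksh code from the article
# and without its fair queuing. Relies on mkdir being atomic: only one
# process can successfully create a given directory.

# sem_acquire DIR COUNT: try to take one of COUNT slots under DIR.
# Prints the slot path and returns 0 on success; returns 1 if all
# slots are currently taken (the caller decides whether to retry).
sem_acquire() {
    local dir=$1 count=$2 i
    mkdir -p "$dir"
    for (( i = 1; i <= count; i++ )); do
        if mkdir "$dir/slot.$i" 2>/dev/null; then
            echo "$dir/slot.$i"
            return 0
        fi
    done
    return 1
}

# sem_release SLOT: give the slot back.
sem_release() {
    rmdir "$1"
}
```

A job would wrap its work as `slot=$(sem_acquire /tmp/backup.sem 2) || exit 1`, do the work, then call `sem_release "$slot"`. A slot held by a job that dies stays taken, so real use needs a trap or a stale-slot cleanup.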
A little about me
I have posted my resume (in adobe acrobat (.pdf) format) and will post it in other formats along with my picture here when I get a chance. If I am not working on an ambulance (I've been an emergency medical technician since the mid-80s) I can be found playing Ultimate Frisbee, or giving somebody a massage.
I was formerly employed as a system administrator with MathWorks.
Also you may be interested in my profile on LinkedIn.