Tuesday 4 November 2008

Hello, Nagios

Nagios is a generic monitoring tool, and I want to use it to monitor my own application. It has a lot of built-in and official plugins. Even so, it is always good to start with a Hello, World-type example.

Firstly, my very interesting application. It follows the Nagios Plugin guidelines: return a value in {0, 1, 2, 3} and write a status message of less than 4k to standard out. It picks its status at random, so you get some sysadmin action:

(np.c, compiles into an executable np)

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char* argv[]) {
    /* Seed the generator so each run gives a different result. */
    srand((unsigned) time(NULL));

    /* r is in 0..9, so every value is handled by the switch below. */
    int r = rand() % 10;

    int status = -1;
    const char* message = NULL;

    switch (r) {
    case 0:
    case 1:
    case 2:
    case 3:
        status = 0;
        message = "Everything is alright :-)";
        break;
    case 4:
    case 5:
    case 6:
        status = 1;
        message = "You have been warned.";
        break;
    case 7:
    case 8:
        status = 2;
        message = "She is gonna blow!";
        break;
    case 9:
        status = 3;
        message = "Oh shit...";
        break;
    }

    /* The status message goes to standard out. */
    printf("%s\n", message);

    /* The exit code is what Nagios reads as the status. */
    return status;
}
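
The four return values have conventional meanings in Nagios: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. Here is a minimal sketch of how np could name them instead of using bare numbers. The enum and the two helpers (nagios_state_name, format_status_line) are names I made up for illustration, not part of any Nagios library, and the "NP" prefix is just an example label.

```c
#include <stdio.h>

/* Exit codes a Nagios plugin is expected to return. */
enum nagios_state {
    STATE_OK = 0,
    STATE_WARNING = 1,
    STATE_CRITICAL = 2,
    STATE_UNKNOWN = 3
};

/* Hypothetical helper: map an exit code to its conventional label.
   Anything outside 0..2 is reported as UNKNOWN. */
const char *nagios_state_name(int state) {
    switch (state) {
    case STATE_OK:       return "OK";
    case STATE_WARNING:  return "WARNING";
    case STATE_CRITICAL: return "CRITICAL";
    default:             return "UNKNOWN";
    }
}

/* Hypothetical helper: write a one-line status message such as
   "NP WARNING - You have been warned." into buf.
   Returns 0 on success, -1 if buf is too small. */
int format_status_line(char *buf, size_t size, int state, const char *message) {
    int n = snprintf(buf, size, "NP %s - %s", nagios_state_name(state), message);
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

With something like this in place, np.c could set status = STATE_WARNING rather than status = 1, and print the formatted line instead of the bare message.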


Nagios is easy to install with this quickstart guide. You just need to configure it:

1) In $NAGIOS_HOME/etc/nagios.cfg, add a line cfg_file=$NAGIOS_HOME/etc/objects/np.cfg

2) In $NAGIOS_HOME/etc/nagios.cfg, set interval_length to a suitably low number, e.g. 5 seconds, so you can see changes quickly

3) In $NAGIOS_HOME/etc/objects/np.cfg, make it look something like this:

define hostgroup {
    hostgroup_name          app_hosts
    alias                   Application Hosts
}

define host {
    use                     generic-host
    host_name               app_host
    alias                   Application Host
    address                 localhost
    hostgroups              app_hosts
    max_check_attempts      10
}

define service {
    use                     local-service
    host_name               app_host
    service_description     My Application
    check_command           monitor_np
    normal_check_interval   1
    retry_check_interval    1
}

define command {
    command_name            monitor_np
    command_line            $APPLICATION_HOME/bin/np
}
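
The check intervals in the service definition are counted in units of interval_length, so the real time between checks is the product of the two. A small sketch of that arithmetic, with a function name I made up for illustration:

```c
/* Seconds between checks. Nagios counts check intervals in "time units",
   and interval_length says how many seconds one unit lasts.
   seconds_between_checks is a hypothetical name, not a Nagios function. */
double seconds_between_checks(int interval_length, double check_interval) {
    return interval_length * check_interval;
}
```

With interval_length set to 5 and normal_check_interval set to 1, as above, the service is checked roughly every 5 seconds. With the default interval_length of 60, the same service definition would mean one check a minute.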

The configuration is quite fiddly: lots of required attributes and no sensible defaults. Luckily you can reuse the example configuration that comes with Nagios. There is also a handy pre-flight check: run /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg and it will report errors and warnings in your configuration files, with line numbers.

Anyway, by now Bob is your uncle and you can restart Nagios to pick up the changes. Then point your browser to http://localhost/nagios and watch the action!

Monday 3 November 2008

Any Fool Can Sysadmin

I have Windows XP on my laptop, but I needed a Linux environment for client work. The last time I played around with dual booting was in 2000, when I had just bought my self-built home computer. I installed Windows 2000 and then RedHat 6, but the whole thing failed (I suspect there were no drivers for my then very new motherboard, or something) and I had to wipe everything (using my home sysadmin skills: if anything goes wrong, wipe and reinstall!).

So, it is fair to say I am a non-sysadmin, and maybe that's why I am sometimes accused of being a girly drinker. But with VMware I am suddenly successful and confident. In fact, I am so confident I might just run as root. It is a VM after all, what could go wrong?

I found this guide extremely helpful. VMware Player is free and easy, the guide helps to hook things up, and Ubuntu is quite user-friendly. Why was I so scared of sysadmining for so long?

Well, these are early days, and I might just revert to my old self if I get a problem with Linux: wipe and reinstall. But since it is a VM, that is actually a painless solution...

Actually, I forgot: I first tried installing Damn Small Linux using a VMware Appliance, since it was only a 50MB download. However, I didn't manage to get the network working, and my colleagues and I didn't have a clue how to fix it. So I ended up with Ubuntu. Moral of the story, though: sysadmining is no fun if you can't easily wipe and reinstall.