Mark's Blog

UNIX, Computers, Bass and other things of interest...

Behringer DEQ2496 Review for Bassists


I’ve just posted a little review/guide to the Behringer DEQ2496 unit, and how it works in a bass player’s rack. I made this video because I’ve seen lots of clips and reviews of the DEQ2496, but none focusing on using it in a rack setting for a gigging bassist. It offers a lot of useful features such as a graphic/parametric EQ, compressor, limiter, feedback suppressor and more. In this video I introduce the unit, cover basic setup and look at the EQ modules.

Any questions, please leave a comment and I’ll do my best to answer! Part 2 coming soon…

Reboot


Well, it was about time. This blog has been stagnating for a long time, partly due to the clunky PHP-based system that used to run it. Needless to say, although I used to do a lot with PHP, I’m now old enough to know better. I’ve therefore done a “rip & replace” update on this site, switching instead to the wonderful, Ruby-based Octopress – a static blogging system built on top of Jekyll.

Besides the warm and fuzzy feeling of booting PHP out and using a lovely Ruby codebase, this also brings other benefits such as a massively reduced amount of effort required to manage this site and create content. Plus, it’s one less bit of software I have to worry about patching and securing.

I’ve ported most of the old site’s content over, although comments haven’t made the transition. I’ve added Disqus comments as a replacement, and will also look at extracting the old comments from the database so no information will be lost. You’ll also have to excuse any graphical glitches; the article’s HTML should be displayed correctly, but I’ve already had to make a couple of manual tweaks here and there, and will fix up things as I go along. Anyway, onwards and upwards!

Avid Eleven Rack Hiss Problem Solved


This is a bit of a departure from the rest of my blog posts, as it relates to my main hobby and current interest – Bass guitar and amplification. I play in a band and have spent a lot of time building out my main rig for live shows and rehearsals, but I recently ran into a problem with the latest addition. I found a solution to it (and saw lots of people suffering from the same issue), so I’m posting it here in the hope that it might help someone else.

I had just bought an Avid Eleven Rack modeller to replace a series of utterly unreliable Behringer V-Amps (that’s a story for another day!). It sounded great when hooked up to my computer as a recording device, but I had a major issue when using it through my power amplifier for live use. I was running the XLR balanced outputs to a Behringer DEQ2496 unit for final EQ tweaks[1],  the XLR outputs from that to a Crown "DriveCore" XLS 2000 power amp, and then finally into my two cabs.

However, despite using good quality cables and balanced links all the way through the signal chain, there was always a really annoying hiss present from the XLR outputs, even when I eliminated the Behringer unit and ran straight into the power amp. No matter what I tried on the 11R, this hiss was always there and could be heard when I was playing, and also when there was no input at all to the 11R, regardless of what patch I had selected – it sounded just like the general noise of the unit.

It wasn’t as noticeable on the "Output to Amp" sockets, but then those provide instrument-level signals, and the power amp expects line-level. This meant that if I carried on using those outputs, I had a marginal improvement in sound but at massively reduced volume – so I couldn’t use it live. If I was using a regular combo amp or head then it would have probably worked OK, but I needed to go through my power amp. Also, if I ever ran the outputs direct into a mixing desk, the problem would still persist.

As I mentioned before, I Googled for "Eleven Rack Hiss", and this showed many people having the same problem. I was about to give up when I realised that the hiss didn’t seem to be present when recording (keeping the signal digital). So it seemed to be an issue solely with the analog XLR outputs, which tied in with the results of my Google search.

It then dawned on me that both my 11R and DEQ2496 have digital AES/EBU connectors, so I did the following :

  • Set the 11R clock rate to 96kHz (it’s in the User Options menu); leaving it at the default seemed to introduce odd artifacts and degraded sound. Here’s where it is in the 11R’s menu when you hold down the "Edit" button for a few seconds (click any of the images in this post for bigger versions) :


  • Connected the 11R to the DEQ2496 through the AES/EBU ports
  • Set the "Digital Output" option on the R11 (see above image) to "Mirror Analog", so the "Output To Amp" option affects the digital signal as well
  • On the DEQ2496, switched the input on my presets to digital in, and checked they had locked onto the 96kHz signal:


  • Connected the XLR outs from the DEQ2496 to my power amp

Problem solved! I now have to control the master volume via the gain controls on the Crown power amp, but that minor change aside it’s working perfectly. No hiss whatsoever, everything is crystal clear (apart from the usual noise introduced by high-gain patches and so on), even at full bone-shaking volume. The difference is literally night and day!

So, it looks like a general solution is to take a digital output from the 11R, and feed it into a unit (such as a DEQ2496, or a cross-over such as the DCX2496) which has the appropriate outputs for your chosen power amplifier or PA system. I appreciate this probably involves shelling out for an additional bit of equipment – but if you’re planning on using an 11R through a power amplifier or PA for live use, then you may well find this makes all the difference to your sound.

Anyway, hope this helps someone!

For reference, here’s a slightly blurred picture of the rear connections so you can see how it all fits together :



In that picture, red bundled cables are power from the Samson power conditioner, green are MIDI cables, yellow is the XLR output to the power amp and purple is the digital AES signal. The stray ¼" TRS jack hanging out the yellow bundle is there if I ever want to plug the front "Output to Amp" socket into my practice Laney combo.

[1]=Really useful if you want to compensate for the sound of a venue, but don’t want to go through all your patches and tweak things. For instance, my band once booked a really bad rehearsal space where my bass would make the room vibrate at a certain pitch. I simply dialed out the bad frequencies on the DEQ2496 and left all my presets as they were; crisis averted! Plus, it’s MIDI controlled so I can switch presets on that and the 11R with my FCB1010 pedal.

Adventures in IPv6 Land


I’ve spent the last week experimenting with IPv6; it now means that my whole home network and this website run over IPv6 as well as IPv4. As I’ve spent a while playing with this technology, I thought I’d write my notes up here in the hopes that it will help someone else.

I found that the hardest part of getting my head round IPv6 was forgetting what I previously knew about IPv4 networking. The concepts of NAT, private address space, CIDR subnet masks and so on were getting in the way of me understanding what is ultimately a much simpler system. Let’s face it, the current IPv4 status quo is pretty broken, and we’ve got the Internet this far based on a series of hacks built upon hacks. Sure, it sort of works but it’s pretty ugly – and I think it’s only because we’re so used to IPv4 concepts that I never took a step back and thought about how broken it truly is.

Needless to say, although the theory should be equally applicable to Windows systems, all of this is written with a heavy Unix bias as that’s what I use most of the time. Also, if you notice any glaring mistakes or omissions I would be grateful if you’d leave a comment below, and I’ll go back and edit this article ASAP. Click the "Continue reading" link for the full article…

Address format

Here’s the first hurdle : IPv6 addresses look strange at first glance, but they’re really pretty simple. An IPv6 address is 128 bits long, written as eight groups of 16 bits, with each group expressed as four hexadecimal digits and separated by colons. For the sake of brevity, you can omit leading zeros from a group, and a single run of consecutive all-zero groups can be replaced by two colons – but you can only do this once in an address. So, an address of fdce:3916:08df:0000:0000:0000:0000:0001 can be represented by fdce:3916:08df::1. An extreme example of this is the loopback address (the equivalent of 127.0.0.1) – this is shortened to ::1.

IPv6 address subnet masks are also much simpler to understand – part of the address is simply designated as the network prefix. To see how this works, take the example of a unicast address identifying one single host on the internet. These usually consist of a 48-bit network prefix for routing information, a 16-bit subnet identifier and a 64-bit interface identifier to identify the system. So, the address is neatly split into three parts :

rrrr:rrrr:rrrr:ssss:iiii:iiii:iiii:iiii (r = routing identifier, s = subnet identifier, i = interface identifier). 

The number of bits used in a netmask is shown by standard slash notation, e.g.  fe80::211:32ff:fe0f:4a6a/64 shows a 64-bit prefix (fe80:0000:0000:0000 – note the double colon in the original collapsing the repeating zeros) offering a single subnet range (interface identifiers from 0:0:0:0 to ffff:ffff:ffff:ffff). In this case, the interface identifier was 211:32ff:fe0f:4a6a.

The interface identifier is almost always 64 bits long (well, technically you can use some of those bits for subnetting, but various things like auto-configuration will break), and will usually stay the same for a host, so networks can easily be reconfigured by simply changing the subnet information. As a further example to see where this may be used: If you are an ISP and have been allocated several /48 ranges, you can then utilise the subnet identifier (SLA ID) field to carve them up into subnets and give each customer their own /64 range (which they can’t usually split into further subnets). Of course, ISPs will usually be provided with a /32 network, so they could split that up into 65,536 /48 networks and split those into /64s and so on. Read on to see just how massive that address space actually is…

There are also a number of prefixes that have special meanings in an IPv6 network, some of these are :

  • FC00::/7 is the prefix for Unique Local Addresses, which replace the old (now deprecated) site-local addresses. These are the rough equivalent of a private IPv4 range, but as I’ll explain later you probably don’t need to worry about them.
  • FE80::/10 is the prefix for link-local addresses. More about those later!
  • FF02::1 is an address that multicasts to all nodes on the LAN.
  • FF02::2 is an address that multicasts to all routers on the LAN.

Link local addresses

One of the really nice things about IPv6 is that it "just works", usually right out of the box. If you have an IPv6 enabled system, you’ll probably have a link-local address automatically assigned to your network interface. On a Unix system, just look at a network card with "ifconfig", and you should see an inet6 address assigned alongside your usual IPv4 address, e.g.

eth0      Link encap:Ethernet  HWaddr 00:1c:14:01:3b:fc  
          inet addr:192.168.0.121  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::21c:14ff:fe01:3bfc/64 Scope:Link

You can identify this address because it starts with "fe80" (mentioned above). These addresses are needed by some internal IPv6 functions (such as neighbour discovery and so on) and are automatically assigned to any IPv6 interface, using an algorithm based on the card’s MAC address. They are not routable and only work within a network segment, but it does mean that you can in theory plug a bunch of systems into a switch, turn them on, and start transferring data over IPv6 without any configuration. If you have a couple of systems you can experiment with, try pinging each of them using "ping6" and using their link-local address. Note: I had to specify the interface (e.g. ping6 -I eth0 fe80::21c:14ff:fe01:3bfc) on some systems, as I got a "connect: invalid argument" error when the kernel didn’t know which interface to use. 
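
For example, here’s a quick sketch using the example address from the ifconfig output above (your interface names and addresses will of course differ):

# On host A, find the link-local address of eth0
ip -6 addr show dev eth0 scope link

# On host B, ping host A's link-local address, specifying the outgoing interface
ping6 -I eth0 fe80::21c:14ff:fe01:3bfc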

Private networks

Now, this had me stumped for a while as I was approaching this with my IPv4 head on. I started off looking for the equivalent of the 192.168/172.16/10 ranges in IPv4 as I assumed I would be using NAT on my router as usual. I discovered that there is a rough equivalent of these ranges, called Unique Local Addresses (ULA).  These all start with FD as the first digits and avoid the problem of namespace clashes by creating a pseudo-random range. This means no two sites should be using the same range, which happens all the time with IPv4; if your home network and work VPN both use 192.168.1.0/24, one of you will have to change! 

There’s a tool at http://www.sixxs.net/tools/grh/ula/ which lets you generate these ranges using a MAC address as a "seed" and optionally register them in a database. However, I discovered you probably don’t need to worry about these addresses. The reason is simple – the IPv6 address space is huge. To paraphrase Douglas Adams: "You just won’t believe how vastly hugely mindbogglingly big it is" ! If you are assigned a single /64 network, that gives you 18,446,744,073,709,551,616 addresses to play with. To give you an idea of just how much space there is in the IPv6 internet, check out this reference : http://www.potato-people.com/blog/2009/02/ipv6-subnet-size-reference-table/.

This means that workarounds such as NAT or private address spaces just simply don’t need to exist for the most part. You can easily allocate every one of your systems with a publicly reachable address; you just have to set up appropriate firewall/ACL rules at the router to stop unwanted traffic to/from them. Of course, for truly private networks or topologies which may require a private component (database tier of a web stack for instance) you can go ahead and use a ULA range. They may also come in handy if for some reason you can’t use the link-local addresses, e.g. you only have a public /64 range and want to use subnets, but for a home network you don’t need to worry about it. 
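
As a rough sketch of what those router rules might look like under Linux (purely illustrative - here I’m assuming eth0 is the upstream interface and eth1 faces the LAN):

# Let replies to outbound connections back in to the LAN
ip6tables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT
# ICMPv6 is essential to IPv6 (neighbour discovery, path MTU discovery etc.)
ip6tables -A FORWARD -p icmpv6 -j ACCEPT
# Drop any other unsolicited traffic heading for the LAN
ip6tables -A FORWARD -i eth0 -o eth1 -j DROP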

Address autoconfiguration

OK, so you have a network range allocated to you. How do you configure the rest of the hosts on your network ? You have two options : RADVD and possibly also DHCPv6. RADVD is a "router advertisement" daemon and lets the rest of the systems on your network know what routes they should use, and can also allocate addresses. It’s very simple to configure; for example on a Linux system running as an IPv6 gateway your radvd.conf would look something like :

interface eth0
{
    AdvSendAdvert on;
    AdvLinkMTU 1280;
    MaxRtrAdvInterval 300;
    prefix 1234:5678:9abc:de::/64
    {
            AdvOnLink on;
            AdvAutonomous on;
    };
};

Your systems will then generate an IPv6 address based on this prefix and a unique interface identifier (again based on MAC address) – so, something like 1234:5678:9abc:de:211:32ff:fe0f:4d6a. In a small home network this should suffice quite nicely: All your systems will obtain an IPv6 address from your range (along with their link local address) and know how to route out of your network. Assuming you have a functioning IPv4 stack and DNS etc. configured already (which will probably be the case for the foreseeable future), this could be all you need as long as your nameservers can return IPv6 AAAA records.
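
You can check that a client has picked up the advertised prefix and a default route with something like the following (interface names here are just examples):

# Show the autoconfigured global address alongside the link-local one
ip -6 addr show dev eth0
# The default route should point at the router's link-local address
ip -6 route show | grep default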

For any other kind of IPv6 configuration, you will need to look into DHCPv6, which is a totally different protocol and software stack from IPv4 DHCP but performs the same function: handing out addresses and configuring name servers, domain names, time servers and so on.
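
As a very rough sketch (purely illustrative - check the dhcpd documentation for your platform), a stateful setup using ISC dhcpd running in DHCPv6 mode with the -6 flag might look something like :

# /etc/dhcp/dhcpd6.conf - illustrative example only
subnet6 fdce:3916:8df::/64 {
        # Hand out addresses from this range
        range6 fdce:3916:8df::1000 fdce:3916:8df::2000;
        # Nameserver and search domain for clients
        option dhcp6.name-servers fdce:3916:8df::53;
        option dhcp6.domain-search "example.net";
}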

Command line tools

As IPv6 and supporting protocols such as ICMPv6 are so different from IPv4, a whole bunch of new tools are needed to manage it. In many cases, these tools function more or less the same as their IPv4 counterparts, and are identifiable with the "6" in their name. For example, ping6, traceroute6, ip6tables and so on. In addition, tools such as tcpdump can show IPv6 traffic simply by using the "ip6" filter. These should present few surprises, but where it gets interesting is when it comes to configuring addresses and routes. 

It turns out that the "ifconfig" tool is now deprecated, so although you can view interfaces and IPv6 addresses with it, you should use the "ip" tool under Linux for anything else. I’ll cover other Unix like systems (Solaris, BSD and so on) in another post. Here’s a quick cheatsheet with some examples :

  • Add an address to the eth0 interface : ip -6 addr add <address>/<prefix length> dev eth0
  • Remove an address from the eth1 interface : ip -6 addr del <address>/<prefix length> dev eth1
  • Show all IPv6 addresses : ip -6 addr
  • Show IPv6 neighbours (equivalent to the ARP table in IPv4) : ip -6 neigh
  • Define a default route for eth0 : ip -6 route add default via <gateway address> dev eth0
  • Look up the IPv6 DNS record for a website : dig -t aaaa www.markround.com

Note that the object "ip" operates on (addr, neigh, route etc.) can also be abbreviated. So, "ip -6 a" will show all IPv6 addresses.
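
Putting a few of those together, manually configuring a host on the example prefix from the radvd section above might look like this (addresses and interface names are illustrative):

# Assign a static address from the /64
ip -6 addr add 1234:5678:9abc:de::10/64 dev eth0
# Point the default route at the gateway
ip -6 route add default via 1234:5678:9abc:de::1 dev eth0
# Check we can reach the gateway
ping6 -c 3 1234:5678:9abc:de::1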

Anyway, that’s the end of this quick crash course. I know I’ve barely scratched the surface, but sometime later I’ll give some practical examples and detail how I configured an IPv6 tunnel for my home broadband connection, configured my systems (Mac, Windows, Linux, BSD, Solaris, other Unixes and smartphones) and also what I had to do to get this website running over IPv6. As I mentioned before, if you have any comments, feedback or corrections please feel free to either reply in the comments section below or email me at ipv6 [at] mark round [dot] com.

SGI Irix Packages


I have finally got a working build environment for my SGI IRIX systems (an R14k Fuel and a dual-R12k Octane2) and have packaged some open-source software for the fantastic Nekoware project. If you’re a fan of classic Unix systems, I strongly recommend heading over to their forums – there’s also a pretty strong Sun and HP contingent there among the SGI fanatics!

Anyway – the two packages I have built so far are the pv (Pipe Viewer) tool and the Mercurial DVCS. pv is a handy utility that can be used to replace "cat", and displays a progress bar on stderr. See the overview for some examples of what you can do with it. Mercurial should need no introduction; I just had to make a couple of minor patches (included in the tardist and submitted upstream). I’ve tested both local-only repositories, as well as pushing/cloning/pulling from remote HTTP sites. The only problem I have found is that accessing SSL-enabled repositories produces warnings, due to the old version of Python in Nekoware (2.5). Apart from that, it seems to work great – the repositories do get checked out, it just warns you that it can’t check the certificate.

Any feedback is always gratefully received; I’m sure there are still some IRIX users out there :-)

Solaris 11 Review


I’ve finally had the chance to devote some time to experimenting with some of the new features in Solaris 11. This article is really just intended as a walk-through of my first few weeks using Solaris 11 - a "kick of the tyres", so to speak. There is far too much that is new for me to cover everything, so I’ll be adding to this article and updating this site as I go through it. I’m also assuming the reader is familiar with Solaris 10; if you feel some parts need clarification, or if I’ve skipped over something you’d particularly like covered, feel free to let me know!

Download

I am a little unclear as to the new licensing restrictions around Solaris 11. My understanding (Caveat: I Am Not A Lawyer™) is that it is free to use for personal and non-commercial purposes, but anything after a 30-day trial period must be licensed if you intend to use it for any kind of commercial purposes - this includes development and testing environments. You also do not get access to patches or software updates without a support contract; sadly that now includes things like BIOS and firmware updates that used to be freely available in the Sun days. All part of the new regime, I suppose - we all have to get used to contributing to Larry’s yacht fund now.

Heading on over to Oracle’s online store reveals that an "Oracle Solaris Premier Subscription for Non-Oracle Hardware (1-4 socket server)" starts at £672.00, which does compare favourably with Red Hat Linux. Excluding the 2-socket tier, an equivalent 4-socket Red Hat license would set you back around £1,000 and only includes a license for 1 virtual machine. More details of what’s included in the support offering are at http://www.oracle.com/us/support/systems/operating-systems/index.html. Update : An anonymous reader provides some clarification - it looks like it may not be such a great deal after all :

The list price comparison to RHEL intrigued me.  I think the Solaris price is higher than £672/$1000 for the 4 socket example you’re giving as according to the Oracle store description page for the 1-4 socket non-Oracle option:  "Please note, this subscription is based on the number of sockets in the system you need to support, when ordering enter the number of sockets in the quantity field."

So that’d be £672 * 4 = £2688 (or $4000).  I’m assuming premier is the same sort of service + SLAs on both.  The equivalent to the single socket £672/$1000 subscription would be the RHEL 2-socket premium subscription at $1299/yr.  Hopefully I’m not missing anything here.

I would be interested to hear of any experiences of Oracle’s support when using non-Oracle hardware, as to date (apart from some non-production environments running on HP ProLiant systems) everything I have run Solaris on has been a Sun/Oracle SPARC or x64 system, and the OS support was included under a larger company support contract. Update 2 : There’s some experience of Solaris on HP kit in the comments below. 

Anyway, the first step is to download the software and unlike previous Solaris releases, there’s now a variety of different installation media so you have to pick the correct one for your needs. The available downloads are :

* Text Install : This is very similar to the old Solaris text-mode installs (SPARC and x86) and even has the same colour-scheme and "F2_Continue" shortcuts down the bottom. Takes me right back to installing Solaris 8 on old Pentium systems!

* Automated Installer : This provides a "hands-free" network installation system, and replaces the old Jumpstart system. You need to have your own IPS repository (more on that later) set up, or have access to the Internet so you can reach Oracle’s IPS repository.

* Live Media : This is only available for x86 systems, and is very similar to the Linux "live environments" on Ubuntu and Fedora etc. It lets you run the system off the CD and experiment with it before actually installing it. It’s pretty slow and you’ll need a lot of memory so I personally didn’t find it of much use other than to check hardware compatibility and so on.

* Repository Image : Unlike previous Solaris releases, the installation media does not contain all available packages. Instead, it contains a smaller subset of software which will allow you to get a basic system up and running. After that, you need to connect to Oracle’s pkg.oracle.com server to download other packages, or use this image to set up a local IPS server on your network (or mount it and use it as a local repository).

* USB Install Images : Again, only available for x86. I didn’t test this out as I didn’t have a need for it, but it would be a useful addition to the Solaris Sysadmin’s toolbox.

* Virtual Machine Downloads : These are VM images that can be imported directly into a variety of hypervisors - could be useful for getting started quickly, but most admins will either be using the text or automated installers.

There’s also a "Oracle Solaris 11 Preflight Application Checker" available, which checks an application running on Solaris 10 and indicates whether it should run without problems on Solaris 11. However, given the Solaris binary compatibility guarantee it’s unlikely you’d encounter problems. In any case, you could always run a Solaris 10-branded zone under Solaris 11 - indeed, creating a Solaris 10 zone from a running system (using flash images) is a supported configuration and process.

Installation

I grabbed the x86 text installation images and installed a system using Oracle Virtualbox. As with Solaris 10, the boot menu (grub) lets you choose an install over a local terminal or serial ports. The latter option is particularly useful in an environment using Sun/Oracle servers without a graphics card - while you can use iLOM/ALOM, going straight to the serial port is much faster.


The setup starts off much the same as Solaris 7 onwards, with the usual region/keyboard selection, and then you get presented with a new menu :

1 Install Oracle Solaris
2 Install Additional Drivers
3 Shell
4 Terminal Type (currently sun-color)
5 Reboot

Choosing option 3 (Shell) drops you into a basic rescue-like environment, and typing "exit" (or hitting Ctrl-D) at any point returns you to the menu. This environment is extremely useful for performing emergency maintenance or recovery tasks, and while you could always use "boot -s" from older Solaris installation media, it is a very welcome addition to have it so easily available. In this shell environment "svcadm enable ssh" starts a SSH server, and if you create a new user (you can’t login as root), you can also log in over the network.


The next screen that has noticeably changed is the disks screen. UFS has been completely removed as an option, so instead of choosing your slices for filesystems, you now only get to pick which slice should be used for the ZFS root pool. It is recommended that you use the entire disk/LUN where possible, as this lets ZFS make much better use of the underlying volume (see http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide for more information).


There is no choice as to ZFS filesystem layout, although as it can so easily be modified it really doesn’t matter. Whatever slice or LUN you select is assigned to the "rpool" volume.


After the network configuration (which has changed slightly but still provides the same choices of "DHCP, Manually or None"), you reach the user creation step. This again is noticeably different: Firstly, it now enforces password complexity. Secondly, you have to create a new user account as well. The traditional root account is now a role, and you can’t login directly as root. Like most Linux distros, the first user account created has the ability to use sudo/RBAC to work as root when needed.


After this step, the installation starts in earnest. The install is very quick, and after the slow initial reboot (when it loads smf service descriptions), you can login. 

Click the "continue reading" link for the rest of the review…

First boot

Boot output is very minimal by default; after the grub menu you just get:

SunOS Release 5.11 Version 11.0 64-bit
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Loading smf(5) service descriptions: 191/191
Configuring devices.
Hostname: solaris
solaris console login:

And then, unless you have set up a FQDN, you’ll see the usual sendmail barfing. Seeing as I always kill sendmail first thing (seriously, why do we have to put up with sendmail in this day and age?), that’s not a problem. My first step with a new Solaris system is to enable verbose booting - on an x86 system you can do this by changing the kernel line in /rpool/boot/grub/menu.lst as follows :

kernel$ /platform/i86pc/kernel/amd64/unix -B $ZFS-BOOTFS -v -m verbose

This provides much more verbose kernel messages, as well as showing each SMF service as it comes up. 


Logging in shows the first big change in the userland - bash is now the default shell :

Oracle Corporation      SunOS 5.11      11.0    November 2011
mark@solaris:~$ uname -a
SunOS solaris 5.11 11.0 i86pc i386 i86pc

Even if you’re a hardened csh or zsh fan, it can’t be denied that bash is pretty much the standard across a mixed Unix/Linux environment these days, and as an interactive shell is a massive improvement over /bin/sh.


As mentioned above, the first user created has full Sudo access, as well as a pretty comprehensive set of RBAC authorisations :

mark@solaris:~$ sudo -l
Password: 
User mark may run the following commands on this host:
    (ALL) ALL

For full pfexec permissions (the default doesn’t allow you to disable services, for instance) you can re-add the "Primary Administrator" profile as per older releases, and then pfexec becomes functionally equivalent to sudo :

echo 'Primary Administrator:suser:cmd:::*:uid=0;gid=0' >> /etc/security/exec_attr
echo 'Primary Administrator:::All administrative tasks:auths=solaris.*;solaris.grant;help=RtPriAdmin.html' >> /etc/security/prof_attr
sudo usermod -P "Primary Administrator" mark

And then I now have all available authorisations :

mark@solaris:~$ auths
solaris.*

Filesystems

The default layout is separate /, /var, /tmp (swap-backed as usual), /export, /export/home and individual homes, all mounted under /home :

mark@solaris:~$ df -h
Filesystem             Size   Used  Available Capacity  Mounted on
rpool/ROOT/solaris      20G   1.5G        16G     9%    /
/devices                 0K     0K         0K     0%    /devices
/dev                     0K     0K         0K     0%    /dev
ctfs                     0K     0K         0K     0%    /system/contract
proc                     0K     0K         0K     0%    /proc
mnttab                   0K     0K         0K     0%    /etc/mnttab
swap                   1.1G   1.3M       1.1G     1%    /system/volatile
objfs                    0K     0K         0K     0%    /system/object
sharefs                  0K     0K         0K     0%    /etc/dfs/sharetab
/usr/lib/libc/libc_hwcap1.so.1
                        18G   1.5G        16G     9%    /lib/libc.so.1
fd                       0K     0K         0K     0%    /dev/fd
rpool/ROOT/solaris/var
                        20G   215M        16G     2%    /var
swap                   1.1G     0K       1.1G     0%    /tmp
rpool/export            20G    32K        16G     1%    /export
rpool/export/home       20G    32K        16G     1%    /export/home
rpool/export/home/mark
                        20G    34K        16G     1%    /export/home/mark
rpool                   20G    39K        16G     1%    /rpool
/export/home/mark       16G    34K        16G     1%    /home/mark

This is a very sensible layout choice, and thanks to the flexibility of ZFS can be modified easily if needed. It looks like dedup, compression etc. are off by default for each filesystem, but can obviously be turned on if needed. It’s using the default ZFS block size of 128K, and as with all ZFS systems you really need plenty of memory: 

$ echo "::memstat" | sudo mdb -k | egrep "ZFS|Summary|^-"
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
ZFS File Data              117824               460   45%

The ZFS ARC uses most of your free memory and while it should relinquish it if needed, experience shows it’s not optimal in some situations and can lead to thrashing. My experience is that you need a minimum of 2Gb for practical usage - while some documentation indicates 1Gb is the minimum, 2Gb and above leads to a much more usable system. Of course, you can also limit the ARC to a fixed size using zfs_arc_max, which might be a good idea if you are running a memory-intensive application and know its average working set size. If you’re migrating from a previous Solaris release and still need to attach to legacy volumes, SVM is also available but it’s not installed by default; you need to add it from a package repository.
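
If you do decide to cap the ARC, it’s a one-line addition to /etc/system followed by a reboot - the 2Gb figure here is just an example:

* Limit the ZFS ARC to 2Gb (the value is in bytes)
set zfs:zfs_arc_max = 0x80000000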


Sadly, the ZFS webconsole (which was one of my favourite little-known features of Solaris 10, and turned a Solaris install into a fantastic storage server) is no longer present. This is presumably due to Oracle selling dedicated ZFS appliances.

The userland

I’m pleased to see that, compared to the horrors of past Solaris releases, an NMAP run against a newly installed system shows minimal ports open - just SSH and portmapper.


The userland feels nicely updated, both in terms of the included packages and updates to the included utilities. For instance, there’s a set of GNU coreutils under /usr/gnu, and now even /usr/bin/ls supports the --color=<when> argument, and tar natively supports the -z and -j options for Gzip and Bzip2 compression - finally, no need to pipe it through gunzip first!


Sendmail is still there, and there’s no Postfix or Exim available in the official repositories so you’ll need to head off to OpenCSW/SunFreeware, or install your own packages if you want to replace it.


There’s a full install of Apache 2.2.20 under /usr/apache/2.2 and PHP can be easily added from the official repositories. Java 1.6 is provided along with a JDK and Perl is at version 5.12.3. SFW now looks as though it’s been merged into the main system, as all the files under /usr/sfw/bin are now symlinks pointing to /usr/bin. This means you also get tools like gtar, gmake and ncftp available by default.

Configuration

There are massive changes to how configuration is now handled in Solaris 11. I can’t stress enough how big a change this is - it seems like Oracle are trying to move administrators away from the traditional Unix approach of directly editing config files, and instead using tools to manipulate them. In many cases the config files are still there, but are either backing stores for configuration information, or are dynamically generated by the SMF framework. In this way, it starts to feel more like AIX (or, to a lesser extent, newer Windows Server products) and has massive implications on the way you manage your systems. It means that there is plenty of room for confusion and a lot of 3rd party management tools (including configuration management systems like CfEngine, Puppet or Chef) will need re-working to handle the new way of doing things.


A good example of this is network configuration : In Solaris 11 the /etc/resolv.conf is automatically populated by the svc:/network/dns/client service and manual edits will be lost when the service is restarted. To set values for the nameservers and other information, you must use svccfg now :

$ sudo svccfg -s dns/client
svc:/network/dns/client> setprop config/nameserver = (192.168.16.61)
svc:/network/dns/client> listprop config
config                      application        
config/value_authorization astring     solaris.smf.value.name-service.dns.client
config/domain              astring     example.net
config/nameserver          net_address 192.168.16.61
svc:/network/dns/client> exit
$ sudo svcadm refresh dns/client

Nsswitch is also handled through SMF. So when you want to use DNS, instead of copying "nsswitch.dns" to "nsswitch.conf", you again need to use svccfg :

$ sudo svccfg -s name-service/switch
svc:/system/name-service/switch> setprop config/default = files
svc:/system/name-service/switch> setprop config/host = "files dns"
svc:/system/name-service/switch> exit
$ sudo svcadm refresh name-service/switch

Network interface names have changed - by default, Solaris 11 now uses "net0" and so on instead of the driver-specific e1000g0, bge0 and others. System ID configuration (hostname) is also in SMF. The old methods of using /etc/hostname.<NIC>, /etc/defaultrouter and so on no longer work, and even ifconfig is considered deprecated. Now, everything is done through two utilities : "dladm", which was present in Solaris 10 and handles data-links (Ethernet, InfiniBand and others), bonding, bridging and so on; and ipadm, which handles IP addressing on top of the links :

$ dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         1000   full      e1000g0
$ ipadm show-addr
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
net0/_b           dhcp     ok           192.168.16.237/24
lo0/v6            static   ok           ::1/128
net0/_a           addrconf ok           fe80::a00:27ff:fea1:e305/10

For instance, to create a new static IP address which persists across reboots, you’d do something like :

sudo ipadm create-addr -T static -a 192.168.1.23/24 net0/static

However, if the interface is temporary (e.g. a DHCP interface), you’ll need to delete it and re-create it - see http://docs.oracle.com/cd/E23824_01/html/821-1458/gljtt.html for an overview, as this seriously tripped me up to start with.
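
For example, to replace the DHCP-assigned address shown in the ipadm output above with a static one (a sketch only - the address object names and addresses will differ on your system):

# Remove the temporary DHCP address object
sudo ipadm delete-addr net0/_b
# Create a persistent static address in its place
sudo ipadm create-addr -T static -a 192.168.16.237/24 net0/v4static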


In short, be prepared for several days of serious culture shock, and a LOT of reading through http://docs.oracle.com/cd/E23824_01/html/821-1458/toc.html and related documentation at http://www.oracle.com/technetwork/server-storage/solaris11/documentation/index.html

IPS

This is another massive change, although it is one that should have most Solaris administrators dancing for joy. The Image Packaging System completely replaces both the old SysV software installation process and patching. In older Solaris releases, patches were managed separately from package installation; they contained fixes for problems and only included parts of packages. This used to lead to all kinds of problems, and was my main complaint with Solaris. For instance, if you installed a patch but then later installed a package from the installation media that the patch had a fix for, you’d need to re-install the patch.


Add to this the tendency for seemingly unrelated interactions when patching (e.g. kernel patches somehow stomping on sendmail configurations), plus many sites only applying fixes when needed - creating a massive chain of dependencies for future patches, which typically only surfaced when you had some emergency and needed to apply a patch in a hurry - and you had a nightmare of a system.


IPS aims at (and succeeds in) solving all of these problems, as well as tying incredibly well into ZFS snapshots to provide a far more consistent and easy to use software installation, patching and rollback process. However, the old pkg tools are available and you can still install SysV packages, so you can add OpenCSW / Sun Freeware and other packages as needed. There is a good set of documentation at http://www.oracle.com/technetwork/server-storage/solaris11/technologies/ips-323421.html.


If you are familiar with Linux systems, a reasonable analogy is that IPS fills the same kind of role that YUM or APT do in the Red Hat or Debian world. http://pkg.oracle.com/ is the official Oracle repository, and it has a web interface that lets you search for software. If you have a support contract, you also gain access to the updates repository which contains newer versions of the bundled software with patches and bug fixes.


However, I encountered frequent network errors and stalls when using it :

~$ pkg search ruby
pkg: Some repositories failed to respond appropriately:
solaris:
Framework error: code: 56 reason: Recv failure: Connection reset by peer
URL: 'http://pkg.oracle.com/solaris/release/solaris/search/1/False_2_None_None_%3A%3A%3Aruby'.

This may have been caused by transient network errors or a problem at Oracle’s end, but it still seems like a good advertisement for having a local repository, especially as others seem to have reported the issue a few times in the past. I can only hope that the situation improves with time.


Fortunately, setting up a local repository is a pretty simple affair. You download two ISOs, and concatenate them to form a dual-layer DVD image :

$ cat sol-11-1111-repo-full.iso-a \
  sol-11-1111-repo-full.iso-b > sol-11-1111-repo-full.iso

This results in a 6.1G file which you can either burn to DVD or mount using lofiadm. Once you have it mounted, you can set up a pkg server for the rest of your network (see the README included for details), but I instead added them as a local repository just for use by this one system:

$ sudo zfs create rpool/export/pkgs
$ sudo lofiadm -a sol-11-1111-repo-full.iso
$ mkdir /tmp/pkgs
$ sudo mount -F hsfs /dev/lofi/1 /tmp/pkgs
$ sudo rsync -aP --progress /tmp/pkgs/ /export/pkgs/
$ sudo umount /tmp/pkgs
$ sudo lofiadm -d /dev/lofi/1

I then had a full copy of the packages under the ZFS filesystem /export/pkgs, which I could also add compression to in order to save space. I then removed all of the default package "origins" and mirrors for a publisher, and added a new origin pointing to my local repository :

$ sudo pkg set-publisher -G '*' -M '*' -g file:///export/pkgs/repo solaris
$ sudo pkg refresh --full

I could then install additional software, e.g.

$ sudo pkg install ruby-1.8
$ sudo pkg install php-52 php-apc php-idn \
  php-memcache php-mysql php-pear php-pgsql \
  php-suhosin php-tcpwrap php-xdebug

There’s a pretty complete "AMP" stack available as well as supporting technologies :

* PHP 5.2.17 and full set of extensions including Suhosin and APC
* Ruby 1.8.7 (and jRuby 1.1.3)
* Tomcat 6.0.33
* Squid 3.1.8
* MySQL 5.1.37
* Webmin 1.510
* Lighttpd 1.4.23

Surprisingly, the bundled version of MySQL is old (5.1), although newer versions are available through MySQL.com and SFW/CSW. For compiling your own software, GCC 3.4.3 and 4.5.2 are provided, although the Oracle Solaris Studio compilers are free and arguably produce much better code, particularly on SPARC.

Patching

This is the area where IPS shines. To start with, there are no longer separate patches - you simply run "pkg update", and your entire system is brought up to date. So far, so good. However, the magic really starts when IPS brings together ZFS snapshots and boot environments - it leaves comparable systems like YUM in the dust.


For instance, suppose you were installing a new package, or upgrading your system. You can create a snapshot of your current boot environment prior to making the changes for rollback purposes, or clone your current BE and install the packages into the clone, rebooting to make the changes active. In this way, it’s similar to the old LiveUpgrade system but rather than requiring separate disks (or temporarily splitting mirrors), it’s now all handled with ZFS snapshots and happens more or less instantly with minimal administrator overhead. Snapshots have practically zero overhead, and there’s really no reason not to take full advantage of this. 


As an example, here’s how you would create a new boot environment and install a new package into it :


$ sudo pkg install --be-name gcc-install gcc-45
           Packages to install:   4
       Create boot environment: Yes
Create backup boot environment:  No
DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                  4/4   1030/1030  123.6/123.6
PHASE                                        ACTIONS
Install Phase                              1270/1270
PHASE                                          ITEMS
Package State Update Phase                       4/4
Image State Update Phase                         2/2
A clone of solaris exists and has been updated and activated.
On the next boot the Boot Environment gcc-install will be
mounted on '/'.  Reboot when ready to switch to this updated BE.
$ beadm list
BE          Active Mountpoint Space  Policy Created          
--          ------ ---------- -----  ------ -------          
gcc-install R      -          2.97G  static 2012-03-07 16:16 
solaris     N      /          143.0K static 2012-03-07 10:24

This process cloned the currently active "solaris" BE and named it "gcc-install". It then installed the packages into it, and will be activated when the system reboots. 


If you want to create a snapshot of your current environment prior to making changes you can always do it manually with a simple "beadm create", and other filesystems or ZFS pools can be snapshotted as usual.
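
For example, a manual safety net before a risky change might look like this (the BE name is just an illustration):

# Take a snapshot-style boot environment before making manual changes
sudo beadm create pre-tweaks
# Confirm it shows up alongside the active BE
beadm list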

Zones

As you’d expect, Solaris Zones (container-based virtualisation) have been tightly integrated with the rest of the OS, and in doing so have gained some really nice new features. Possibly the biggest change is the new "crossbow" networking stack, which allows you to make use of fine-grained network utilisation policies. Now, when you create and boot a zone you’ll see a virtual NIC (VNIC) created :

$ sudo dladm show-vnic
LINK                OVER         SPEED  MACADDRESS        MACADDRTYPE       VID
myzone/net0         net0         1000   2:8:20:24:b9:d2   random            0

This VNIC can be modified using zonecfg if you want the changes to persist, although you can use dladm in the global zone to modify settings on the fly. As an example, here’s a quick session where I reduced the maximum bandwidth available to this VNIC to 100Mb/s, along with a "help" listing showing the available parameters :

zonecfg:myzone> select anet linkname=net0
zonecfg:myzone:anet> help
The 'anet' resource scope is used to configure a virtual datalink that will automatically be added to the zone.
Valid commands:
        set linkname=<datalink name>
        set lower-link=<datalink name>
        set allowed-address=<IP-address>,...
        set defrouter=<IP-address>,...
set defrouter is valid if the allowed-address property is set, otherwise it must not be set
        set allowed-dhcp-cids=<client-ID or DUID>,...
        set link-protection=<comma-separated list of protections>
        set mac-address=<mac-address>
        set mac-prefix=<mac-prefix>
        set mac-slot=<mac-slot>
        set vlan-id=<vlan-id>
        set priority=<high|medium|low>
        set rxrings=<Number of receive rings>
        set txrings=<Number of transmit rings>
        set mtu=<mtu>
        set maxbw=<full duplex bandwidth of the link>
zonecfg:myzone:anet> set maxbw=100M

And then after committing this change and rebooting the zone :

$ sudo dladm show-vnic
LINK                OVER         SPEED  MACADDRESS        MACADDRTYPE       VID
myzone/net0         net0         100    2:8:20:24:b9:d2   random            0

It’s worth pointing out that although VNICs will mainly be encountered in zones, there is no reason you have to use zones to make use of them. This means you could, for instance, create a new VNIC on your system and then bind your Apache webserver to it.
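
A quick sketch of what that might look like in the global zone (the link name and address here are purely illustrative):

# Create a VNIC called web0 on top of the physical net0 link
sudo dladm create-vnic -l net0 web0
# Plumb an IP interface and address on it, which Apache could then bind to
sudo ipadm create-ip web0
sudo ipadm create-addr -T static -a 192.168.16.80/24 web0/v4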


Also, a small but welcome change is that you can now run an NFS server inside a non-global zone!

Desktop

From a text-only install of Solaris 11, you can add the GNOME 2-based desktop environment using "sudo pkg install solaris-desktop". There’s not much to say here really, if you’ve used GNOME 2.x, it won’t come as much of a shock to you. There are a couple of nice additions including a SMF applet and related control panel. There’s also a GUI package manager, firewall manager and automated  snapshot system called "Time Slider" integrated into Nautilus - think of Apple’s Time Machine, but built on ZFS. 


SMF services under GNOME

I can’t honestly say I used the desktop much other than out of idle curiosity; while Solaris 11 could be a useful development workstation OS, I doubt it’ll be winning any new converts over from Linux or the BSDs. Phoronix do have a good overview of the desktop though, if that sort of thing interests you : http://www.phoronix.com/scan.php?page=article&item=oracle_solaris_11&num=1


Wrapup (for now)

There’s a lot more to experiment with, including the new load balancer and the COMSTAR storage stack, which turns a Solaris 11 system into a storage target capable of exporting ZFS volumes over pretty much any protocol and transport imaginable - iSCSI, FC, FCoE, Infiniband, Ethernet… Being a huge storage geek, I find this of particular interest! As time goes on and I’ve had a chance to go over some more of this stuff and integrate it properly into my test environment, I’ll update this site with my progress.


So, based on my short time with Solaris 11 so far, I’d say I’m impressed and that it’s a solid upgrade from Solaris 10. Although the loss of individual configuration files and the traditional "unix-way" of doing things is a culture shock, in most cases there are sound reasons for doing so. IP and data-link configuration is the obvious area in which the old methods were rapidly becoming clunky and difficult to manage; dladm and ipadm provide a much more consistent and powerful toolset. 


IPS combined with ZFS snapshots means greatly reduced downtime and failsafe updating; all my old issues with Solaris patching and package management have been swept away by this system. While some of the other features in Solaris 11 can pretty much be viewed as incremental updates, this is truly revolutionary. Sadly, due to the licensing changes and the fact that Oracle have pretty much managed to kill off the community around OpenSolaris, I can’t see it winning many converts from established Linux shops, but anyone already running Solaris 10 (or previous releases) should find plenty here to compel an upgrade plan…

Citrix XenServer 5.6 Review


Introduction

I’ve been using and evaluating Citrix XenServer now for a while, and felt I should really post a review. I haven’t seen much detailed coverage of this product at the level I’m interested in, so what follows is my take on it from a Unix Sysadmin’s perspective. There won’t be any funky screenshots or graphics; instead, I tried to cover the sort of things I wanted to know about when I was looking at it as a candidate for our virtualization solution at work.  

After all, implementing a new hypervisor is a big step, and a decision that you’ll likely be stuck with for a long time. If there’s anything else you’d like to know, just post in the comments section and I’ll do my best to answer.

As some background: I’ve been using the open source Xen hypervisor as a virtualization platform, alongside VMware for Windows hosts for a good few years now at work. Part of the reason for picking Xen was that it was the standard on the systems I inherited, and also it was free and well-supported on most Linux distributions at the time. To date, I have been using CentOS as a Dom0 – as it’s a free "clone" of Red Hat Enterprise Linux, it follows the same support schedules (up to 2014 for RHEL/CentOS 5.x) and is supported by pretty much every hardware vendor out there. It also has the libvirt tools built into it, as well as up to date packages for storage infrastructure such as DRBD and open-iscsi. It’s well supported, and even though it is a conservative "stable" distro, point releases occur regularly with back-ported drivers and user-land updates.

With some work, you can roll your own management tools and scripts, and end up with a very flexible solution. However, it lacks some management ease of use, particularly for other systems administrators who may not be totally comfortable in a Linux environment. We also wanted to standardise on one virtualization platform if possible, and this all coincided nicely with a planned upgrade/migration off the VMware stack.

XenServer therefore presents a very attractive proposition: A well known, widely tested and supported open source hypervisor, with a superior management stack. The basic product is free, although support and enterprise features are available for a price. The prices for the advanced features are very reasonable, all the more so when you compare against VMware’s offerings. Also consider that the free product allows you to connect to a wide range of networked storage systems and includes live migration, something that the free ESXi doesn’t offer.

All of what follows covers the freely downloadable XenServer 5.6; Both Dell and HP offer embedded versions for some of their servers, however running and managing these systems should be near enough identical apart from the installation steps.

Update : Just after writing this, the beta of "FP1" (an update to XenServer 5.6) was announced. Full details of what will be in this update are here in the release notes. It looks like there will be plenty of significant improvements across all areas (including MPP RDAC, scheduled backups, supported jumbo frames, on-line coalescing of snapshot space and various other things of particular interest to me). Bear in mind when reading this review, that many of the little issues I have with XenServer may well be resolved in the upcoming version, and other areas may be totally overhauled. As soon as the final version is released I’ll post a full update…

Update 2 : FP1 is indeed a big improvement. I’ve been using it in production now for a few months and should have an update soon, covering the new features such as the distributed switch, self-service portal etc.

Click the "Continue reading" link for the full review.

Installation and Drivers

Installation is fortunately a snap. It’s a very simple text-driven affair that gives you very little control over the process, which is exactly what you want. After setting various system parameters such as locale and networking details, you can install additional "supplemental packs". One of these is provided as an option from the Citrix download page along with XenServer – it provides various Linux guest related tools and sets up a demo Debian etch template. Dell have also released a supplemental pack which sets up OpenManage and related hardware monitoring tools, which is a nice touch as it saves you the hassle of having to set this all up manually post-install.

One notable omission from the installation process is software RAID – there is no facility to set this up whatsoever. True, it is possible to set this up yourself afterwards if you are familiar with mdraid and LVM, but it’s totally unsupported. You really do need to use a hardware RAID controller for your boot volumes, or boot from a SAN if that’s an option.

As the control domain (Dom0) is based on CentOS, hardware and driver support is therefore identical to Red Hat Linux: In general, any recent server from any of the main vendors should present few difficulties, as long as it has a 64-bit CPU. In fact, if you are running PV Linux VMs, you don’t even need hardware virtualization support (such as Intel VT). The one exception that you really need to check carefully against the HCL is your SAN block storage.

If your array works out of the box with DM-Multipath (assuming you’re using multipathing, although you’d be mad not to in a production environment), then setup should be straightforward. If you are using something else like MPP-RDAC (such as on the Dell MD3000i or various Sun and IBM arrays), then you will have to customize your system a little more. I also experienced a problem with the supported Dell MD3220i array – XenServer tries to log into all available targets that an array presents. As the MD3220i has 4 active ports per controller, XenServer has to be able to reach them all. Originally, I had intended to only present 2 ports per controller to each server, but this meant I had to rethink my storage network.

In short – you’ll need to check it all thoroughly before you go into production, but as it’s a free download, you should be able to run all the tests you need before shelling out: as mentioned above, the free version is not limited when it comes to storage networking.

Booting and Management

Once installed, the boot process is similarly restricted. EXTLINUX (not GRUB) boots straight into the system; you get a white screen with the Citrix logo and a progress bar, and you don’t see anything else until you are presented with the management console. You can switch to alternate terminals as with any other Linux distribution, but you won’t see much. The text-mode management console provides you with a few basic functions, such as the ability to reconfigure networking and storage, start/stop/migrate VMs, backup/restore metadata, and perform basic diagnostics.

Dropping to the command line reveals a 32-bit Dom0 based on CentOS 5. In fact, the CentOS repositories are all ready-configured in /etc/yum.repos.d, although by default they are disabled. What this means is that you can install any software on your Dom0 as you would any "regular" CentOS host. Whilst this is generally A Bad Idea in practice (your Dom0 should be doing as little as possible, not to mention that if anything does go wrong you may be unsupported), it does mean that you can install management and monitoring utilities for your RAID controllers, system management agents such as Dell’s OpenManage or HP’s Insight Manager, as well as Nagios plugins and Cacti scripts (my own iostat templates work fine!). Having such a full-featured Dom0 is tremendously useful, and a real advantage over ESXi which lacks a proper console.
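
For example, pulling in the sysstat tools from the bundled (but disabled by default) CentOS repositories is a one-liner - bearing the support caveats above in mind, and assuming the standard "base" repository id:

# Enable the CentOS base repo just for this one transaction
yum --enablerepo=base install sysstat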

Pretty much every other aspect of XenServer is managed through the XenCenter console, or the "xe" command line tool, which is the same "xe" configuration tool found in the open source "Xen Cloud Platform". XenCenter is a .Net application and unfortunately only runs on Windows hosts, although an open-source clone written in Python is available. A nice touch to the xe CLI tool is that all the parameters auto-complete with the tab key, just like any other Linux command. This means you can type something like "xe vm-param-list", hit tab and the required "uuid=" parameter gets filled in; pressing tab again lists all the available VM UUIDs to pick from.
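For example, a quick way of inspecting a VM from the CLI (substitute a real UUID, or just let tab completion fill it in for you):

# List all VMs with their UUIDs and names:
xe vm-list params=uuid,name-label
# Dump every parameter of a single VM:
xe vm-param-list uuid=<vm-uuid>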

There is almost a 100% mapping between the functionality in XenCenter and the xe tool, although xe does expose some additional capabilities, and is also required for things like configuring MPP-RDAC based storage. You can also use xe to make some advanced tweaks that are unsupported by Citrix, such as enabling jumbo frames on your network interfaces. I suppose the idea is that if you’re using the CLI, you know what you’re doing and don’t require hand-holding or protecting from your actions! Along with the auto-completion, the tool is backed up by good online help, and the XenServer manual documents the "xe" way of doing things very thoroughly.

Speaking of networking, all network interfaces are managed by XenServer by default, instead of by the underlying CentOS system. You end up with a bridge created for each network card (xenbr0, xenbr1, and so on…) and although you can label them as being for management use (e.g. iSCSI traffic), it is still possible to add them to a VM. Using xe, you can set a NIC (apart from the management and VM data networks) to be "unmanaged", at which point XenServer forgets all about it and you can use the usual /etc/sysconfig/network-scripts/ifcfg-ethX files to manage it. This may have advantages if you find the additional overhead of a bridged configuration too much, or require more fine-grained control over your network.
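Once a NIC has been handed back to the Dom0, it’s just a standard CentOS interface file from then on – something along these lines, where the device name and addresses are purely illustrative:

# /etc/sysconfig/network-scripts/ifcfg-eth2 (example values only)
DEVICE=eth2
BOOTPROTO=static
IPADDR=192.168.10.5
NETMASK=255.255.255.0
ONBOOT=yes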

In summary, the XenCenter GUI management tool is pretty solid, does a good job and has all the tools you’ll need laid out in a logical fashion. In fact, it should be possible to use the GUI tool exclusively for most tasks, which is a great help for administrators who perhaps would prefer to steer clear of a Linux bash prompt. There is the xe tool there for those who prefer the CLI approach, or who want to perform advanced configuration or tuning tasks. Other than that, there’s not much else to say – both tools are reliable and have never presented me with any problems.

Storage and Pools

When you install XenServer, it takes up around 4GB of disk space, and the rest is assigned to a local Storage Repository (SR), which is essentially an LVM volume group. If you happen to have multiple devices or LUNs detected during installation, you’ll get to choose which ones you want to use. Once installation has completed, you have several options for adding storage. The XenCenter GUI supports adding storage from the following sources:

  •  NFS
  •  Software iSCSI (using the open-iscsi stack in CentOS)
  •  Hardware HBA – This allows you to connect to Fibre Channel, FCoE, SAS or other SANs, assuming that your HBA is on the HCL.
In addition, you can connect to a read-only CIFS or NFS "ISO library", where you can store installation CD images, rescue disks and so on. If you upgrade to one of the premium versions of XenServer, you can also make use of StorageLink on supported arrays, which pushes operations such as provisioning and snapshots down to the array controller.

Of course, you can also present storage directly to the CentOS-based Dom0, and as long as you can see a block device, you can create an LVM-based repository on it with the xe tool, using something similar to the following:
xe sr-create type=lvm content-type=user device-config:device=/dev/cciss/c0d1 \
 name-label="LOCAL SR"
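A shared iSCSI SR can be created in much the same way. This is from memory of the Citrix admin guide rather than a copy-paste, so treat it as a sketch and check the manual for the exact device-config keys; the host UUID, target address, IQN and SCSI ID below are all placeholders:

xe sr-create host-uuid=<host-uuid> type=lvmoiscsi shared=true content-type=user \
 name-label="iSCSI SR" device-config:target=192.168.20.10 \
 device-config:targetIQN=iqn.2001-05.com.example:storage \
 device-config:SCSIid=<scsi-id>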

If you enable multipathing in XenCenter, it will tune various open-iscsi parameters to enable faster failback, and set up dm-multipath. One thing that threw me initially is that /etc/multipath.conf is actually a symlink, rather than an actual configuration file; if you need to make any modifications, you need to change /etc/multipath-enabled.conf instead. Of course, if you are using an alternative multipath implementation such as MPP-RDAC, you will have to configure and manage it manually, and XenCenter will not report its status.
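If you do need to add array-specific settings there, it’s the usual dm-multipath syntax. The vendor/product strings and options below are placeholders only, so take the real values from your array vendor’s documentation:

# Added to /etc/multipath-enabled.conf (not /etc/multipath.conf, which is the symlink)
devices {
    device {
        vendor "VENDOR"
        product "MODEL"
        path_grouping_policy group_by_prio
        failback immediate
    }
}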

If you have multiple XenServer hosts, you can join them together into a named resource pool, at which point you can take full advantage of shared storage, as live migration of VMs between hosts then becomes possible. If you join hosts to pools through the GUI, you’ll find that only homogeneous systems can be joined: you can’t mix and match different families of CPUs, for instance. However, if you join a pool using the xe tool, you can pass the "force=true" parameter which will permit this. Obviously, you then need to be very careful, and live migration between hosts may well result in system crashes. It does however open up the possibility of running "cold migrations" (i.e. shutting a VM down and then starting it on a different host).
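The forced join looks something like this (run on the host you want to add; the master address and password are placeholders, and all the usual caveats about mixed CPUs apply):

xe pool-join master-address=<pool-master-ip> master-username=root \
 master-password=<password> force=true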

When XenServer hosts are in a pool, one server is the pool master. All commands to the other hosts go through this master. If it’s not available, then you will not be able to start, stop or manage VMs running on the other servers. If the master is just down for a short time (rebooting etc.) then this is not a big issue. However, if it has properly died, you will need to promote one of the slaves to a master in the meantime, which is covered in the manual but basically boils down to picking a suitable slave and running the following commands on it :

xe pool-emergency-transition-to-master
xe pool-recover-slaves 

The rest of the slaves will now be pointing to the new master. You can then re-install the old master and add it back to the pool as a slave. One thing that did catch me unawares (it’s mentioned in the manual, but is such a big "gotcha", I feel it’s worth repeating here) is that if you remove a host from a pool, it will be reset back to factory conditions. This includes wiping any local SRs on it. This means you need to move any VMs running on local SRs to something else prior to removing the host from the pool, or you will lose your data!

All SR types (apart from NFS) appear to use LVM as the underlying volume manager, but you don’t get much control over them. You can’t control block size, PE start or much else. I have also been unable to determine exactly how access to the volumes is arbitrated between hosts in a pool, as there appear to be no LVM host tags or cluster manager (such as CLVMd) running. However, you are prevented from attaching an SR that is in use by a pool to a non-pooled XenServer, and I have yet to experience any problems. The NFS SR uses flat ".vhd" files as virtual disks, and despite the lack of "true" multipathing, can make a cheap and effective shared storage solution, particularly when combined with interface bonding, which is supported natively through XenCenter.

Once a VM is in the shut-down state, it’s very easy to move its underlying storage to a different repository – you just right-click on it in the GUI, select "Move VM", and then select the target repository. Assuming you don’t have an exceedingly large volume of data to move, this makes a tiered approach to VMs possible and easy. If you have cheap NFS storage for non-critical VMs, iSCSI for more important ones, and even a top tier of FC/SAS/FCoE, you can move VMs between SRs as performance or reliability requirements change.

I found that the usual trick of using "kpartx" to access virtual disks on LVM volumes doesn’t seem to work (as there is additional VHD metadata before the start of the actual disk image), although there’s an easy workaround for PV Linux hosts on ext3. You can run "xe-edit-bootloader" to modify the grub configuration for the VM, which plugs the required Virtual Block Device (VBD) and mounts the VM’s root filesystem. If you are using "vi" as your editor, you can then hit Ctrl-Z and be left at a prompt where you can change into the mounted filesystem and run any maintenance as needed. Alternatively, you can create a "Rescue VM" for such tasks. I have one pre-configured which boots from a System Rescue CD ISO, and has a 100GB empty filesystem for copying temporary files to. I can then simply attach another VM’s disk to it in the GUI, boot, and mount the disk from within the rescue environment.
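From memory, the xe-edit-bootloader invocation looks something like the example below – check the script itself for the exact flags on your version, as the "-n" (VM name label) and "-p" (partition) options here are recalled rather than copied from documentation, and the VM name is an example:

# Open the grub menu of the PV guest "web01", using partition 1 of its first disk:
xe-edit-bootloader -n web01 -p 1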

Snapshots are also not LVM-based snapshots, but utilise an altogether more complex (but more versatile) method. Full details of how this works are available in the Citrix knowledge bases – http://support.citrix.com/article/CTX122978. The linked PDF in that knowledge base entry is well worth a read, as it explains a lot about how XenServer uses virtual storage.

Essentially, when you create a snapshot on an iSCSI or LVM based SR, the original disk image gets truncated to the space actually used (e.g. a 10GB image with 5GB used would get truncated to 5GB), and gets re-labelled so it becomes the snapshot image. A new image then gets created to hold all future writes, and an additional image gets created to hold any writes to the snapshot. This is why, when you view your storage, you may see snapshots showing up as "0% on disk" – they haven’t been written to, so are consuming no extra space. However, if you have a 100GB disk image with 40GB used, when you create a snapshot you will end up using 140GB. Even if you delete all snapshots for that VM, you will still use 140GB of disk space.

This is because XenServer cannot use "thin provisioning" on block devices (NFS stores and local "ext" SRs do not have these limitations), as it does not use a clustered filesystem, unlike VMware’s VMFS. Citrix do provide a "coalesce tool", which will re-combine the snapshots into one VDI again and free up the used space. This tool is documented in another Citrix knowledge base article here: http://support.citrix.com/article/CTX123400, and I have heard from a Citrix support engineer that an online coalesce tool will be provided in the next update of XenServer due later this year, so VHDs can be re-combined without powering off or suspending them. It’s worth bearing these requirements in mind if you plan on using regular snapshots for your backup strategy.

Migrating existing systems

Fortunately, the vast majority of my VMs are paravirtualized Linux systems already running on open source Xen. These are exceptionally easy to convert to XenServer – I just create a VM using the "Demo Linux" template, which sets up a Debian Etch system. Once it has been created, I shut it down and use the "xe-edit-bootloader" trick to mount its filesystem. I can then run a "rm -rf" on it, and copy the root filesystem of the VM being migrated over the top – using something like rsync or a tar archive (if using tar, remember to use --numeric-owner when unpacking!). I then make a quick edit of /boot/grub/menu.lst and /etc/fstab to point at the new block devices (/dev/xvda1 etc.), change out of the mounted directory and quit the editor session. When the VM next starts, it’ll be running the migrated image. I’ve tried this approach on CentOS 5 and Debian Etch/Lenny/Squeeze hosts and all worked perfectly. Debian Squeeze doesn’t even need a -xen kernel, as pv_ops DomU support is now in the mainline kernel; you just need the "-bigmem" kernel for PAE support.
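A rough sketch of the copy step, assuming you’ve staged a tar archive of the source VM’s root filesystem somewhere both machines can reach (the NFS path and hostname here are examples):

# On the source (open source Xen) VM, archive the root filesystem:
tar --one-file-system --numeric-owner -czpf /mnt/nfs/web01-root.tar.gz /

# On the XenServer host, from inside the new VM's filesystem as mounted
# by xe-edit-bootloader (i.e. after cd'ing into the mountpoint):
rm -rf ./*
tar --numeric-owner -xzpf /mnt/nfs/web01-root.tar.gz -C .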

There is also a script on the Xen.org website that appears to do this automatically : http://www.xen.org/products/cloud_projects.html. I haven’t tried this yet, but it looks like it’s worth checking out if you have a lot of open-source Xen VMs to migrate.

You will need to install the Xen tools in your VMs for optimum performance and reporting capabilities. I found, though, that the installation script tries to replace your kernel with a Citrix one. If this is not acceptable in your environment (I prefer to stick with the kernel supplied by the distro), you can just install the xe-guest-utilities packages, which are provided as 32- and 64-bit RPMs or DEBs.
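If you go down that route, it’s just a normal package install from the tools ISO – the exact filenames vary between releases, so the wildcards here are only indicative:

# Debian/Ubuntu guests:
dpkg -i xe-guest-utilities_*.deb
# RHEL/CentOS guests:
rpm -Uvh xe-guest-utilities-*.rpm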

For the Windows systems running on ESX that needed migrating, we discovered that XenConvert handled them all. You just need to remove the VMware tools before starting the conversion process, and be prepared for a long wait. So far, Windows Server 2000, 2003 and 2008 have all been converted successfully.

Backup strategies

There are a number of backup options available to you. In addition to running host-based backups (using something like Bacula, Amanda, NetBackup, Legato Networker etc.), you can also back up your VM images. You could use something based on the script I wrote to automate this, assuming you have a centralised backup location with enough space. If this is an NFS SR, you can include its flat file contents in your regular backups. You can also run backup commands from within XenCenter, which will save a VM image or produce a "VM appliance" bundle to your local PC. Whilst not a good approach for regular backups, this can be useful for quick ad-hoc backups of systems. Finally, you can also back up VM metadata from the text-mode console, which will create a special volume on a given SR holding all the VM configuration data. This means that you can re-attach this SR at a later date, and recover all your machine images and configuration.
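For the ad-hoc case, the xe CLI can do much the same as the XenCenter export; the VM needs to be shut down first (or you can export a snapshot of it), and the VM name and destination path here are just examples:

# Export a halted VM to a portable .xva image:
xe vm-export vm=web01 filename=/mnt/backup/web01-$(date +%Y%m%d).xva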

There is also a commercial tool that integrates with XenCenter available from PHD Virtual, although I haven’t investigated it.

Licensing

If you are using the free edition, all you need to do is re-register each of your systems with Citrix once a year in order to keep running; apparently, this is so that they can accurately gauge interest and allocate resources as needed. If you fail to do this, your VMs will keep running, but you will not be able to make any configuration changes or start any new VMs. I’ve found the registration process to be very simple, and it can all be done with a few clicks through XenCenter. You can view existing licenses and expiry dates, request new licenses and assign premium licenses (more on that in a moment) all through the same license manager. So far, every time I have requested a new license, it has been processed and emailed to me within a couple of hours, so as long as you leave yourself enough time to renew them each year, it shouldn’t become a burden.

If you do end up purchasing one of the advanced editions, these include a perpetual license, so you do not need to continually renew your systems. This requires the use of a Citrix licensing server, which is provided either as an application that runs under Windows, or as a Linux-based appliance VM which you can run on XenServer itself. So that there isn’t a "chicken and egg" situation, there is a 30-day grace period during which XenServer hosts will continue to run with all the advanced features without the licensing server being available. After this, they revert to the basic edition, so you will still be able to run your VMs.

The Future and Conclusion

With any big investment (time as well as money), it’s always prudent to consider the future, all the more so when it comes to something which will underpin your whole IT infrastructure. There have been some concerns raised as to the future of XenServer, but my personal take is that it’s just the usual "the sky is falling" rumour-mongering so sadly common on the Internet. Citrix seem to be doing well as a company, and XenServer has a solid heritage, going all the way back to when it was called "XenEnterprise" before Citrix bought it. With the release of 5.6 earlier this year, it would appear that Citrix are putting a lot of effort behind their product, and there have been a number of big client wins, although it doesn’t yet have the market share of VMware. However, a recent report suggests it’s growing its market share faster than any of its competitors.

But what if the worst did happen? If XenServer development and support both halted and no permanent licenses were available (highly unlikely, but then a few years back I’d never have thought that Oracle would buy Sun and kill OpenSolaris), you do have several options open to you. You could always go back to open source Xen on the distribution of your choice, or migrate to the open source Xen Cloud Platform, which is pretty much XenServer minus the XenCenter GUI and support options. If you use OpenXenManager, this could be a near drop-in replacement.

You can also export your VMs as an OVF image, which you could then import on multiple platforms, including VirtualBox and VMware. In short, I was happy that I had enough options for an exit strategy if needed – of course, this was sufficient for my needs, but I’d recommend you do some research and experimentation of your own.

And so, to my conclusion: XenServer makes a logical upgrade if you are already running an open source Xen system such as Red Hat or SuSE Linux, and represents a very low risk choice if you are just starting out with virtualization or looking to migrate from an old, non-clustered VMware environment. It represents fantastic value for money, as even with the free version you get a full-featured system with the all-important live migration enabled, so you can move VMs between physical hosts with zero downtime. If you spend a little more, the premium editions give you far more "bang for your buck" than the comparable VMware offerings. While it may lack a few high-end features (and the lack of granular control over storage parameters is frustrating), it will likely fulfil the requirements of many environments, and it won’t cost you anything to try it out!

Xenserver Snapshot and Template Based Backup Script

| Comments

We have recently started using Citrix Xenserver in production at work (fantastic product, see my review for more information) and needed a simple backup solution. Our VMs run from an iSCSI SAN and are backed up daily through various methods – e.g. Bacula for the Unix/Linux systems. However, we wanted the ability to quickly roll back to a previous VM snapshot, and to get up and running quickly if our SAN ever failed. Our solution was to create a large shared NFS storage repository, and periodically snapshot VMs and copy the resulting templates over to this SR. Doing this means that if the SAN fails, we can quickly create a new VM from this NFS store (using the Xenserver host’s local disks, or even the NFS SR itself as storage). Once up and running, we can bring VMs back up to date by restoring the latest backup to them.

In order to automate this, I wrote a quick script which I thought might prove useful to someone else, so I’ve decided to post it here: snapback.sh.

Update: This script is now being hosted at GitHub. This means you can check out the latest version from there, by doing :

git clone https://github.com/markround/XenServer-snapshot-backup.git

or accessing the raw script file at https://github.com/markround/XenServer-snapshot-backup/raw/master/snapback.sh

It is very simple, and although it may serve well as your only backup solution, it’s really intended as an image-level complement to your primary filesystem-based backup system such as Bacula, Amanda, NetBackup etc. It also has not had much testing, and I fully appreciate the scripting is pretty rudimentary and could do with some optimisation – there’s no error checking, for instance. I kept it pretty verbose on purpose though, so you can get a good idea of exactly what it’s doing at each step; it may be better to think of it as a template you can base your own scripts on!

Overview

The script creates a snapshot of a running VM on a configurable schedule, and then creates a template from this snapshot. It will copy all these backup templates over to a configurable storage repository, and then clean up any old backups according to a specified retention policy. These backups are full backups, so if you have a 10GB VM and keep 7 previous copies, you will need a total of 80GB of disk space on your backup SR. Non-running VMs, and those not configured (as detailed below), will be skipped.

Important: See http://support.citrix.com/article/CTX123400. After backing up each VM, you will end up with a new VDI, so you may need to manually coalesce your VDIs again to reclaim disk space.
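To give a flavour of the sort of xe calls involved, here is a very rough sketch of the approach – it is not the actual logic in snapback.sh (read the script itself for that), and the UUIDs and labels are placeholders:

# Snapshot the running VM (returns the snapshot's UUID):
SNAP=$(xe vm-snapshot uuid=<vm-uuid> new-name-label="backup-$(date +%Y%m%d)")
# Flip the snapshot to a regular halted VM so it can be copied with vm-copy:
xe template-param-set is-a-template=false uuid=$SNAP
# Copy it to the backup SR, then mark the copy as a template:
COPY=$(xe vm-copy vm=$SNAP sr-uuid=<backup-sr-uuid> new-name-label="web01-backup")
xe vm-param-set uuid=$COPY is-a-template=true
# Remove the intermediate snapshot:
xe vm-uninstall uuid=$SNAP force=true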

Installation and usage

First, copy the script to your Xenserver pool master, and make it executable. A good location for this is /usr/local/bin/snapback.sh.
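For example (assuming the script is in your current directory):

cp snapback.sh /usr/local/bin/snapback.sh
chmod +x /usr/local/bin/snapback.sh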

Next, create a cron entry for the script – to make it run daily just after 1AM, you’d create /etc/cron.d/backup with the following contents :

2 1 * * * root /usr/local/bin/snapback.sh > /var/log/snapback.log 2>&1

This will also record a log of its actions to /var/log/snapback.log. You now need to edit the script and change the DEST_SR variable to the UUID of your backup storage repository. You can find this value by clicking on the SR in Xencenter; the UUID will be displayed as a value like "2c01dc26-f525-70d6-dedf-00baaec76645".
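You can also get the same UUID from the CLI; the name-label in the second command is just an example:

# List all SRs with their UUIDs and names:
xe sr-list params=uuid,name-label
# Or filter on a specific SR by name:
xe sr-list name-label="Backup NFS" params=uuid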

Lastly, you need to configure your backup and retention policy for your VMs. In Xencenter, right click your VM, and select "Properties". Click on "Custom Fields", and then "Edit Custom Fields". You should add two text fields :

  • backup : Can be one of "daily", "weekly", or "monthly". If it is set to weekly, it will by default run on a Sunday, and if it is set to monthly, it will run on the first Sunday of the month. This day can be changed at the top of the script – see the WEEKLY_ON and MONTHLY_ON variables.
  • retain : How many previous backups (in addition to the currently running backup) to keep. So, setting this to a value of "2" would mean that after a backup has run, you would end up with 3 backups in total.
Adding a custom field

The script will look for these fields when it is run, and will skip any VM that doesn’t have them set. You can also see them in the Xencenter summary and properties for the VM :

VM summary showing the custom fields
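Under the hood, Xencenter stores custom fields in the VM’s other-config map, so you can also set them from the CLI. The key names below reflect how I believe Xencenter records them (XenCenter.CustomFields.<name>), so check against one of your own VMs with vm-param-get first; the UUID is a placeholder:

# Inspect how an existing custom field is stored:
xe vm-param-get uuid=<vm-uuid> param-name=other-config
# Set the backup schedule and retention directly:
xe vm-param-set uuid=<vm-uuid> other-config:XenCenter.CustomFields.backup=daily
xe vm-param-set uuid=<vm-uuid> other-config:XenCenter.CustomFields.retain=2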

You can now either run the script manually, or wait until the cron job kicks off. It will produce a detailed log to the console (or to the log file if run through cron), and when it’s finished, you’ll see your backup templates listed in Xencenter, similar to this:

Backups listed in Xencenter

If you find that this clutters up the Xencenter view a little, you can always hide them (View->Server View->Custom Templates). To restore a VM from a backup, just right click, and choose "New template from backup". Anyway, I hope this helps someone else!

The Setting Sun

| Comments

Well, that’s that, then. Solaris as we knew it is pretty much dead. I’ve suspected for a while now that Oracle’s intentions regarding Solaris were not what the community, or us "old-school" Solaris sysadmins wanted or had hoped for.

In the last few months, Oracle have completely alienated and scared off the community around OpenSolaris, killed any lines of communication by clamping down on employee blogs, and ignored open letters from highly influential and important community members begging for any kind of information. They’ve forbidden Sun/Oracle employees from heading up the Solaris user groups and booted the meetings out of their buildings; turned Solaris 10 into a 90-day trial; and pushed back the 2010.x release of OpenSolaris with no word as to its planned release date, or even whether it is being continued as a product. And now, in a final act of desperation, the OGB has essentially threatened to "shoot itself in the head".

Even ignoring the OpenSolaris project, it’s not at all rosy in "real" Solaris land, either. Requests for information and clarification are going unanswered, and I know of several managers who have had hardware quotes and support tickets ignored - there’s a near total blackout of information from Oracle. People are fleeing Solaris in droves, and migrating to anything they can: Linux, FreeBSD (DTrace and ZFS), AIX - hell, even HP-UX looks like a safer bet at the moment. And I never thought I’d find myself saying that!
It certainly appears that Oracle are doing a superb job of killing Solaris. But why would they do this, having paid all that money for Sun and announced that they would increase spending on Solaris development?
Well, this post on Slashdot (allegedly from a Sun/Oracle employee) confirms my suspicions as to why they may be doing this. Oracle just really doesn’t care about Solaris as a general purpose data centre OS any more. There’s just no money in it, and although I personally find it tragic, it does make sense. It’s probably also why they’re killing all their OEM deals - why help a competitor sell hardware, when all you’ll see from it is a possible support contract for the OS?

Oracle’s overall aim is to have Solaris relegated to the role of running as the bottom layer in an Oracle "database machine", Java appserver bundle or inside a "Fishworks" storage appliance. It excels at these tasks, and it would obviously fit into Oracle’s stated goal of being a one stop shop, where if you want to run Oracle, they’ll sell you the bundle - hardware, storage, OS and software. If they no longer want Solaris to be a dominant general purpose OS, then their approach makes sense. They don’t need a "community" around the product, they don’t need open source developers porting applications to it, and they certainly don’t need the overhead of running and managing a community portal any more. Unless you are running (and paying for) Oracle applications on Solaris, you’re probably more of an annoyance to them at the moment, and I get the very strong idea that they’d rather you just quietly went elsewhere.

I just wish that if this was their plan, they’d make some sort of statement about it, rather than ignoring the Solaris community in the hope that it’ll eventually get frustrated and leave without Oracle having to spell it out. I think the way they are going about it is reprehensible, and it’s a tragic end for such a historic and innovative OS. Sadly though, Larry is all about the bottom line, and the old, altruistic Sun approach just wasn’t bringing in the big bucks. As the Slashdot poster said: "Profit is king here. Anything else is overhead, and overhead eats into Larry’s yacht fund."

Edit: Now it’s official: http://www.theregister.co.uk/2010/08/13/opensolaris_is_dead/

Centreon Review

| Comments

One of my favourite interview questions to ask candidates was a variation of "Desert Island Discs": imagine you are going off to be a sysadmin on a desert island, with no internet access, and further imagine that the previous sysadmin was a total fascist with a minimalist install policy. We’re talking a bare-bones "classic" Solaris installation, or a minimal Debian system here. You’ve got SSH installed, but not much else. Before you hop on the boat, however, you are given a couple of hours of high-speed internet access and a USB stick. You can take up to 5 tools with you to this desert island: what do you pick?

It was always an interesting question to ask, because it gave you an insight into the kind of sysadmin tasks someone had been doing before, and it also served as a nice, relaxed "ice breaker" type question. For my money, aside from some tools like rsync and screen which I couldn’t live without, a decent monitoring package would have to be top of my priorities. There are a bunch out there: some of them free; some of them commercial, but the one that would make it on to my USB stick would have to be Nagios.

It’s open source, extremely well documented and widely implemented, and there are a ton of useful add-ons and plugins available for it. The only drawbacks I can find with it are its ugly web interface, the complexity involved in setting up a new system for monitoring, and the disconnect between availability and performance monitoring. If you have money to throw at the problem, then software like Uptime or Hyperic neatly deals with all of these issues, but they can be quite pricey if you have a large number of systems to manage and a tight budget.

So, you can imagine my excitement when I first discovered Centreon. It’s essentially a monitoring platform that uses Nagios at its core. You could think of it as a fancy frontend to "stock" Nagios, but it’s so much more than that: besides the attractive interface, it also bridges the gap between availability and performance monitoring, and makes Nagios administration a snap. Due to the reliance on Nagios though, I’d go so far as to say that before you experiment with Centreon, you really should have set up "stock" Nagios, and be familiar with the plugin architecture, NRPE and how alerts and escalations are managed. Ideally, you should have an existing stock Nagios installation you can duplicate on Centreon/Nagios.


Installation is a bit of a mission, however. My original approach was to try and use the Debian-provided packages of Nagios, NDO and the other prerequisites, and then install Centreon on top of that. After several failed attempts at tweaking paths and settings, I gave up. It was just far easier to follow the standard instructions and install everything from source on a stock system. Much of the "official" documentation is in French though, so it may be that there are better instructions for that sort of thing for our friends on the continent. It’s also fairly loosely organised on the wiki, so be prepared to put some time aside beforehand to browse and collate all the documentation you’ll need.

My test system was a Debian Lenny 32-bit system, running as a Xen DomU with Apache 2.2, PHP 5, and MySQL 5.0. I kept the install on a VM as I found it’s best to keep the Centreon/Nagios system separate from any other monitoring applications you may want to run, particularly as it installs and manages its own PEAR modules etc. Also, due to the somewhat involved installation process, you may find you need a couple of attempts to get everything just right. This is where being able to snapshot and roll back a VM is invaluable! One suggestion for the Centreon developers: how about providing a pre-configured VM appliance download? It would drastically lower the barrier for people interested in just trying the application if they could simply import a disk file, click and go.

Despite the long-winded installation procedure, I found subsequent upgrades to be smooth and trouble-free. During the time I have been running Centreon, there have been 5 or 6 "point" releases, and one major jump from 2.0.x to 2.1.x. Each of these passed without incident – a simple upgrade install from the terminal (pointing the installer at your configuration files), finished off with a web-based database upgrade wizard. These wizards, and indeed the rest of the interface, are all extremely well designed and easy to follow; don’t let the "quirky" English on the main site and wiki put you off if, like me, your French doesn’t go much beyond "Bonjour!" and "Je m’appelle…"

When you first log in, you’ll see a dashboard view that looks similar to this :



This provides an overview of your network, and reports on any issues found. A smaller, less-detailed version of this information is also always present at the top right of the screen. The visual improvements over the standard Nagios interface should already be readily apparent. As you’ll see throughout the rest of the screenshots, the clean lines and functional design are carried through the whole interface. Just as a further example, here’s the Nagios host detail screen:

Nagios host detail screen

And here’s the Centreon page showing the detail for the same host :

Centreon host detail


As well as illustrating the aesthetic and functional improvements in the Centreon interface, it also highlights the fact that you can continue to use the Nagios interface and tools alongside it.

The first thing you’ll want to do is to define your hosts and services. Although you should be able to import an existing configuration, you’re almost certainly better off generating a new configuration from scratch, and far less likely to run into problems. If you’re familiar with configuring Nagios, this is where you’ll start to see the immediate benefits. Normally, you’d have to configure your various check commands by editing a config file, which usually involves (at least for me) an editor, a couple of terminal windows open, and the output from the check command’s help to work out what all the switches do. Now, all this is managed by a neat GUI:

Centreon command definitions

And here’s the editor dialog that assists you in creating your command definition – note the popup with the check command output displayed!

Command editing


After setting up your escalation groups, contacts and templates, you can then easily create new hosts, and assign them to groups, pull in templates, and tweak settings through the GUI:

Host configuration

An interesting item here is the "Monitored from" drop-down; Centreon has extensive support for distributed monitoring, which allows you to utilise a central monitoring server, and satellite nodes at different sites.

Where the web interface shines is when you are setting up relationships, or assigning contacts to a host. Instead of manually editing the configuration files (and then kicking yourself when Nagios won’t restart because you’ve referenced a non-existent service or host in a group somewhere), you can simply assign or remove items using a familiar list tool in the GUI:



Of course, as it’s all ultimately using the text-based configuration files, you can always use your existing scripting or configuration management infrastructure to manage hosts, services and relationships as needed. After you have made your configuration changes, they are not immediately picked up by Nagios; you need to export the generated configuration, which is done by navigating to the Configuration->Nagios screen. By default, this will only test the configuration; once you’re happy with it, you can tick the "Move Export Files" and "Restart Nagios" boxes to make the generated configuration live and restart the Nagios process.

Restarting Nagios


Once everything is defined and being monitored, you will notice that quite a lot of information is available in a graphical form, right out of the box. For example, the home tab provides a link to a Nagios Statistics page which displays some graphs showing the performance of the Nagios engine, hosts monitored and other details:

Nagios performance

If you go to the Monitoring->Services screen, you’ll find that some of the pre-defined services have a graph icon next to them. Hovering over them reveals the data plotted graphically – for example, load average:


Another nice touch is the popup service detail, which is displayed when you hover the mouse over the service name :



There’s also extensive reporting available :

Centreon reporting


All in all, I am extremely impressed with Centreon. There are plenty of monitoring tools out there that compete with it (ZenOSS, Zabbix), but nothing that comes close if you already have an investment in Nagios.