
Building a redundant iSCSI and NFS cluster with Debian - Part 3
This is part 3 of a series on building a redundant iSCSI and NFS SAN with Debian.
Part 1 - Overview, network layout and DRBD installation
Part 2 - DRBD and LVM
Part 3 - Heartbeat and automated failover
Part 4 - iSCSI and IP failover
Part 5 - Multipathing and client configuration
Part 6 - Anything left over!
Introduction
In the last two guides, we set up a DRBD resource and LVM volume group which we could manually migrate between the two cluster nodes. In this guide, we'll set up the Heartbeat cluster software to handle automatic migration of services between the two nodes in our cluster ("failover").
The version of Heartbeat included in Debian Etch is 1.x. It is a very simple system, and is limited to two-node clusters, making it ideal for something simple such as failover for services between two nodes. The current 2.x branch is a lot more complicated, and has a new XML configuration format, although it can still be used with the original 1.x format files. Although 2.x adds many useful features, it's overkill for our needs at the moment - plus, sticking to 1.x avoids the need to install software not included in the current stable distribution.
Linux, Solaris and FreeBSD iostat monitoring with Cacti
I've been looking for ages for a tool to parse the output from "iostat" on Linux, and graph it in Cacti. I found a few scripts and templates that did some of what I was looking for (disk I/O etc.), but nothing that gave me the full set of statistics such as queue length, utilisation, service time etc. I finally got round to writing my own set of templates and a data gathering script to provide this information, and it seems to work very well. So that others can benefit, I've posted the package archive and a brief description over on the Cacti forums (click Continue Reading for a download link to an updated version - the one on the Cacti forums has a bug that stops it working with some versions of sysstat). Below are a couple of sample graphs to give you an idea of what it can do - there are also a few more samples posted in the Cacti forums thread :


Installation is a simple matter of creating a cron job to gather iostat data, extending your snmpd.conf to call the included iostat.pl script, and then importing the templates. Full instructions are included in the README within the archive (click the Continue Reading link to see them), but if you have any comments, suggestions or problems please let me know!
Continue reading "Linux, Solaris and FreeBSD iostat monitoring with Cacti"
Blastwave is dead
Blastwave is a registered trademark of Blastwave.org Inc. in the
United States and Canada. All assets of Blastwave.org Inc. are frozen
until further notice. All Solaris(tm) related open source software
work and services are cancelled. All websites, documents and binary
software packages that bear the mark Blastwave or Blastwave(tm) are no
longer available until further notice.
At the same time, mailing lists, shell logins and other services seem to have been shut down and/or removed from DNS. None of this came with any warning or notification to the maintainers, and I still don't know what's going on. I can't access any of the build servers, so it's fairly safe to assume that my build scripts, packages, documentation, and everything else I've been working on for the Solaris community over the last 5 years are gone too. As if that wasn't enough, there are also reports that someone has been attempting to sabotage various mirror sites. I don't know how to take that - but frankly, right now, I don't care. I'm out. I've had it with the political fighting and drama. Many maintainers had already left following the last spat - I simply don't have the will to get involved in it any more; the damage has already been done. If anyone is still using my Blastwave packages (PostgreSQL, Nessus, PHP4, and some others) I recommend you switch to something else, like Sun's own CoolStack or OpenSolaris.
There's plenty more I could say, but at this point I think it's perhaps better to simply leave it. It's a sad day for me: seeing years of work towards something that I believed in, and helped a great many people, all go to ruin. It's even sadder for the Solaris community as a whole; this was a true grass-roots organisation - made up from like-minded Solaris users, admins, programmers and fans - who gave up countless hours of their own time to help others. I think the least we deserve is an explanation, but somehow I don't think one at this stage would make any difference anyway.
Update : People have been mailing me to say the main page is back up - true, but it's a case of "the lights are on, but no one's home". Check the thread in comp.unix.solaris.
Building a redundant iSCSI and NFS cluster with Debian - Part 2
This is part 2 of a series on building a redundant iSCSI and NFS SAN with Debian.
Part 1 - Overview, network layout and DRBD installation
Part 2 - DRBD and LVM
Part 3 - Heartbeat and automated failover
Part 4 - iSCSI and IP failover
Part 5 - Multipathing and client configuration
Part 6 - Anything left over!
Configuring DRBD
Following on from part one, where we covered the basic architecture and got DRBD installed, we'll proceed to configuring and then initialising the shared storage across both nodes. The configuration file for DRBD (/etc/drbd.conf) is very simple, and is the same on both hosts. The full configuration file is below - you can copy and paste this in; I'll go through each line afterwards and explain what it all means. Many of these sections and commands can be fine tuned - see the man pages on drbd.conf and drbdsetup for more details.
global {
}
resource r0 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup {
    wfc-timeout 0;
  }
  disk {
    on-io-error detach;
  }
  net {
    on-disconnect reconnect;
  }
  syncer {
    rate 30M;
  }
  on weasel {
    device /dev/drbd0;
    disk /dev/md3;
    address 10.0.0.2:7788;
    meta-disk internal;
  }
  on otter {
    device /dev/drbd0;
    disk /dev/md3;
    address 10.0.0.1:7788;
    meta-disk internal;
  }
}
The structure of this file should be pretty obvious - sections are surrounded by curly braces, and there are two main sections - a global one, in which nothing is defined, and a resource section, where a shared resource named "r0" is defined.
The global section only has a few options available to it - see the DRBD website for more information; though it's pretty safe to say you can ignore this part of the configuration file when you're getting started.
Continue reading "Building a redundant iSCSI and NFS cluster with Debian - Part 2"
Building a redundant iSCSI and NFS cluster with Debian - Part 1
It's been a while now since I last updated this blog with any decent material (The Poo Truck notwithstanding, as honestly, that's a classic) so I thought I'd dust off some of my notes on building a redundant iSCSI and NFS SAN using Debian Etch.
The following post takes the form of a "HOWTO" guide - I'll include all the relevant commands, configuration files and output produced so you can follow along. This is the first part of the series; I'll post the different sections in phases, each covering a different part of the setup. The plan is to cover all this in 5 (possibly 6) separate posts, with the following content :
Part 1 - Overview, network layout and DRBD installation
Part 2 - DRBD and LVM
Part 3 - Heartbeat and automated failover
Part 4 - iSCSI and IP failover
Part 5 - Multipathing and client configuration
Part 6 - Anything left over!
So, this being part one, I'll start with a quick overview of what I'm trying to achieve here :
Cluster overview
The cluster will consist of 2 storage servers, providing iSCSI and NFS services to a number of clients, over floating IP addresses and from a replicated pool of storage. This storage will be used for file sharing (NFS), and block devices (iSCSI) - although you could add any kind of service on top of the cluster; an obvious option would be to provide SMB (Microsoft Windows file services), although I won't explore that particular avenue.
This will be replicated with DRBD, and managed using LVM2. I'll also be using multipathing to the storage, so that a component (NIC, switch, cable etc.) can fail in one channel but the storage will still be accessible. Failover and cluster management will be provided by the Linux-HA project.
The distribution I'm using is Debian Etch (4.0). Most of the configuration files and commands used will work on any distro, although file locations and the package management commands will obviously differ.
Network layout
The two storage nodes (which I'll call "otter" and "weasel") will have the following 4 network interfaces configured :
- eth0: 172.16.1.x - Management interface (the address we SSH into to manage the system)
- eth1: 10.0.0.x - This is for data replication and heartbeat between the two nodes, and will be via a cross-over cable connected directly between the two servers
- eth2 and eth3: 192.168.1.x - This is the storage network, clients will connect to this for their storage.
And the client (which I'll call "badger") will have the following 3 network interfaces configured :
- eth0 : 172.16.1.x - Management / public interface
- eth1 and eth2 : 192.168.x.1 - Storage network (where we access the iSCSI and NFS storage). These will use 192.168.1.1 and 192.168.2.1, both with a netmask of 255.255.255.0 to ensure that requests go to the correct interface when using multipathing (more on that later).
In a real-world scenario, these would be on physically different NICs, and would also be on separate switches - particularly the multipathed storage interfaces. Utilising the different private ranges makes it easier to see at a glance what is going on, and makes trouble-shooting a lot easier. It's also obviously a good idea to separate your storage network from the rest of your regular network traffic.
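To illustrate why the two client storage interfaces sit in different /24 ranges, here is a minimal Python sketch using the standard ipaddress module. It only models the "which subnet contains this address" decision; the target addresses 192.168.1.10 and 192.168.2.10 are hypothetical and are not taken from the setup described here.

```python
import ipaddress

# Client storage interfaces from the layout above (eth1 and eth2 on "badger"),
# both with a netmask of 255.255.255.0.
client_ifaces = {
    "eth1": ipaddress.ip_interface("192.168.1.1/255.255.255.0"),
    "eth2": ipaddress.ip_interface("192.168.2.1/255.255.255.0"),
}


def outgoing_interface(target):
    """Return the name of the interface whose subnet contains target, or None."""
    addr = ipaddress.ip_address(target)
    for name, iface in client_ifaces.items():
        if addr in iface.network:
            return name
    return None


# Hypothetical storage-side addresses, one per path (assumed for illustration).
print(outgoing_interface("192.168.1.10"))
print(outgoing_interface("192.168.2.10"))
```

Because the two networks don't overlap, each storage address matches exactly one interface, which is the property the multipathing setup relies on.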
Of course, there is nothing stopping you from utilising virtual NICs and having each address on eth0:1, eth0:2 and so on. Obviously, GigE or higher would be required in a production network, but there's nothing stopping you from using 100Mb in a test/development environment. Just don't expect stellar performance!
There will also be a null-modem cable connected between the two serial ports on each storage node. This is to supplement the network heartbeat, and will help avoid the problem of "split-brain" that can occur in clusters. If there was a problem with the heartbeat network - the switch failing, for instance - both nodes would then see the other as failed, and try to assume the master role. Having a secondary heartbeat connection between the nodes will help avoid this problem - particularly as it is a "straight-through" connection, and does not rely on any intermediate devices such as a network switch.
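The reasoning behind the second heartbeat path can be sketched in a few lines of Python. This is a deliberately simplified model and not how Heartbeat is actually implemented: it assumes a node only treats its peer as failed when heartbeats have stopped arriving on every configured path. The path names "eth1" and "serial" are just labels for this example.

```python
def peer_considered_dead(link_status):
    """link_status maps a heartbeat path name to True if heartbeats
    are still arriving on that path."""
    # The peer is only treated as failed when no path carries a heartbeat.
    return not any(link_status.values())


# Both paths healthy: no failover.
print(peer_considered_dead({"eth1": True, "serial": True}))
# Heartbeat network down, serial link still up: the peer is alive, so no
# failover and no split-brain.
print(peer_considered_dead({"eth1": False, "serial": True}))
# Nothing heard on any path: the peer really does look dead.
print(peer_considered_dead({"eth1": False, "serial": False}))
```

With only the network path configured, the second case would look identical to the third, and both nodes would try to become master.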
At this point, a diagram might be in order - you'll have to excuse my "Dia" skills, which are somewhat lacking!

This diagram shows all the important connections between the various hosts, so hopefully this will make things a little clearer. You can see that with this architecture, we could lose one storage server or one switch, and we'd still have a valid path to the storage from our client.
Continue reading "Building a redundant iSCSI and NFS cluster with Debian - Part 1"
Poo

ZFS Replication
As I've been investigating ZFS for use on production systems, I've been making a great deal of notes, and jotting down little "cookbook recipes" for various tasks. One of the coolest systems I've created recently utilised the zfs send & receive commands, along with incremental snapshots, to create a replicated ZFS environment across two different systems. True, all this is present in the zfs manual page, but sometimes a quick demonstration makes things easier to understand and follow.
While this isn't true filesystem replication (you'd have to look at something like StorageTek AVS for that) it does provide periodic snapshots and incremental updates; these can be run every minute if you're driving this from cron - or at even more granular intervals if you write your own daemon. Nonetheless, this suffices for disaster recovery and redundancy if you don't need up-to-the-second replication between systems.
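As a rough illustration of one replication cycle, the Python sketch below builds (but does not run) the commands involved: take a new snapshot, then send the difference between the previous and new snapshots to the other system. The dataset, snapshot and host names are hypothetical, and using ssh as the transport is an assumption for this example rather than something taken from the full write-up.

```python
import shlex


def incremental_replication_commands(dataset, prev_snap, new_snap,
                                     remote_host, remote_dataset):
    """Return (snapshot command, send/receive pipeline) as shell strings."""
    snapshot_cmd = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # "zfs send -i" emits only the changes between the two snapshots.
    send_cmd = ["zfs", "send", "-i",
                f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # Assumed transport: pipe the stream over ssh into "zfs receive".
    recv_cmd = ["ssh", remote_host, "zfs", "receive", remote_dataset]

    def join(cmd):
        return " ".join(shlex.quote(arg) for arg in cmd)

    return join(snapshot_cmd), join(send_cmd) + " | " + join(recv_cmd)


# Hypothetical names, for illustration only.
snapshot, pipeline = incremental_replication_commands(
    "tank/data", "snap1", "snap2", "backuphost", "tank/data")
print(snapshot)
print(pipeline)
```

Run from cron, each cycle would use the previous run's snapshot as prev_snap, so only the blocks changed since then cross the wire.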
I've typed up my notes in blog format so you can follow along with this example yourself, all you'll need is a Solaris system running ZFS. Read more for the full demonstration...
ZFS as a volume manager
The example used in this post is the creation of a mirrored zpool which is then used to create a block device, on top of which I'll create a UFS filesystem. The reasons for doing this are many and varied : you may have an application that needs UFS (particularly forcedirectio); you may need to create a block device for some reason but all your storage is currently tied up in zpools; or you just need a quick block device to use for testing.
Using ZFS as a volume manager also has its advantages over something like SVM (formerly "DiskSuite"). The management features are much improved (along with a browser-based GUI, if that's your thing) and you also gain access to ZFS features which operate at the volume manager layer and aren't dependent on the filesystem parts of ZFS. This includes features such as end-to-end error checking and recovery, along with snapshots.
Read on for the full update...
Continue reading "ZFS as a volume manager"
ZFS and caching for performance
To give a little background : I had been experiencing really bad throughput on our 3510-based SAN, which led me to run some basic performance tests and tuning. During the course of this (and resolving some of the issues), I decided to throw ZFS into the mix. The hosts involved are X4100s, 12GB RAM, 2x dual core 2.6GHz Opterons and Solaris 10 11/06. They are each connected to a 3510FC dual-controller array via a dual-port HBA and 2 Brocade SW200e switches, using MPxIO. All fabric is at 2Gb/s.
So far, pretty straightforward. I had been using iozone as my benchmarking tool (using a 512MB file as that's the average table size for our databases), and compared a wide range of systems and configurations, from an Ultra 20 with 7200RPM SATA drives, to the X4100's internal 10K RPM SAS disks as well as LUNs made available from the SAN in a variety of RAID levels.
Some interesting results here, which I'll skip over for the moment (like the Ultra20 beating the X4100 and SAN in read performance!) - the kicker happens when I added ZFS into the mix as an experiment.
Continue reading "ZFS and caching for performance"
Digital Badger
Apache mod_proxy balancing with PHP sticky sessions
I've been investigating the new improved mod_proxy in Apache 2.2.x for use in our new production environment, and in particular the built-in load balancing support. It was always possible to build a load-balanced proxy server with Apache before, using some mod_rewrite voodoo, but having a whole set of directives that do all the hard work for you is a great feature.
There is however, a catch. It won't work out of the box with PHP sessions, or many other applications. I've since worked out a way around this which enables you to continue using all the great features mod_proxy_balancer offers and still bind requests to an originating server. All you need is a little mod_rewrite magic : Read on for more details...
Continue reading "Apache mod_proxy balancing with PHP sticky sessions"
Sun V240 to X4100 : AMD vs SPARC
At work, we just migrated a database server from a Sun Fire V240 to a Sun X4100. This makes it the first AMD64 system we've put into production, and the performance advantage is staggering. I could post the benchmarks and various statistics, but I believe the following graphs paint a far more interesting and convincing argument for the price/performance benefit of Sun's AMD64 offerings...
Before (V240) CPU Utilisation

After (X4100) CPU Utilisation

All told, I'm impressed. The X4100 is ripping through queries at a phenomenal rate and is barely breaking a sweat. The V240 on the other hand was clearly struggling and was maxing out at 100% load. Admittedly, it's not a true like-for-like comparison, as it's pretty much impossible to do that across different systems and different architectures. But take a look at the price levels of these two systems - the V240 came in at around £7,500 for dual 1.5GHz UltraSPARC IIIi processors, whereas for £4,800 you can get the X4100 with dual dual-core AMD 285 processors clocked at 2.6GHz. Frankly, it's no contest. The only thing you don't get with the X4100 is another couple of disks, which is no big deal as we've hooked it up to our SAN. However, even if you want to go for the X4200 which has room inside for 4 internal disks, you'd still only end up paying £5,100.
PHP 4.4.3 packages now in testing
I hope to get these packages released to unstable in the next few days - I've been running them for a few days here and there appear to be no issues, but as always any other testing or feedback is appreciated. Make sure you heed the warning at the top of the testing page, though!
LigHTTPd and Apache - Symfony benchmarks
At work, we're developing a brand new in-house CMS based on the Symfony framework. As it uses no mod_rewrite rules or other Apache dependencies and is a "clean break" for us, I figured it would be an ideal candidate for benchmarking under LigHTTPd, comparing it to Apache 2.2 in order to give me some statistics to complement my last blog entry on the subject.
The results from the "ab" Apache-benchmark tool are pretty stunning - although I'm still at a loss to explain just why LigHTTPd is so much faster. The configuration of everything apart from the webserver is identical. I'm running on a Sun Ultra 20 with 2GB of RAM and Solaris 10 01/06. I have a shared document root, and two separate, identically configured zones, one running Apache 2.2.3 with prefork MPM, the other running LigHTTPd. PHP on both is 5.1.4, built using exactly the same compiler (Sun Studio 11) and flags for the Apache 2.2 SAPI and Fast-CGI build. Apache is using PHP loaded as a DSO, whilst LigHTTPd is running PHP through a socket, with 8 pre-forked PHP child processes :
fastcgi.server = (
  ".php" => ((
    "socket" => "/tmp/php-fastcgi.socket",
    "bin-path" => "/usr/local/php/bin/php",
    "bin-environment" => (
      "PHP_FCGI_CHILDREN" => "8",
      "PHP_FCGI_MAX_REQUESTS" => "10000"
    ),
    "bin-copy-environment" => (
      "PATH", "SHELL", "USER"
    ),
    "min-procs" => 1,
    "max-procs" => 1,
  ))
)
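As a quick sanity check on how many PHP processes this configuration should produce, here is a small Python sketch. It assumes the usual rule of thumb for spawned FastCGI backends, where each backend is one PHP parent process that pre-forks PHP_FCGI_CHILDREN workers; treat the formula as an assumption to verify against your own process list rather than a guarantee.

```python
def total_php_processes(max_procs, php_fcgi_children):
    # Assumed model: each backend started by the webserver is one PHP parent
    # process, and each parent pre-forks php_fcgi_children worker processes.
    return max_procs * (php_fcgi_children + 1)


# Values from the configuration above: max-procs = 1, PHP_FCGI_CHILDREN = 8.
print(total_php_processes(1, 8))
```

With the values above this comes to 9 processes: one parent plus the 8 pre-forked children that actually serve requests.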
The page in question is just the initial login page to the CMS. There's no database access at all, so no communication with any system external to the web server. It's just straight Symfony processing, using the current trunk.
Read on for the results...
Migrating from Apache to Lighttpd
In my role as a sysadmin, the bulk of the Unix systems I administer are web servers, running the now standard open-source stack of Apache, MySQL and PHP (note that whatever my personal misgivings may be about those elements, they are pretty much the standard now and what's been mandated at work). If you're using PHP on Unix, it's pretty much taken for granted that you'll be running it through Apache via mod_php. In fact, it almost goes without saying that if you're doing any kind of webserving on Unix at all, you'll most likely be using Apache. It's a setup that has performed well in each instance I've deployed it - from small personal sites and development systems, to the large high-traffic sites I'm responsible for at work. It's free, robust, and above all well documented.
So why am I now seriously considering ditching Apache ? One word : LigHTTPd. I'd been hearing a lot recently about this webserver, including the usual foaming at the mouth advocacy from the Ruby-On-Rails crowd - but more interesting, and certainly more pertinent to my circumstances, were the glowing reports of people running PHP through it. So, over the last few days I've been experimenting with it, culminating with moving all sites on this webserver (including my blog) over to Lighttpd. To put it mildly, I've been blown away. It's been a long time since I was this impressed with a piece of software, let alone something as apparently mundane as a webserver.
Continue reading "Migrating from Apache to Lighttpd"
















