iSCSI

shared storage

To live-migrate Xen guests from one cluster member to another, some form of shared storage is required. Because a Xen guest never runs on more than one cluster member at a time, a cluster filesystem is not required, as long as you configure Xen to access the guest through a physical device rather than a file. If you also want to share the Xen guest configuration files, however, a cluster filesystem is required as soon as you want to write data to that share.

We wanted to look beyond high-end FCAL-based SANs. Inspired by some Oracle RAC documentation (see: http://www.oracle.com/technology/pub/articles/hunter_rac10gr2.html), a shared firewire disk or array appeared to be an option. A second possibility was mentioned in the Xen live migration paper from Novell/SUSE (see below). In this proof of concept iSCSI is used as the shared storage solution.

Firewire

By default Linux logs on to a firewire device in exclusive mode. This prevents another node from accidentally accessing the same device, which would probably corrupt your data. Fortunately you can bypass the exclusive login mechanism by passing an option to the serial bus protocol kernel module (sbp2 exclusive_login=0).

For this to work, the chipset of your firewire device(s) must support multiple logins. The Oxford chipset, for example, is known to support them. Check the aforementioned Oracle RAC documentation for more information on shared firewire hardware.

Adjust /etc/modules for the necessary modules

sd_mod
ieee1394
ohci1394
sbp2 exclusive_login=0

Adjust /etc/modprobe.d/sbp2

options sbp2 exclusive_login=0 serialize_io=1

For immediate effect:

rmmod sbp2
modprobe sbp2

For permanent effect, rebuild your initial ramdisk so that it picks up these options as well (the sbp2 module is loaded from the ramdisk at boot time).

mkinitramfs -o foo.version.img <kernel-version>
or
mkinitrd -o /boot/initrd.img-2.6.12.6-xen-fw 2.6.12.6-xen   # example

iSCSI

target

First we need an iSCSI target, that is, a device or server that provides shared storage on your network. We use the iSCSI Enterprise Target software to build a Linux-based iSCSI target server.

install

=from source=

You can download the software at http://iscsitarget.sourceforge.net/

After building and installing the software, you'll have a kernel module named iscsi_trgt, a daemon called 'ietd' and a tool called 'ietadm'.

=package=

Debian Lenny

It's fairly easy to create an iSCSI target using Debian Lenny, as prebuilt binary packages are available. You'll need the userland tools and the kernel modules for your kernel:

apt-get install iscsitarget iscsitarget-modules-2.6.26-1-xen-amd64

Debian Etch

Although binary packages are not yet available for Debian Etch, Philipp Hug created unofficial packages for Debian Sid and Ubuntu Dapper. They are available at http://iscsitarget.sourceforge.net/wiki/index.php/Unoffical_DEBs

We installed the binary package named 'iscsitarget', which contains the userland binaries, on Debian Etch with no problems. The package named 'iscsitarget-source' contains the kernel module sources. This package allows you to build a binary kernel module for your kernel. The build on Debian Etch went flawlessly.

Add this line to /etc/apt/sources.list

deb http://debian.hug.cx/debian/ unstable/

Then proceed to install the software and build the kernel module

apt-get install module-assistant debhelper linux-source-2.6.18 dpkg-dev \
                kernel-package libncurses-dev libssl-dev linux-headers-2.6.18-4-xen-amd64
cd /usr/src/
tar -jxvf linux-source-2.6.18.tar.bz2
ln -s linux-source-2.6.18 linux

apt-get install  iscsitarget iscsitarget-source
tar -zxvf iscsitarget.tar.gz  # this unpacks into the sub-directory iscsitarget

m-a a-i iscsitarget

config

After installing the Debian package enable the iSCSI-target in /etc/default/iscsitarget

ISCSITARGET_ENABLE=true

Let's configure the daemon. We have to tell it which device(s) to export and which clients may access them. In the next example we export the logical volumes named 'vault1' and 'vault2' to everybody. The configuration file is /etc/ietd.conf

Target iqn.2006-07.com.example.intra:storage.disk1.vault
       Lun 0 Path=/dev/mapper/vg00-vault1,Type=fileio
       Alias vault1
Target iqn.2006-07.com.example.intra:storage.disk2.vault
       Lun 1 Path=/dev/mapper/vg00-vault2,Type=fileio
       Alias vault2

Remember that every node on your network that uses iSCSI needs a unique 'iqn'. Check the iSCSI documentation on the web for the applicable syntax. You can add lines to /etc/ietd.conf that require a username/password for iSCSI logons to succeed, but these are omitted by default.
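
The names used in this document follow the usual pattern iqn.<yyyy-mm>.<reversed domain>[:<identifier>]. As a rough sanity check before putting a name into a configuration file, a small shell function along these lines could be used. The function name is_valid_iqn is made up for this illustration, and the pattern is a simplification, not a full validator:

```shell
#!/bin/sh
# Rough format check for an iSCSI qualified name (IQN):
#   iqn.<yyyy-mm>.<reversed domain>[:<free-form identifier>]
# Hypothetical helper for illustration; it checks the general shape only.
is_valid_iqn() {
    printf '%s\n' "$1" | grep -Eq \
        '^iqn\.[0-9]{4}-(0[1-9]|1[0-2])\.[a-z0-9]([a-z0-9.-]*[a-z0-9])?(:[^[:space:]]+)?$'
}
```

For example, is_valid_iqn iqn.2006-07.com.example.intra:storage.disk1.vault succeeds, while a name with an impossible month such as iqn.2006-13.com.example is rejected.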

Optionally you can assign a fixed SCSI ID. This proves handy, for example, if you replicate your iSCSI SAN and want to bundle the replicated shares on separate nodes using multipath. In this example snippet a logical volume that is replicated by DRBD serves as the backend for our iSCSI target.

Target iqn.2009-01.nl.pref:storage.xensan1
       Lun 0 Path=/dev/drbd0,Type=fileio,ScsiId=149455403f160c00
       Alias san1

Important note if you want to put iSCSI on top of a DRBD device: on Debian Lenny the packaged init scripts start iSCSI before DRBD and, conversely, stop DRBD before iSCSI. You are advised to swap this order, so that DRBD starts first and stops last.
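
To see which order is currently in effect, the start links in a runlevel directory can be compared. The following is a minimal sketch that assumes classic SysV-style S<nn><name> links and the init script names drbd and iscsitarget. The function name check_start_order is made up for this illustration, and only the start order is checked, not the stop order:

```shell
#!/bin/sh
# Print "ok" if DRBD's start link sorts before the iSCSI target's start link
# in the given runlevel directory, "wrong-order" if it does not, and
# "missing" if either link is absent.
# Assumption: SysV-style links named S<nn>drbd and S<nn>iscsitarget.
check_start_order() {
    rcdir=${1:-/etc/rc2.d}
    drbd=$(ls "$rcdir" 2>/dev/null | grep -E '^S[0-9]{2}drbd$' | head -n 1)
    iscsi=$(ls "$rcdir" 2>/dev/null | grep -E '^S[0-9]{2}iscsitarget$' | head -n 1)
    if [ -z "$drbd" ] || [ -z "$iscsi" ]; then
        echo missing
        return 2
    fi
    # Links run in sorted order, so the name that sorts first starts first.
    first=$(printf '%s\n' "$drbd" "$iscsi" | LC_ALL=C sort | head -n 1)
    if [ "$first" = "$drbd" ]; then
        echo ok
    else
        echo wrong-order
        return 1
    fi
}
```

Calling check_start_order /etc/rc2.d on the target server then reports whether the links still need to be renumbered.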

Start the iSCSI target daemon to enable the shared storage provider. This will open TCP port 3260 by default.

/etc/init.d/iscsitarget start

Suggestion: you might want to use ethernet bonding in order to aggregate several physical NICs.

initiator

On the clients that will have to access the iSCSI based shared storage we need to install and configure iSCSI initiator software. We'll use the Open iSCSI package.

install

=from source=

The source is available at http://www.open-iscsi.org/

=package=

Debian Lenny and Debian Etch provide a binary package that contains iSCSI initiator software.

apt-get install open-iscsi

config

After building and installing the software, you'll have two kernel modules named iscsi_tcp and scsi_transport_iscsi, a daemon called 'iscsid' and a tool called 'iscsiadm'.

The install procedure of Open iSCSI creates a configuration file, /etc/iscsi/iscsid.conf, that enables a default configuration. To have the node log in to its known targets automatically at startup, set node.startup to automatic:

...
node.startup = automatic
...

Now it's imperative to create a unique 'iqn' for our client and store it in /etc/iscsi/initiatorname.iscsi

InitiatorName=iqn.2006-07.com.example.intra:hannibal.clientnode1

Afterwards, start the Open iSCSI daemon on the client

/etc/init.d/open-iscsi start

Let's ask our iSCSI target server, for instance 192.168.1.16, which targets it offers

iscsiadm -m discovery -t sendtargets -p 192.168.1.16:3260

Now log on to the iSCSI target; afterwards a new SCSI device should have been added to the client

iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.1.16:3260 -l
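
When a server exports several targets, the login commands can be derived from the discovery output instead of being typed by hand. The sketch below assumes that each discovery record has the form <ip>:<port>,<tpgt> <target-iqn>. The function name login_commands is made up for this illustration, and it only prints commands rather than running them:

```shell
#!/bin/sh
# Read sendtargets discovery output on stdin, one record per line in the
# assumed form
#   <ip>:<port>,<tpgt> <target-iqn>
# and print the matching iscsiadm login command for each record.
# Nothing is executed here; review the output before piping it to sh.
login_commands() {
    while read -r portal target; do
        [ -n "$target" ] || continue      # skip blank or malformed lines
        portal=${portal%,*}               # drop the ",<tpgt>" suffix
        echo "iscsiadm -m node -T $target -p $portal -l"
    done
}
```

A typical use would be: iscsiadm -m discovery -t sendtargets -p 192.168.1.16:3260 | login_commands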

Before logging out of your session, make sure that you have unmounted the iSCSI share and, if you use LVM, that you have deactivated the volume group.

vgchange -a n
iscsiadm -m node -T iqn.2006-07.com.example.intra:storage.disk1.vault -p 192.168.1.16:3260 -u

Have fun!

multipath

iSCSI supports more than one connection to the same iSCSI LUN. This allows for highly available setups.

On Linux the multipath-tools can map several iSCSI-devices into one multipath blockdevice that has loadbalancing and failover as features.

The multipath-tools are available as a binary package for Debian Lenny and Debian Etch.

apt-get install multipath-tools

Create the /etc/multipath.conf file (some examples are available in /usr/share/doc/multipath-tools/examples). The SCSI-ID that must be entered on the 'wwid' line can be obtained by the scsi_id tool. In our example we'll get the same ID for sda and sdb. Remember they're two paths to the same LUN (although on different SAN-servers). As the scsi_id tool is not in a regular path by default (at least on Debian Lenny), let's first make a symbolic link.

ln -s /lib/udev/scsi_id /sbin/scsi_id
/sbin/scsi_id -g -u -s /block/sda

/etc/multipath.conf

defaults {
       user_friendly_names yes
       udev_dir        /dev
       polling_interval 5
       default_selector        "round-robin 0"
       default_getuid_callout  "/sbin/scsi_id -g -u -s /block/%n"
       failback        immediate
}
blacklist {
       wwid    200d04b651805e38e
       devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
       devnode "^hd[a-z][[0-9]*]"
       devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
multipaths {
       multipath {
               wwid                    1494554000000000031343934353534303366313630633030
               alias                   xensan
               path_grouping_policy    failover
               path_checker            readsector0
       }
}
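
In this example the last 32 hex digits of the wwid turn out to be the ASCII encoding of the ScsiId value that was set in /etc/ietd.conf (149455403f160c00). This is an observation about the values shown here, not a documented guarantee. A small shell function can decode that tail as a quick cross-check that a path really belongs to the intended LUN. The function name wwid_tail_ascii is made up for this illustration:

```shell
#!/bin/sh
# Decode the trailing 32 hex digits of a wwid into ASCII text.
# Hypothetical helper; it assumes the tail encodes printable ASCII, as it
# does for the example values in this document.
wwid_tail_ascii() {
    hex=$(printf '%s' "$1" | tail -c 32)
    out=
    while [ -n "$hex" ]; do
        pair=${hex%"${hex#??}"}          # first two hex digits
        [ -n "$pair" ] || break          # odd leftover digit: stop
        hex=${hex#??}
        # Convert the hex pair to an octal escape and let printf emit it.
        out="$out$(printf "\\$(printf '%03o' "0x$pair")")"
    done
    printf '%s\n' "$out"
}
```

Running wwid_tail_ascii on the wwid above prints the ScsiId from the target configuration.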

After reloading the multipath-tools and logging on to the iSCSI targets, your multipath block device is ready for use.

/etc/init.d/multipath-tools reload
/usr/bin/iscsiadm -m node -T iqn.2009-01.nl.pref:storage.xensan1 -p 192.168.1.250:3260 -l
/usr/bin/iscsiadm -m node -T iqn.2009-01.nl.pref:storage.xensan2 -p 192.168.1.252:3260 -l

Let's check our new device:

multipath -ll

The output is something like

xensan (1494554000000000031343934353534303366313630633030) dm-0 IET     ,VIRTUAL-DISK  
[size=9.8G][features=0][hwhandler=0]
\_ round-robin 0 [prio=1][active]
 \_ 1:0:0:0 sda 8:0   [active][ready]
\_ round-robin 0 [prio=1][enabled]
 \_ 0:0:0:0 sdb 8:16  [active][ready]

In addition to using multipath, one could also consider setting up a host-based mirror for the shared storage. This could be accomplished by setting up two or more iSCSI servers (targets) and joining them in a software mirror (RAID-1) MD device. This is left as an exercise for the reader, as we chose to replicate our data using DRBD instead (SAN-based mirroring).

Distributed replicated block device (drbd8)

Install

Binary packages are available for Debian Lenny.

apt-get install  drbd8-modules-2.6-amd64 drbd8-utils

Config

We want DRBD active-active replication (both our SAN nodes must be in primary mode in order to allow HA and Xen live migration). Edit /etc/drbd.conf

global {
  usage-count yes;
}
common {
  protocol C;
}
resource xensan {
  net { 
    allow-two-primaries; 
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  startup {
    become-primary-on both;
    wfc-timeout 0;
    degr-wfc-timeout 120;
  }
  disk {
    on-io-error detach;
# You should only disable device flushes when running DRBD on devices with a battery-backed
# write cache (BBWC). Most storage controllers allow to automatically disable the write cache 
# when the battery is depleted, switching to write-through mode when the battery dies. It is 
# strongly recommended to enable such a feature.
    no-disk-flushes;
    no-md-flushes;
  }
  syncer {
    rate 300M;
  }
  on xensan1 {
    device    /dev/drbd0;
    disk      /dev/vg00/san1;
    address   192.168.1.250:7789;
    meta-disk internal;
  }
  on xensan2 {
    device    /dev/drbd0;
    disk      /dev/vg00/san2;
    address   192.168.1.252:7789;
    meta-disk internal;
  }
}

As you can see in drbd.conf, we use internal metadata on our DRBD device. The DRBD website provides thorough documentation on the internals of DRBD. To keep the metadata on a separate device, change the appropriate lines to:

...
meta-disk /dev/vg00/san1meta [0];
...
meta-disk /dev/vg00/san2meta [0];
...

Either way, the metadata must first be written to disk (on both nodes).

drbdadm create-md xensan

Start the daemons on both nodes:

/etc/init.d/drbd start

Choose one node as the initial master. This node will replicate its data to the other node. Data on the other node will be LOST! In other words, issue the command on the host that holds the data which has to be preserved.

drbdadm -- --overwrite-data-of-peer primary xensan

You can monitor the progress of the replication in /proc/drbd. As soon as this process has finished, repeat this last step on the other node. Afterwards /proc/drbd should look like this:

version: 8.0.14 (api:86/proto:86)
GIT-hash: bb447522fc9a87d0069b7e14f0234911ebdab0f7 build by phil@fat-tyre, 2008-11-12 16:40:33
 0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
    ns:435488 nr:522292 dw:1229708 dr:165620 al:14 bm:85 lo:0 pe:0 ua:0 ap:0
	resync: used:0/61 hits:8209 misses:7 starving:0 dirty:0 changed:7
	act_log: used:0/127 hits:176840 misses:27 starving:0 dirty:13 changed:14
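
Before starting the iSCSI target on top of the DRBD device, it can be useful to check this state from a script. The following is a minimal sketch that assumes the field names shown in the output above (cs:, st: and ds:, as printed by DRBD 8.0); other DRBD releases may label these fields differently. The function name drbd_dual_primary_ready is made up for this illustration:

```shell
#!/bin/sh
# Succeed only if every DRBD device listed in the status file is connected,
# Primary on both nodes and UpToDate on both nodes.
# Assumption: DRBD 8.0 style status lines as in the sample output above.
drbd_dual_primary_ready() {
    status_file=${1:-/proc/drbd}
    # Device lines start with the minor number, e.g. " 0: cs:Connected ..."
    devices=$(grep -E '^ *[0-9]+: ' "$status_file") || return 1
    # Collect every device line that is not in the desired state.
    bad=$(printf '%s\n' "$devices" |
        grep -Ev 'cs:Connected st:Primary/Primary ds:UpToDate/UpToDate' || true)
    [ -z "$bad" ]
}
```

For example: drbd_dual_primary_ready && /etc/init.d/iscsitarget start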

This alternative provides less detailed output:

/etc/init.d/drbd status

Literature