Xen-clustering/live-migration

shared storage

Live migration of Xen guests from one cluster member to another requires some form of shared storage. Because a Xen guest never runs on more than one cluster member at a time, a cluster filesystem is not required, provided that you configure Xen to access the guest's disks as physical devices rather than as image files. However, if you also want to share the Xen guest configuration files between the hosts, you need a cluster filesystem as soon as you want to write data to that share.

See: iSCSI-SAN
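To make the phy-versus-file distinction concrete, here is a minimal Python sketch that scans a guest's disk list (entries in the 'TYPE:PATH,VDEV,MODE' form Xen uses, e.g. 'phy:vgxen01/lv_linva06_hda,hda,w') and reports every entry that is not a 'phy:' device. The function name non_physical_disks and the file-backed sample path are hypothetical, for illustration only.

```python
# Hypothetical helper (not part of Xen): find disks that are not 'phy:' devices.
def non_physical_disks(disks):
    """Return the disk specs from a Xen 'disk' list that are not 'phy:' devices.

    Each spec has the form 'TYPE:PATH,VDEV,MODE', e.g.
    'phy:vgxen01/lv_linva06_hda,hda,w' or 'file:/srv/xen/guest.img,hda,w'.
    """
    result = []
    for spec in disks:
        target = spec.split(",", 1)[0]      # 'TYPE:PATH'
        backend = target.split(":", 1)[0]   # 'phy', 'file', 'tap', ...
        if backend != "phy":
            result.append(spec)
    return result


if __name__ == "__main__":
    disks = [
        "phy:vgxen01/lv_linva06_hda,hda,w",
        "file:/srv/xen/linva06-data.img,hdb,w",  # hypothetical file-backed disk
    ]
    for spec in non_physical_disks(disks):
        print("not a physical device:", spec)
```

Any entry it reports would need to be moved to a logical volume or other block device on the shared storage before the guest can be migrated without a cluster filesystem.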

Xen clustering, clvm and filesystems

As soon as two or more Xen hosts (dom0 nodes) access the same shared storage, you have to be very careful when writing data. Although a cluster filesystem is not necessary for Xen live migration to work, you do need one if you want to write any data to the share directly from a Xen host. Furthermore, you probably want to use a volume manager to manage the available storage. As LVM2 on its own is not cluster-aware, you need another solution.

Cluster LVM (CLVM)

We use CLVM, the Cluster Logical Volume Manager that forms part of Red Hat Cluster Suite (TM). This software suite, generously made available by Red Hat (TM), is also available on Debian Lenny.

Install

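Installation of the suite requires the userland tools and the kernel modules. The module package has to match the running Xen kernel (check with uname -r); the example below is for 2.6.26-1-xen-amd64.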

apt-get install redhat-cluster-suite redhat-cluster-modules-2.6.26-1-xen-amd64

Config

First, instruct LVM to use the cluster locking functions by setting the locking type in the global section of /etc/lvm/lvm.conf to 3 (built-in clustered locking, which relies on the clvmd daemon):

...
locking_type = 3
...
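When you manage several nodes, a quick check that the setting actually took effect can help. The sketch below, a hypothetical helper named lvm_locking_type, assumes a simple line-based reading of lvm.conf: it strips '#' comments and looks for a locking_type assignment, without implementing the full lvm.conf grammar.

```python
import re

# Matches an uncommented 'locking_type = N' assignment (simplified lvm.conf syntax).
_LOCKING_RE = re.compile(r"^\s*locking_type\s*=\s*(\d+)")


def lvm_locking_type(conf_text):
    """Hypothetical helper: return the last locking_type value in lvm.conf text, or None."""
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments, including trailing ones
        match = _LOCKING_RE.match(line)
        if match:
            value = int(match.group(1))
    return value
```

On a node you could call lvm_locking_type(open("/etc/lvm/lvm.conf").read()) and expect 3 once clustered locking is configured.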

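Then set the default properties for the cluster manager (cman) in /etc/default/cman. NODENAME is the node's fully qualified domain name (FQDN), so make sure that your DNS configuration and /etc/hosts are up to date. Each node uses its own NODENAME; the example below is for node1: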

CLUSTERNAME="domnull"
NODENAME="node1.example.com"
USE_CCS="yes"
CLUSTER_JOIN_TIMEOUT=300
CLUSTER_JOIN_OPTIONS=""
CLUSTER_SHUTDOWN_TIMEOUT=60
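Since a wrong NODENAME is an easy mistake when copying /etc/default/cman between nodes, here is a minimal sketch that reads the file's KEY="value" lines and applies a rough sanity check. The helpers parse_defaults and check_cman_defaults are hypothetical, and treating "contains a dot" as "looks like an FQDN" is only a heuristic.

```python
import shlex


def parse_defaults(text):
    """Hypothetical helper: parse simple KEY="value" lines (as in /etc/default/cman)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, raw = line.split("=", 1)
        # shlex removes shell quoting: '"domnull"' -> 'domnull', '""' -> ''
        words = shlex.split(raw)
        settings[key.strip()] = words[0] if words else ""
    return settings


def check_cman_defaults(settings):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    node = settings.get("NODENAME", "")
    if "." not in node:  # heuristic only: an FQDN contains at least one dot
        problems.append(f"NODENAME {node!r} does not look like an FQDN")
    if not settings.get("CLUSTERNAME"):
        problems.append("CLUSTERNAME is empty")
    return problems
```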

Create the cluster configuration file /etc/cluster/cluster.conf and copy it to the other cluster node.

mkdir /etc/cluster

/etc/cluster/cluster.conf

<?xml version="1.0"?>
<cluster name="domnull" config_version="1">

<cman two_node="1" expected_votes="1">
</cman>

<clusternodes>
<clusternode name="node1.example.com" nodeid="1">
        <fence>
                <method name="single">
                        <device name="manual" ipaddr="node1.example.com"/>
                </method>
        </fence>
</clusternode>

<clusternode name="node2.example.com" nodeid="2">
        <fence>
                <method name="single">
                        <device name="manual" ipaddr="node2.example.com"/>
                </method>
        </fence>
</clusternode>
</clusternodes>

<fencedevices>
        <fencedevice name="manual" agent="fence_manual"/>
</fencedevices>

</cluster>
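Before copying cluster.conf to the other node, it can be worth checking it for a few consistency errors. The following Python sketch (check_cluster_conf is a hypothetical helper, using only the standard xml.etree.ElementTree module) verifies, under our own assumptions about what matters here, that the cluster name matches CLUSTERNAME, that the local NODENAME is listed, that node IDs are unique, that two_node mode is paired with exactly two nodes and expected_votes="1", and that every fence device referenced by a node is defined.

```python
import xml.etree.ElementTree as ET


def check_cluster_conf(xml_text, cluster_name=None, node_name=None):
    """Hypothetical helper: return a list of problems in a cman cluster.conf."""
    problems = []
    root = ET.fromstring(xml_text)

    if cluster_name is not None and root.get("name") != cluster_name:
        problems.append(f"cluster name {root.get('name')!r} != {cluster_name!r}")

    nodes = root.findall("./clusternodes/clusternode")
    names = [n.get("name") for n in nodes]
    ids = [n.get("nodeid") for n in nodes]
    if len(set(ids)) != len(ids):
        problems.append("duplicate nodeid values")
    if node_name is not None and node_name not in names:
        problems.append(f"{node_name!r} is not listed as a clusternode")

    # two_node="1" is meant for exactly two nodes with expected_votes="1".
    cman = root.find("cman")
    if cman is not None and cman.get("two_node") == "1":
        if len(nodes) != 2 or cman.get("expected_votes") != "1":
            problems.append('two_node="1" needs 2 nodes and expected_votes="1"')

    # Every <device name=...> in a fence method must refer to a defined <fencedevice>.
    defined = {d.get("name") for d in root.findall("./fencedevices/fencedevice")}
    for node in nodes:
        for dev in node.findall("./fence/method/device"):
            if dev.get("name") not in defined:
                problems.append(f"{node.get('name')}: unknown fence device {dev.get('name')!r}")
    return problems
```

For the configuration above, check_cluster_conf(open("/etc/cluster/cluster.conf").read(), "domnull", "node1.example.com") should return an empty list. Remember that cman expects config_version to be increased whenever you change the file.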

You might want to change the default order in which the init scripts stop the services on a cluster node. In general, the gfs and gfs2 tools should stop first, then the clvmd service, and finally the cman service. Make sure that the multipath and iSCSI initiator services keep running at least until the cluster services have shut down properly.
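The ordering rule can be written down as a small check. This Python sketch takes the order in which services are stopped and reports violations. The helper name stop_order_problems and the service names (gfs-tools, gfs2-tools, clvm, cman, multipath-tools, open-iscsi) are assumptions based on the usual Debian package names; adjust them to the init scripts actually present in /etc/init.d on your nodes.

```python
# Each pair (a, b) means: service a must be stopped before service b.
# Service names are assumptions; adapt them to your /etc/init.d scripts.
STOP_BEFORE = [
    ("gfs-tools", "clvm"),
    ("gfs2-tools", "clvm"),
    ("clvm", "cman"),
    ("cman", "multipath-tools"),
    ("cman", "open-iscsi"),
]


def stop_order_problems(order, rules=STOP_BEFORE):
    """Hypothetical helper: return the rules violated by a shutdown order."""
    position = {name: i for i, name in enumerate(order)}
    problems = []
    for first, second in rules:
        # Only check pairs where both services appear in the given order.
        if first in position and second in position:
            if position[first] > position[second]:
                problems.append(f"{first} must stop before {second}")
    return problems


if __name__ == "__main__":
    good = ["gfs2-tools", "gfs-tools", "clvm", "cman", "open-iscsi", "multipath-tools"]
    bad = ["open-iscsi", "gfs-tools", "clvm", "cman"]
    print(stop_order_problems(good))  # []
    print(stop_order_problems(bad))   # ['cman must stop before open-iscsi']
```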

Restart the cluster nodes to get going, then check the status of the cluster:

cman_tool status

The output will be something like:

Version: 6.1.0
Config Version: 1
Cluster Name: domnull
Cluster Id: 13368
Cluster Member: Yes
Cluster Generation: 12
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1  
Active subsystems: 8
Flags: 2node Dirty 
Ports Bound: 0 11  
Node name: node2.example.com
Node ID: 2
Multicast addresses: 239.192.52.108 
Node addresses: 192.168.1.242
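For scripted monitoring, the "Key: value" format of this output is easy to parse. The sketch below is hypothetical (parse_cman_status and cluster_healthy are our names), and its notion of "healthy" (this node is a member, all expected nodes are present and the total votes reach the quorum) is a simplification.

```python
def parse_cman_status(text):
    """Hypothetical helper: turn 'Key: value' lines of cman_tool status into a dict."""
    status = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)  # split at the first colon only
            status[key.strip()] = value.strip()
    return status


def cluster_healthy(status, expected_nodes=2):
    """Rough health check: member of the cluster, all nodes present, quorate."""
    return (
        status.get("Cluster Member") == "Yes"
        and int(status.get("Nodes", "0")) == expected_nodes
        and int(status.get("Total votes", "0")) >= int(status.get("Quorum", "1"))
    )
```

On a node you might feed it subprocess.run(["cman_tool", "status"], capture_output=True, text=True).stdout.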

Simulate a cluster with two domUs within a dom0

When two Xen guests run on the same dom0, which provides a shared disk to both of them (in order to simulate a shared-storage cluster environment), append an exclamation mark to the 'w' mode of the shared disk in both guests' configuration files ('w!'). By default Xen refuses to attach the same block device read-write to more than one guest; 'w!' forces it to allow this. This is not necessary if the shared storage is accessible over iSCSI: then you can simply initiate the iSCSI sessions from within the Xen guests.

The Xen guest configuration file should contain something like this:

...
disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w!']
instead of:
disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w']
...

In this example both guests write to the same device, so you have to use a cluster-aware filesystem such as GFS/GFS2 or OCFS2 to prevent data corruption.
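The change is mechanical, so it can be scripted when several guest configuration files must be edited. This hypothetical share_disk helper assumes every entry has exactly three comma-separated fields (backend, virtual device, mode) and only switches the given backend from 'w' to 'w!'; remember that both guests sharing the disk need the change.

```python
def share_disk(disks, device):
    """Hypothetical helper: return a copy of a Xen disk list with 'device' set to 'w!'.

    'device' is the backend part of the spec, e.g.
    'phy:vgxen01/lv_linva04_06_hdb'. Other entries are left unchanged.
    """
    result = []
    for spec in disks:
        backend, vdev, mode = spec.split(",")  # assumes exactly three fields
        if backend == device and mode == "w":
            mode = "w!"  # force Xen to allow shared read-write access
        result.append(",".join([backend, vdev, mode]))
    return result
```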

GFS/GFS2

GFS, the Global File System, is a shared-disk file system that Red Hat (TM) released under the GPL in 2004, a few months after acquiring Sistina. At the time of writing, GFS2 is only available as a technology preview, so for production use it is probably wiser to opt for GFS.

Install

If you installed and configured CLVM as described above, you won't need any extra software on Debian Lenny.

Config

You format the device (a logical volume in our case) and mount it afterwards (mount -t gfs or mount -t gfs2) just as you would a regular filesystem. You only have to add some options to mkfs:

-p lock_dlm (the locking protocol; lock_dlm is the one to use in a cluster)
-t clustername:fsname (the lock table; clustername must match the cluster name in /etc/cluster/cluster.conf, domnull here, and fsname is a name you choose for the filesystem)
-j n ('n' is the number of journals to create; one journal is required for each node that mounts the filesystem)
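Because the lock table must name the cluster and the journal count must cover every mounting node, it can help to build the command from those inputs. This is a sketch with a hypothetical helper name (gfs_mkfs_command); it only assembles the argument list and leaves running it to you.

```python
def gfs_mkfs_command(cluster, fsname, device, nodes, gfs2=True):
    """Hypothetical helper: build the gfs_mkfs / mkfs.gfs2 argument list.

    cluster must be the cluster name from cluster.conf (e.g. 'domnull');
    one journal is created per node that will mount the filesystem.
    """
    if nodes < 1:
        raise ValueError("at least one node (journal) is required")
    program = "mkfs.gfs2" if gfs2 else "gfs_mkfs"
    return [program, "-p", "lock_dlm", "-t", f"{cluster}:{fsname}",
            "-j", str(nodes), device]
```

Passing the list to subprocess.run() runs mkfs, which still asks for the [y/n] confirmation before it destroys the data on the device.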

The commands for GFS and GFS2 differ slightly:

GFS

gfs_mkfs -p lock_dlm -t domnull:xenvol2 /dev/xenvg/xenvol2 -j 2

Output

This will destroy any data on /dev/xenvg/xenvol2.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/xenvg/xenvol2
Blocksize:                 4096
Filesystem Size:           720804
Journals:                  2
Resource Groups:           12
Locking Protocol:          lock_dlm
Lock Table:                domnull:xenvol2

Syncing...
All Done

GFS2

mkfs.gfs2 -p lock_dlm -t domnull:xenvol2 /dev/xenvg/xenvol2 -j 2

Output

This will destroy any data on /dev/xenvg/xenvol2.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/xenvg/xenvol2
Blocksize:                 4096
Device Size                3.00 GB (786432 blocks)
Filesystem Size:           3.00 GB (786431 blocks)
Journals:                  2
Resource Groups:           12
Locking Protocol:          "lock_dlm"
Lock Table:                "domnull:xenvol2"

OCFS2

Current Linux kernels support the rewritten Oracle Cluster File System (OCFS2). It allows several nodes to read from and write to the same shared storage at the same time. You could, for example, use this filesystem to share the Xen guest configuration files among several Xen hosts.

Install

Debian Lenny has kernel support for OCFS2 by default. You only need the ocfs2-tools (and optionally 'ocfs2console' if you want a graphical interface).

apt-get install ocfs2-tools

Config

dpkg can help with some basic configuration:

dpkg-reconfigure ocfs2-tools

Manually check/edit /etc/default/o2cb:

# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=domnull

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000

# O2CB_KEEPALIVE_DELAY_MS: Max. time in ms before a keepalive packet is sent.
O2CB_KEEPALIVE_DELAY_MS=2000

# O2CB_RECONNECT_DELAY_MS: Min. time in ms between connection attempts.
O2CB_RECONNECT_DELAY_MS=2000
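The heartbeat threshold is counted in iterations rather than seconds. OCFS2 documents the disk heartbeat as firing every 2 seconds, so the dead time is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, which gives 60 seconds for the value 31 used here. The hypothetical helper below converts the settings; the check that the keepalive delay is shorter than the idle timeout reflects our understanding of the o2cb network settings and should be treated as an assumption.

```python
def o2cb_timeouts(settings):
    """Hypothetical helper: derive O2CB timeouts (seconds) from /etc/default/o2cb values.

    The disk heartbeat runs every 2 seconds, so a node is declared dead after
    (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds.
    """
    threshold = int(settings["O2CB_HEARTBEAT_THRESHOLD"])
    idle_ms = int(settings["O2CB_IDLE_TIMEOUT_MS"])
    keepalive_ms = int(settings["O2CB_KEEPALIVE_DELAY_MS"])
    # Assumed constraint: keepalives must be sent before the idle timeout expires.
    if keepalive_ms >= idle_ms:
        raise ValueError("keepalive delay must be shorter than the idle timeout")
    return {
        "disk_heartbeat_s": (threshold - 1) * 2,
        "network_idle_s": idle_ms / 1000,
    }
```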

Now create /etc/ocfs2/cluster.conf for your cluster. The file must be identical on all nodes, and each node name must match that host's hostname:

node:
        ip_port = 7777
        ip_address = 192.168.1.248
        number = 0
        name = xenhost1
        cluster = domnull

node:
        ip_port = 7777
        ip_address = 192.168.1.249
        number = 1
        name = xenhost2
        cluster = domnull

cluster:
        node_count = 2
        name = domnull
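When you add nodes, generating this file avoids formatting mistakes; the OCFS2 tools expect stanza headers in the first column and parameters indented with a tab. The generator below (ocfs2_cluster_conf, a hypothetical helper) numbers the nodes from 0 in list order and uses port 7777 by default, as in the example above.

```python
def ocfs2_cluster_conf(cluster, nodes, port=7777):
    """Hypothetical helper: render /etc/ocfs2/cluster.conf for (name, ip) pairs.

    Stanza headers start in column 0; parameters are indented with a tab.
    Node names must match each host's hostname.
    """
    lines = []
    for number, (name, ip) in enumerate(nodes):
        lines += [
            "node:",
            f"\tip_port = {port}",
            f"\tip_address = {ip}",
            f"\tnumber = {number}",
            f"\tname = {name}",
            f"\tcluster = {cluster}",
            "",  # blank line between stanzas
        ]
    lines += ["cluster:", f"\tnode_count = {len(nodes)}", f"\tname = {cluster}", ""]
    return "\n".join(lines)
```

For example, ocfs2_cluster_conf("domnull", [("xenhost1", "192.168.1.248"), ("xenhost2", "192.168.1.249")]) reproduces the file above with tab indentation.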

Restart the cluster nodes to get started, then check the status of your cluster:

/etc/init.d/o2cb status

The output should look like this:

Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster domnull: Online
Heartbeat dead threshold = 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
Checking O2CB heartbeat: Active

You can now format and mount the shared storage:

mkfs.ocfs2 /dev/xenvg/xenvol1
mount -t ocfs2 /dev/xenvg/xenvol1 /mnt/

Config

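Configure the Xen relocation service in /etc/xen/xend-config.sxp on both nodes:

...
# '' = listen for relocation requests on all interfaces
(xend-relocation-address '')
(xend-relocation-server yes)
(xend-relocation-port 8002)
# '' = accept relocation requests from any host
(xend-relocation-hosts-allow '')
...

An empty xend-relocation-hosts-allow lets any host that can reach port 8002 migrate a guest onto this node. To restrict it, give a space-separated list of regular expressions matching the allowed hosts, e.g. '^node1\\.example\\.com$ ^node2\\.example\\.com$'.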

Restart xend on both nodes and make sure that port 8002 accepts connections from the other cluster node (check your firewall rules). netstat -tln should show port 8002 in state LISTEN.
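Besides looking at netstat on each node, you can test reachability from the other node. This minimal Python sketch (relocation_port_open is a hypothetical helper) just tries a TCP connection to the relocation port. Success shows that xend is listening and no firewall is blocking the port; it does not prove that xend will accept a migration, since xend-relocation-hosts-allow still applies.

```python
import socket


def relocation_port_open(host, port=8002, timeout=3.0):
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host, ...
        return False
```

From node1, relocation_port_open("node2.example.com") should return True before you try xm migrate.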

Important: the clocks of both nodes have to be synchronized (see ntp).

Action

Start a domU on node1 and migrate it to node2 like so:

xm migrate --live name_xen_guest node2

DomU in HA-cluster control

Put identical Xen domU configuration files in the /etc/xen/ directory on both nodes.

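Create /etc/cluster/cluster.conf with the following content. The <rm> section hands the guest xendomU1 to the resource group manager (rgmanager) as a vm resource, restricted to the nodes in failover domain FD; autostart="0" means rgmanager does not start it automatically.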

<?xml version="1.0"?>
<cluster name="domup" config_version="1">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="saito.example.com" nodeid="1">
        <fence>
                <method name="single">
                        <device name="manual" ipaddr="saito.example.com"/>
                </method>
        </fence>
</clusternode>
<clusternode name="obeliks.example.com" nodeid="2">
        <fence>
                <method name="single">
                        <device name="manual" ipaddr="obeliks.example.com"/>
                </method>
        </fence>
</clusternode>
</clusternodes>
<rm>
  <failoverdomains>
    <failoverdomain name="FD" ordered="1" restricted="1">
      <failoverdomainnode name="saito.example.com"/>
      <failoverdomainnode name="obeliks.example.com"/>
    </failoverdomain>
  </failoverdomains>

  <vm name="xendomU1" domain="FD" autostart="0"/>

</rm>
<fencedevices>
        <fencedevice name="manual" agent="fence_manual"/>
</fencedevices>
</cluster>
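rgmanager can only act on this configuration if the references inside <rm> line up. The sketch below (check_rm_section, a hypothetical helper using xml.etree.ElementTree) checks two things we consider essential: every failover domain member is a defined clusternode, and every vm refers to an existing failover domain. It does not validate the rest of the schema.

```python
import xml.etree.ElementTree as ET


def check_rm_section(xml_text):
    """Hypothetical helper: check <vm> and failover domain references in <rm>.

    Returns a list of problems (empty if none found).
    """
    root = ET.fromstring(xml_text)
    nodes = {n.get("name") for n in root.findall("./clusternodes/clusternode")}
    domains = {}
    for fd in root.findall("./rm/failoverdomains/failoverdomain"):
        domains[fd.get("name")] = [m.get("name") for m in fd.findall("failoverdomainnode")]

    problems = []
    for name, members in domains.items():
        for member in members:
            if member not in nodes:
                problems.append(f"domain {name}: {member} is not a clusternode")
    for vm in root.findall("./rm/vm"):
        if vm.get("domain") not in domains:
            problems.append(f"vm {vm.get('name')}: unknown domain {vm.get('domain')!r}")
    return problems
```

Run it as check_rm_section(open("/etc/cluster/cluster.conf").read()) on either node; an empty list means the references are consistent.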

HannibalWiki
