Live migration of Xen guests from one cluster member to another requires some form of shared storage. Because a Xen guest never runs on more than one cluster member at a time, a cluster filesystem is not required, provided that Xen accesses the guest's disks as physical devices rather than as image files. If you also want to share the Xen guest configuration files between the hosts, however, you need a cluster filesystem as soon as you write data to that share.
As soon as two or more Xen hosts (dom0 nodes) access the same shared storage, you have to be very careful when writing data. Xen live migration works without a cluster filesystem, but you do need one if you want to write any data to the share directly from a Xen host. You will probably also want a volume manager to manage the available storage. Plain LVM2 is not cluster-aware, so you need a clustered alternative.
We use CLVM, the Cluster Logical Volume Manager, which is part of Red Hat Cluster Suite (TM). The suite, generously made available by Red Hat (TM), is also packaged for Debian Lenny.
Installing the suite requires both the userland tools and the kernel modules:
apt-get install redhat-cluster-suite redhat-cluster-modules-2.6.26-1-xen-amd64
First, instruct LVM to use the cluster locking functions by setting the locking type in /etc/lvm/lvm.conf to 3 (built-in clustered locking):
...
locking_type = 3
...
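You may want to verify this setting from a script on every node. The following Python sketch (`lvm_locking_type` is our own hypothetical helper, not part of LVM) scans lvm.conf text for an active locking_type line. It is deliberately simple: it strips '#' comments and takes the last assignment it finds, but it does not parse the section structure of lvm.conf.

```python
import re


def lvm_locking_type(conf_text):
    """Return the locking_type set in lvm.conf text, or None if absent.

    Simplified sketch: '#' comments are stripped and the last active
    assignment found wins; the section structure is not parsed.
    """
    value = None
    for line in conf_text.splitlines():
        active = line.split("#", 1)[0].strip()  # drop comments and indentation
        match = re.fullmatch(r"locking_type\s*=\s*(\d+)", active)
        if match:
            value = int(match.group(1))
    return value
```

For example, `lvm_locking_type(open("/etc/lvm/lvm.conf").read()) == 3` should be true once cluster locking is configured.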
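Next, set the default properties for the cluster manager (cman) in /etc/default/cman. NODENAME must be the node's fully qualified domain name (FQDN), so make sure that your DNS configuration and /etc/hosts are up to date. On the second node, NODENAME is that node's own FQDN.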
CLUSTERNAME="domnull"
NODENAME="node1.example.com"
USE_CCS="yes"
CLUSTER_JOIN_TIMEOUT=300
CLUSTER_JOIN_OPTIONS=""
CLUSTER_SHUTDOWN_TIMEOUT=60
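A wrong NODENAME is an easy mistake, so a quick sanity check can help. The Python sketch below uses two hypothetical helpers, `parse_default_file` and `nodename_is_fqdn`. The first reads the simple KEY=value assignments shown above, using shlex for quoting and '#' comments. It does not evaluate any other shell syntax.

```python
import shlex


def parse_default_file(text):
    """Parse simple KEY=value / KEY="value" assignments from an /etc/default file.

    Sketch only: quotes and '#' comments are handled by shlex, but no
    other shell syntax (variables, command substitution) is evaluated.
    """
    settings = {}
    for token in shlex.split(text, comments=True):
        key, sep, value = token.partition("=")
        if sep:
            settings[key] = value
    return settings


def nodename_is_fqdn(settings):
    """Rough check that NODENAME looks like host.domain (contains a dot)."""
    name = settings.get("NODENAME", "")
    return "." in name.strip(".")
```

On each node you could also compare the parsed NODENAME with `socket.getfqdn()` to catch DNS or /etc/hosts mismatches.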
Create the cluster configuration file /etc/cluster/cluster.conf and copy it to the other cluster node.
<?xml version="1.0"?>
<cluster name="domnull" config_version="1">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="node1.example.com"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2.example.com" nodeid="2">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="node2.example.com"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
</cluster>
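The same file must be present on both nodes, so it can be worth checking it before you copy it. This Python sketch (a hypothetical `check_cluster_conf` helper, using only the standard library's ElementTree) checks a few consistency rules that are visible in the example:

- node IDs are unique;
- two_node="1" is paired with exactly two nodes and expected_votes="1";
- every fence device a node uses is declared under fencedevices.

It covers only these rules, not the full cluster.conf schema.

```python
import xml.etree.ElementTree as ET


def check_cluster_conf(xml_text):
    """Return a list of problems found in a cman cluster.conf (empty if none)."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    nodes = root.findall("./clusternodes/clusternode")

    node_ids = [node.get("nodeid") for node in nodes]
    if len(node_ids) != len(set(node_ids)):
        problems.append("duplicate nodeid")

    cman = root.find("cman")
    if cman is not None and cman.get("two_node") == "1":
        if len(nodes) != 2:
            problems.append("two_node=1 needs exactly two clusternode entries")
        if cman.get("expected_votes") != "1":
            problems.append("two_node=1 needs expected_votes=1")

    # Every fence device referenced by a node must be declared under <fencedevices>.
    declared = {fd.get("name") for fd in root.findall("./fencedevices/fencedevice")}
    for node in nodes:
        for device in node.findall("./fence/method/device"):
            if device.get("name") not in declared:
                problems.append(
                    f"{node.get('name')}: undeclared fence device {device.get('name')!r}"
                )
    return problems
```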
You may want to change the default order in which the init scripts stop the services on a cluster node. In general, the gfs/gfs2 tools should stop first, then the clvmd service, and then cman. Make sure that the multipath and iSCSI initiator services keep running at least until the cluster services have shut down properly.
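The shutdown order described above is a small dependency problem. As a sketch, it can be expressed with Python's graphlib (Python 3.9 or later). The service labels below are descriptive only and are not necessarily the names of the Debian init scripts.

```python
from graphlib import TopologicalSorter

# service -> services that must already be stopped before it stops
# (labels are descriptive, not necessarily the Debian init script names)
STOP_PREREQUISITES = {
    "gfs": set(),
    "clvmd": {"gfs"},
    "cman": {"clvmd"},
    "multipath": {"cman"},
    "iscsi-initiator": {"cman"},
}


def stop_order(prerequisites):
    """Return one shutdown order that satisfies all constraints."""
    return list(TopologicalSorter(prerequisites).static_order())
```

On a classic sysv-rc system, the K links in /etc/rc0.d and /etc/rc6.d run in ascending order. The resulting order therefore maps onto increasing K numbers.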
Restart the cluster nodes to get going, then check the status of the cluster.
The output will be something like:
Version: 6.1.0
Config Version: 1
Cluster Name: domnull
Cluster Id: 13368
Cluster Member: Yes
Cluster Generation: 12
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 8
Flags: 2node Dirty
Ports Bound: 0 11
Node name: node2.example.com
Node ID: 2
Multicast addresses: 18.104.22.168
Node addresses: 192.168.1.242
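To check the status from a monitoring script, the output above can be turned into a dictionary. This Python sketch assumes one "Key: value" pair per line, as in the example. It treats the node as healthy when it reports "Cluster Member: Yes" and the total votes reach the quorum value.

```python
def parse_status(text):
    """Turn 'Key: value' lines into a dict, splitting at the first colon."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def is_quorate_member(status):
    """True if this node is a cluster member and the votes reach quorum."""
    total_votes = int(status.get("Total votes", "0").split()[0])
    quorum = int(status.get("Quorum", "1").split()[0])  # may carry trailing text
    return status.get("Cluster Member") == "Yes" and total_votes >= quorum
```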
Suppose two Xen guests run on the same dom0, and that dom0 provides a shared disk to both of them (for example, to simulate a shared-storage cluster environment). In that case, add an exclamation mark ("bang") after the 'w' in the guests' configuration files. By default Xen refuses to give more than one guest write access to the same virtual block device; 'w!' overrides this check, so both guests can mount the share in read-write mode. This is not necessary if the shared storage is reachable over iSCSI. In that case you can simply start the iSCSI sessions from within the Xen guests.
The Xen guest configuration file should contain something like this:
...
disk=['phy:vgxen01/lv_linva06_hda,hda,w', 'phy:vgxen01/lv_linva04_06_hdb,hdb,w!']
...
instead of:
...
disk=['phy:vgxen01/lv_linva06_hda,hda,w', 'phy:vgxen01/lv_linva04_06_hdb,hdb,w']
...
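The rule behind the 'w!' flag can be illustrated with a small Python model. The sketch splits each disk entry into backend, frontend device and mode. It then reports backends that more than one guest opens writable without 'w!'. It is a simplified model of Xen's sharing check, not Xen's actual code: read-only attachments are ignored.

```python
from collections import defaultdict


def parse_disk(entry):
    """Split a Xen disk entry 'backend,frontend,mode' into its three fields."""
    backend, frontend, mode = (field.strip() for field in entry.split(","))
    return backend, frontend, mode


def conflicting_disks(guests):
    """Return backends opened writable by several guests without 'w!'.

    guests maps a guest name to its disk=[...] list. Only writable
    attachments are considered in this simplified model.
    """
    writers = defaultdict(list)  # backend -> modes used by the guests
    for disks in guests.values():
        for entry in disks:
            backend, _frontend, mode = parse_disk(entry)
            if mode.startswith("w"):
                writers[backend].append(mode)
    return sorted(
        backend
        for backend, modes in writers.items()
        if len(modes) > 1 and any(mode != "w!" for mode in modes)
    )
```

Suppose both guests list 'phy:vgxen01/lv_linva04_06_hdb,hdb,w'. Then the function returns that backend. With 'w!' in both entries, it returns an empty list.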
In this setup, the guests must use a cluster-aware filesystem such as GFS/GFS2 or OCFS2 on the shared disk to prevent data corruption.
GFS, the Global File System, is a shared-disk filesystem that Red Hat (TM) released under the GPL in 2004, just a few months after acquiring Sistina. At the time of writing, GFS2 is only available as a technology preview, so for production use GFS is probably the wiser choice.
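If you installed and configured CLVM as described above, you do not need any extra software on Debian Lenny.
Simply format the device (a logical volume in our case) and mount it afterwards, just as you would a regular filesystem. Only a few extra options are needed:
- the locking protocol (-p lock_dlm);
- a lock table name of the form clustername:fsname (-t), where clustername must match the cluster name in /etc/cluster/cluster.conf;
- the number of journals (-j), which should be at least the number of nodes that will mount the filesystem.

The commands for GFS and GFS2 differ slightly: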
For GFS:
gfs_mkfs -p lock_dlm -t domnull:xenvol2 /dev/xenvg/xenvol2 -j 2

This will destroy any data on /dev/xenvg/xenvol2.
Are you sure you want to proceed? [y/n] y

Device: /dev/xenvg/xenvol2
Blocksize: 4096
Filesystem Size: 720804
Journals: 2
Resource Groups: 12
Locking Protocol: lock_dlm
Lock Table: domnull:xenvol2

Syncing...
All Done
For GFS2:
mkfs.gfs2 -p lock_dlm -t domnull:xenvol2 /dev/xenvg/xenvol2 -j 2

This will destroy any data on /dev/xenvg/xenvol2.
Are you sure you want to proceed? [y/n] y

Device: /dev/xenvg/xenvol2
Blocksize: 4096
Device Size 3.00 GB (786432 blocks)
Filesystem Size: 3.00 GB (786431 blocks)
Journals: 2
Resource Groups: 12
Locking Protocol: "lock_dlm"
Lock Table: "domnull:xenvol2"
Current Linux kernels support OCFS2, the rewritten Oracle Cluster File System. It allows several nodes to read from and write to the same shared storage. You could use this filesystem to share the Xen guest configuration files among several Xen hosts.
The Debian Lenny kernel supports OCFS2 out of the box. You only need the ocfs2-tools package (and optionally ocfs2console if you want a graphical interface).
apt-get install ocfs2-tools
dpkg can help with some basic configuration:
Check, and if necessary edit, /etc/default/o2cb manually:
# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=domnull
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31
# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000
# O2CB_KEEPALIVE_DELAY_MS: Max. time in ms before a keepalive packet is sent.
O2CB_KEEPALIVE_DELAY_MS=2000
# O2CB_RECONNECT_DELAY_MS: Min. time in ms between connection attempts.
O2CB_RECONNECT_DELAY_MS=2000
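The heartbeat threshold is counted in iterations rather than seconds. According to the OCFS2 documentation, the disk heartbeat runs every 2 seconds and the threshold is defined as timeout / 2 + 1. The default of 31 therefore corresponds to a timeout of 60 seconds. The Python sketch below reads a file like the one shown above (simple KEY=value lines only) and performs that conversion. The helper names are hypothetical.

```python
def parse_o2cb_defaults(text):
    """Read KEY=value lines from /etc/default/o2cb, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def disk_heartbeat_timeout_s(threshold):
    """Seconds of missed disk heartbeats before a node is considered dead.

    threshold = timeout / 2 + 1 (2-second iterations),
    hence timeout = (threshold - 1) * 2.
    """
    return (threshold - 1) * 2
```

These values should normally be identical on all nodes of the O2CB cluster.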
Now create /etc/ocfs2/cluster.conf for your cluster and copy the same file to every node:
node:
	ip_port = 7777
	ip_address = 192.168.1.248
	number = 0
	name = xenhost1
	cluster = domnull

node:
	ip_port = 7777
	ip_address = 192.168.1.249
	number = 1
	name = xenhost2
	cluster = domnull

cluster:
	node_count = 2
	name = domnull
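The file format is strict about layout:
- stanza headers such as node: and cluster: start in the first column;
- their parameters are indented with a tab.

As a sketch, this Python helper (hypothetical name `ocfs2_cluster_conf`) generates the file from a list of (name, IP address) pairs. It numbers the nodes from 0, as in the example. The node names should match each host's hostname.

```python
def ocfs2_cluster_conf(cluster, nodes, port=7777):
    """Return /etc/ocfs2/cluster.conf text for the given cluster.

    nodes is a list of (name, ip_address) pairs; node numbers are assigned
    from 0 in list order. Parameters are tab-indented below each stanza.
    """
    stanzas = []
    for number, (name, ip_address) in enumerate(nodes):
        stanzas.append(
            "node:\n"
            f"\tip_port = {port}\n"
            f"\tip_address = {ip_address}\n"
            f"\tnumber = {number}\n"
            f"\tname = {name}\n"
            f"\tcluster = {cluster}\n"
        )
    stanzas.append(
        "cluster:\n"
        f"\tnode_count = {len(nodes)}\n"
        f"\tname = {cluster}\n"
    )
    return "\n".join(stanzas)  # blank line between stanzas
```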
Restart the cluster nodes to get started, then check the status of your cluster.
The output should look like this:
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster domnull: Online
Heartbeat dead threshold = 31
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active
You can now format and use your shared storage:
mkfs.ocfs2 /dev/xenvg/xenvol1
mount -t ocfs2 /dev/xenvg/xenvol1 /mnt/
Configure the Xen relocation service in /etc/xen/xend-config.sxp:
...
(xend-relocation-address '')
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-hosts-allow '')
...
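The empty xend-relocation-hosts-allow value accepts relocation requests from any host. According to the comments in the stock xend-config.sxp, a non-empty value is a space-separated list of regular expressions. A host is accepted if its fully qualified name or IP address matches one of them. This Python sketch models that rule so you can test a pattern list before putting it into the file. It assumes match-at-the-start semantics (re.match), so anchor your patterns with ^ and $.

```python
import re


def relocation_allowed(hosts_allow, fqdn, ip_address):
    """Model of xend-relocation-hosts-allow (assumed semantics).

    An empty setting allows every host; otherwise the host is accepted if
    its FQDN or IP address matches one of the space-separated patterns.
    """
    patterns = hosts_allow.split()
    if not patterns:
        return True  # empty setting: any host may relocate guests here
    return any(
        re.match(pattern, fqdn) or re.match(pattern, ip_address)
        for pattern in patterns
    )
```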
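Restart xend on both nodes and make sure that port 8002 accepts connections from the other node. With the empty values above, xend listens on all interfaces and accepts connections from any host. Use netstat to check that port 8002 is in the LISTEN state.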
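Besides running netstat on the node itself, you can test reachability from the other node. This minimal Python sketch (hypothetical helper `port_accepts`) simply tries to open a TCP connection. A successful connection only shows that something accepts connections on that port, not that it is xend.

```python
import socket


def port_accepts(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name not resolved...
        return False
```

For example, run `port_accepts("node2.example.com", 8002)` on node1 before attempting a migration.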
Important: the clocks on both nodes must be synchronized (see ntp).
Start a domU on node1 and migrate it to node2 as follows:
xm migrate --live name_xen_guest node2
Put identical Xen domU configuration files in the /etc/xen/ directory on both nodes.
Create a /etc/cluster/cluster.conf with:
<?xml version="1.0"?>
<cluster name="domup" config_version="1">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="saito.example.com" nodeid="1">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="saito.example.com"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="obeliks.example.com" nodeid="2">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="obeliks.example.com"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
      <failoverdomain name="FD" ordered="1" restricted="1">
        <failoverdomainnode name="saito.example.com"/>
        <failoverdomainnode name="obeliks.example.com"/>
      </failoverdomain>
    </failoverdomains>
    <vm name="xendomU1" domain="FD" autostart="0"/>
  </rm>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
</cluster>
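A few cross-references in this file are easy to get wrong. The Python sketch below (hypothetical helper `check_rm_section`) checks two of them:
- every failover domain lists only declared cluster nodes;
- every vm refers to an existing failover domain.

It assumes the element layout of the example above and does not validate anything else.

```python
import xml.etree.ElementTree as ET


def check_rm_section(xml_text):
    """Return problems found in the <rm> section of cluster.conf (empty if none)."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    members = {node.get("name") for node in root.findall("./clusternodes/clusternode")}

    domains = set()
    for domain in root.findall("./rm/failoverdomains/failoverdomain"):
        domains.add(domain.get("name"))
        for entry in domain.findall("failoverdomainnode"):
            if entry.get("name") not in members:
                problems.append(
                    f"failover domain {domain.get('name')}: "
                    f"{entry.get('name')} is not a cluster node"
                )

    for vm in root.findall("./rm/vm"):
        if vm.get("domain") not in domains:
            problems.append(
                f"vm {vm.get('name')}: unknown failover domain {vm.get('domain')!r}"
            )
    return problems
```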