=======Xen-clustering/live-migration=======

=====shared storage=====

Live migration of Xen guests from one cluster member to another requires some sort of shared storage. Since a Xen guest never runs on more than one cluster member at a time, a cluster filesystem is not required, as long as you configure Xen to access the guest through a physical device rather than a file. If you also want to share the Xen guest configuration files, however, a cluster filesystem is required as soon as you want to write data to that share.

See: [[storage:iSCSI-SAN]]

====Xen clustering, clvm and filesystems====

As soon as two or more Xen hosts (dom0 nodes) access the same shared storage, you have to be very careful when writing data. Although a cluster filesystem is not necessary for Xen live migration to work, you do need one if you want to write any data to the share directly from a Xen host. You will probably also want a volume manager to manage the available storage. As LVM2 itself is not cluster-aware, another solution is needed.

===Cluster LVM (CLVM)===

We use CLVM, the Cluster Logical Volume Manager that forms part of Red Hat Cluster Suite (TM). This software suite, generously made available by Red Hat (TM), is also available on Debian Lenny.

==Install==

Installation of the suite requires the userland tools and kernel modules.

  apt-get install redhat-cluster-suite redhat-cluster-modules-2.6.26-1-xen-amd64

==Config==

First instruct LVM to use the cluster locking functions by changing the locking type in /etc/lvm/lvm.conf to '3'.

  ...
  locking_type = 3
  ...

Then set the default properties for the cluster manager (cman) in /etc/default/cman. The node name is the FQDN; make sure that your DNS configuration and /etc/hosts are up to date.
  CLUSTERNAME="domnull"
  NODENAME="node1.example.com"
  USE_CCS="yes"
  CLUSTER_JOIN_TIMEOUT=300
  CLUSTER_JOIN_OPTIONS=""
  CLUSTER_SHUTDOWN_TIMEOUT=60

Create the cluster configuration file /etc/cluster/cluster.conf and copy it to the other cluster node.

  mkdir /etc/cluster

/etc/cluster/cluster.conf

You might want to change the default order in which the init scripts stop a cluster node. In general you want the gfs(2)-tools to stop first, then the clvmd service, and finally the cman service. Make sure that the multipath and iSCSI initiator services keep running at least until the cluster services have been shut down properly.

Restart the cluster nodes in order to get going. You can check the status of the cluster:

  cman_tool status

The output will be something like:

  Version: 6.1.0
  Config Version: 1
  Cluster Name: domnull
  Cluster Id: 13368
  Cluster Member: Yes
  Cluster Generation: 12
  Membership state: Cluster-Member
  Nodes: 2
  Expected votes: 1
  Total votes: 2
  Node votes: 1
  Quorum: 1
  Active subsystems: 8
  Flags: 2node Dirty
  Ports Bound: 0 11
  Node name: node2.example.com
  Node ID: 2
  Multicast addresses: 239.192.52.108
  Node addresses: 192.168.1.242

===Simulate a cluster with two domU's within a dom0===

When two Xen guests run on the same dom0, which provides a shared disk to both of them (in order to simulate a shared-storage cluster environment), add a bang ('!') after the 'w' in the Xen guests' configuration files. This forces Xen to let you mount the share in read-write mode, as by default Xen blocks multiple access to the same virtual block device. This is not necessary if the shared storage is accessible over iSCSI; in that case you can simply initiate the iSCSI sessions from within the Xen guests.

The Xen guest configuration file should contain something like this:

  ...
  disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w!']

instead of:

  disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w']
  ...

In this example you would have to use a cluster-aware filesystem like GFS/GFS2 or OCFS2 to prevent the Xen guests from corrupting data.

===GFS/GFS2===

GFS, the Global File System, is a shared-disk file system that was released under the GPL by Red Hat (TM) in 2004 (just a few months after they had bought Sistina). Currently GFS2 is available as a technology preview. For production use it is probably wiser to opt for GFS.

==Install==

If you installed and configured CLVM as described above, you won't need any extra software on Debian Lenny.

==Config==

You can format the device (a logical volume in our case) and mount it afterwards, just as you would with a more regular filesystem. You only have to add some options:

  * -p (the locking protocol, which is probably 'lock_dlm')
  * -t clustername:filesystem_name (yes, you have to come up with a name for your fs)
  * -j n ('n' is the number of journals to create; one journal is required for each node that mounts the file system)

The commands for GFS and GFS2 differ slightly:

GFS

  gfs_mkfs -p lock_dlm -t domnull:xenvol2 /dev/xenvg/xenvol2 -j 2

Output

  This will destroy any data on /dev/xenvg/xenvol2.
  Are you sure you want to proceed? [y/n] y
  Device: /dev/xenvg/xenvol2
  Blocksize: 4096
  Filesystem Size: 720804
  Journals: 2
  Resource Groups: 12
  Locking Protocol: lock_dlm
  Lock Table: domnull:xenvol2
  Syncing...
  All Done

GFS2

  mkfs.gfs2 -p lock_dlm -t domnull:xenvol2 /dev/xenvg/xenvol2 -j 2

Output

  This will destroy any data on /dev/xenvg/xenvol2.
  Are you sure you want to proceed? [y/n] y
  Device: /dev/xenvg/xenvol2
  Blocksize: 4096
  Device Size 3.00 GB (786432 blocks)
  Filesystem Size: 3.00 GB (786431 blocks)
  Journals: 2
  Resource Groups: 12
  Locking Protocol: "lock_dlm"
  Lock Table: "domnull:xenvol2"

===OCFS2===

Today's Linux kernel has support for the rewritten Oracle Cluster Filesystem (OCFS2). It allows several clients to read from and write to the same (thus shared) storage.
You could use this filesystem to share the Xen guest configuration files among several Xen hosts.

==Install==

Debian Lenny has kernel support for OCFS2 by default. You only need the ocfs2-tools (and optionally 'ocfs2console' if you want a graphical interface).

  apt-get install ocfs2-tools

==Config==

dpkg can help with some basic configuration:

  dpkg-reconfigure ocfs2-tools

Manually check/edit /etc/default/o2cb

  # O2CB_ENABLED: 'true' means to load the driver on boot.
  O2CB_ENABLED=true
  # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
  O2CB_BOOTCLUSTER=domnull
  # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
  O2CB_HEARTBEAT_THRESHOLD=31
  # O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
  O2CB_IDLE_TIMEOUT_MS=30000
  # O2CB_KEEPALIVE_DELAY_MS: Max. time in ms before a keepalive packet is sent.
  O2CB_KEEPALIVE_DELAY_MS=2000
  # O2CB_RECONNECT_DELAY_MS: Min. time in ms between connection attempts.
  O2CB_RECONNECT_DELAY_MS=2000

Now create /etc/ocfs2/cluster.conf for your cluster:

  node:
          ip_port = 7777
          ip_address = 192.168.1.248
          number = 0
          name = xenhost1
          cluster = domnull

  node:
          ip_port = 7777
          ip_address = 192.168.1.249
          number = 1
          name = xenhost2
          cluster = domnull

  cluster:
          node_count = 2
          name = domnull

Restart the cluster nodes in order to get started.

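A mismatch between node_count and the number of node: stanzas in /etc/ocfs2/cluster.conf is an easy mistake to make when hosts are added later. As a minimal sketch, a check along the following lines could be run on each node before the restart. The function name check_ocfs2_conf is hypothetical and not part of ocfs2-tools, and the sketch assumes one stanza header per line, as in the file above.

```shell
# Hypothetical helper (not part of ocfs2-tools): compare the declared
# node_count with the number of "node:" stanzas in an OCFS2 cluster.conf.
check_ocfs2_conf() {
    conf="$1"
    # Count lines that consist only of the stanza header "node:".
    stanzas=$(grep -c '^[[:space:]]*node:[[:space:]]*$' "$conf" || true)
    # Extract N from the "node_count = N" line of the cluster: stanza.
    declared=$(sed -n 's/^[[:space:]]*node_count[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$conf")
    if [ "$stanzas" = "$declared" ]; then
        echo "ok: $stanzas nodes"
    else
        echo "mismatch: $stanzas stanzas, node_count=$declared"
        return 1
    fi
}
```

Calling `check_ocfs2_conf /etc/ocfs2/cluster.conf` prints a single summary line and returns a non-zero status on a mismatch, so it can also be used in a script.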
You can check the status of your cluster like so:

  /etc/init.d/o2cb status

The output should look like:

  Driver for "configfs": Loaded
  Filesystem "configfs": Mounted
  Stack glue driver: Loaded
  Stack plugin "o2cb": Loaded
  Driver for "ocfs2_dlmfs": Loaded
  Filesystem "ocfs2_dlmfs": Mounted
  Checking O2CB cluster domnull: Online
  Heartbeat dead threshold = 31
  Network idle timeout: 30000
  Network keepalive delay: 2000
  Network reconnect delay: 2000
  Checking O2CB heartbeat: Active

At this point you can simply format and use your shared storage:

  mkfs.ocfs2 /dev/xenvg/xenvol1
  mount -t ocfs2 /dev/xenvg/xenvol1 /mnt/

=====Config=====

Configure the xen relocation service in /etc/xen/xend-config.sxp

  ...
  (xend-relocation-server yes)
  (xend-relocation-port 8002)
  (xend-relocation-address '')
  (xend-relocation-hosts-allow '')
  ...

Restart xend on both nodes and make sure that port 8002 accepts connections from everywhere. Check for a LISTEN line with netstat.

Important: Time has to be synced between both nodes (See [[hannibal:ntp]]).

=====Action=====

Start a domU on node1 and migrate it to node2 like so:

  xm migrate --live name_xen_guest node2

====DomU in HA-cluster control====

Put identical Xen domU config files in the /etc/xen/ directory of each node.

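Whether the domU config files really are identical can be verified by comparing checksums. The following is a minimal sketch; the helper names config_digest and same_config are hypothetical, and it assumes sha256sum from coreutils is installed.

```shell
# Hypothetical helper: print only the SHA-256 digest of a domU config file.
config_digest() {
    sha256sum "$1" | cut -d' ' -f1
}

# Hypothetical helper: succeed only if both copies have identical content.
same_config() {
    [ "$(config_digest "$1")" = "$(config_digest "$2")" ]
}
```

To compare against the copy on the other node, the digest printed by `config_digest /etc/xen/name_xen_guest` can be compared with the output of `ssh node2 sha256sum /etc/xen/name_xen_guest`; the file name here is only a placeholder for your own guest configuration.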
Create a /etc/cluster/cluster.conf with:

=====Literature=====

  * a proof of concept regarding Xen migration on Suse Linux by Novell Presales, available at [[http://forge.novell.com/modules/xfcontent/private.php?reference_id=2736&content=/library/Xen%20live%20migration%20demo/XEN_migration_demo_1.1.pdf|http://forge.novell.com/.../XEN_migration_demo_1.1.pdf]]
  * http://www.linux1394.org
  * http://www.drbd.org
  * documentation by Jeffrey Hunter on Oracle Technology Network regarding Oracle RAC on Linux and Firewire, available at http://www.oracle.com/technology/pub/articles/hunter_rac10gr2.html
  * a thesis by Espen Braastad (University of Oslo) named 'Management of high availability services using virtualization', May 22 2006, available at http://www.linpro.no/content/download/519/3617/file/espen_HAxen.pdf