xen:live-migration_infrastructure
Last modified: 2011/02/07 17:27 by Luc Nieland (previous revision: 2009/01/07 20:36 by Olivier Brugman)
In order to do live migration of Xen guests from one cluster member to another, some sort of shared storage is required. As a Xen guest won't run on more than one cluster member at a time, a cluster filesystem is not required, as long as you configure Xen to access the Xen guest by a physical device, not a file. However, if you want to share the Xen guest configuration files, a cluster filesystem is required as soon as you want to write data to that share.
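For example, a domU disk definition that points at a physical (shared) device rather than a file image would look like this (volume group and volume names are hypothetical):
<code>
disk = [ 'phy:vg00/guest1-disk,hda,w' ]
</code>
With phy: the block device is handed to the guest directly; a file:-backed guest writing through a plain filesystem on the share would need a cluster filesystem instead.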
  
In this proof of concept iSCSI is used as the shared storage solution.

See: [[storage:iSCSI-SAN]]
  
====Xen clustering, clvm and filesystems====
As soon as two or more Xen hosts (dom0 nodes) access the same shared storage, you have to be very careful with writing data. Although a cluster filesystem is not necessary for Xen live migration to work, you do need one as soon as you write any data to the share directly from a Xen host. Furthermore, you probably want a volume manager for the available storage; as LVM2 itself is not cluster-aware, you need another solution for that as well.
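With a cluster-aware volume manager in place, volume groups on the shared storage are marked as clustered so that all nodes coordinate access to the logical volumes. A hedged sketch with hypothetical names (the --clustered flag is standard LVM2):
  vgcreate --clustered y xenvg /dev/sdb
  vgchange -cy xenvg
The second command marks an already existing volume group as clustered.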
  
===Cluster LVM (CLVM)===
We use CLVM, the Cluster Logical Volume Manager, which is part of the Red Hat Cluster Suite. This software suite, generously made available by Red Hat, is also packaged for Debian Lenny.
  
==Install==
Installation of the suite requires the userland tools and the kernel modules for your running kernel:
  apt-get install redhat-cluster-suite redhat-cluster-modules-2.6.26-1-xen-amd64
  
==Config==
First instruct LVM to use the cluster locking functions: change the locking type in /etc/lvm/lvm.conf to '3'.
  
  ...
  locking_type = 3
  ...
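If you prefer to script this change, here is a minimal sketch. It runs against a scratch copy; point the same sed at /etc/lvm/lvm.conf on the real system, after making a backup (the stock file ships with locking_type = 1):

```shell
# Demonstrated on a scratch copy of lvm.conf; use /etc/lvm/lvm.conf for real.
conf=$(mktemp)
printf '    locking_type = 1\n' > "$conf"              # stock default
sed -i 's/locking_type = 1/locking_type = 3/' "$conf"  # switch to cluster locking
grep locking_type "$conf"                              # the line now reads locking_type = 3
rm -f "$conf"
```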
  
Then set the default properties for the cluster manager (cman) in /etc/default/cman. The node name is the FQDN, so make sure that your DNS configuration and /etc/hosts are up to date.
<code>
CLUSTERNAME="domnull"
NODENAME="node1.example.com"
USE_CCS="yes"
CLUSTER_JOIN_TIMEOUT=300
CLUSTER_JOIN_OPTIONS=""
CLUSTER_SHUTDOWN_TIMEOUT=60
</code>
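Since /etc/default/cman differs between the nodes only in NODENAME, you could generate it per node. A hedged sketch (written to a scratch file here; redirect to /etc/default/cman on the real system):

```shell
# Generate a cman defaults file with this node's name filled in.
out=$(mktemp)
cat > "$out" <<EOF
CLUSTERNAME="domnull"
NODENAME="$(hostname -f 2>/dev/null || hostname)"
USE_CCS="yes"
CLUSTER_JOIN_TIMEOUT=300
CLUSTER_JOIN_OPTIONS=""
CLUSTER_SHUTDOWN_TIMEOUT=60
EOF
cat "$out"   # show the result
rm -f "$out"
```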
  
Create the cluster configuration file /etc/cluster/cluster.conf and copy it to the other cluster node.
  
  mkdir /etc/cluster
  
/etc/cluster/cluster.conf
<code>
<?xml version="1.0"?>
<cluster name="domnull" config_version="1">
  
<cman two_node="1" expected_votes="1">
</cman>
  
<clusternodes>
<clusternode name="node1.example.com" nodeid="1">
        <fence>
                <method name="single">
                        <device name="manual" ipaddr="node1.example.com"/>
                </method>
        </fence>
</clusternode>
  
<clusternode name="node2.example.com" nodeid="2">
        <fence>
                <method name="single">
                        <device name="manual" ipaddr="node2.example.com"/>
                </method>
        </fence>
</clusternode>
</clusternodes>
  
<fencedevices>
        <fencedevice name="manual" agent="fence_manual"/>
</fencedevices>
  
</cluster>
</code>
  
You might want to change the default order of the init scripts when stopping a cluster node. In general you want the gfs(2) tools to stop first, then the clvmd service, and finally the cman service. Make sure that the multipath and iSCSI initiator services keep running at least until the cluster services have shut down properly.
  
Restart the cluster nodes in order to get going.
You can check the status of the cluster:
  cman_tool status
  
The output will be something like:
<code>
Version: 6.1.0
Config Version: 1
Cluster Name: domnull
Cluster Id: 13368
Cluster Member: Yes
Cluster Generation: 12
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 8
Flags: 2node Dirty
Ports Bound: 0 11
Node name: node2.example.com
Node ID: 2
Multicast addresses: 239.192.52.108
Node addresses: 192.168.1.242
</code>
  
===Simulate a cluster with two domU's within a dom0===
When two Xen guests are running on the same dom0, which provides a shared disk to both of them (in order to simulate a shared-storage cluster environment), add a bang after the 'w' in the Xen guests' configuration files to force Xen to let you mount the share in read-write mode on both guests (Xen locks multiple access to the same virtual block device by default). This is not necessary if the shared storage is accessible over iSCSI; in that case you can simply initiate the iSCSI sessions from within the Xen guests.
  
The Xen guest configuration file should contain something like this:
  ...
  disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w!']
  instead of:
  disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w']
  ...
  
In this example you would have to use a cluster-aware filesystem like GFS/GFS2 or OCFS2 in order to prevent data corruption by the Xen guests.
  
===GFS/GFS2===
GFS, the Global File System, is a shared disk file system that was released under the GPL by Red Hat in 2004 (just a few months after they bought Sistina). Currently GFS2 is available as a technology preview; for production use it is probably wiser to opt for GFS.
  
==Install==
If you installed and configured CLVM as mentioned above, you won't need any extra software on Debian Lenny.
  
==Config==
You can just format the device (a logical volume in our case) and mount it afterwards, just as you would with a more regular filesystem. You only have to add some options:
  * -p (the locking protocol, which is probably 'lock_dlm')
  * -t clustername:filesystem_name (yes, you have to come up with a name for your fs)
  * -j n ('n' is the number of journals to create; one journal is required for each node that mounts the filesystem)
The commands for GFS and GFS2 differ slightly:
  
GFS
  gfs_mkfs -p lock_dlm -t domnull:xenvol2 /dev/xenvg/xenvol2 -j 2
Output:
<code>
This will destroy any data on /dev/xenvg/xenvol2.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/xenvg/xenvol2
Blocksize:                 4096
Filesystem Size:           720804
Journals:                  2
Resource Groups:           12
Locking Protocol:          lock_dlm
Lock Table:                domnull:xenvol2

Syncing...
All Done
</code>
  
GFS2
  mkfs.gfs2 -p lock_dlm -t domnull:xenvol2 /dev/xenvg/xenvol2 -j 2
Output:
<code>
This will destroy any data on /dev/xenvg/xenvol2.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/xenvg/xenvol2
Blocksize:                 4096
Device Size                3.00 GB (786432 blocks)
Filesystem Size:           3.00 GB (786431 blocks)
Journals:                  2
Resource Groups:           12
Locking Protocol:          "lock_dlm"
Lock Table:                "domnull:xenvol2"
</code>
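After formatting, mounting works as with any other filesystem; a hedged example with a hypothetical mount point (use filesystem type gfs instead of gfs2 for a GFS volume, and remember that each node mounting the filesystem uses one of the journals created above):
  mkdir -p /mnt/xenvol2
  mount -t gfs2 /dev/xenvg/xenvol2 /mnt/xenvol2
A matching /etc/fstab sketch:
<code>
/dev/xenvg/xenvol2  /mnt/xenvol2  gfs2  defaults  0  0
</code>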
  
===OCFS2===
Start a domU on node1 and migrate it to node2 like so:
  xm migrate --live name_xen_guest node2


====DomU in HA-cluster control====
Put identical Xen domU configuration files in the /etc/xen/ directory of each cluster node.

Create a /etc/cluster/cluster.conf with:

<code>
<?xml version="1.0"?>
<cluster name="domup" config_version="1">
<cman two_node="1" expected_votes="1">
</cman>
<clusternodes>
<clusternode name="saito.example.com" nodeid="1">
        <fence>
                <method name="single">
                        <device name="manual" ipaddr="saito.example.com"/>
                </method>
        </fence>
</clusternode>
<clusternode name="obeliks.example.com" nodeid="2">
        <fence>
                <method name="single">
                        <device name="manual" ipaddr="obeliks.example.com"/>
                </method>
        </fence>
</clusternode>
</clusternodes>
<rm>
  <failoverdomains>
    <failoverdomain name="FD" ordered="1" restricted="1">
      <failoverdomainnode name="saito.example.com"/>
      <failoverdomainnode name="obeliks.example.com"/>
    </failoverdomain>
  </failoverdomains>

  <vm name="xendomU1" domain="FD" autostart="0"/>

</rm>
<fencedevices>
        <fencedevice name="manual" agent="fence_manual"/>
</fencedevices>
</cluster>
</code>
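Once rgmanager has taken over, the domU runs as a cluster service; a hedged usage sketch with the cluster suite's clusvcadm and clustat tools (service and node names as in the example configuration):
  clusvcadm -e vm:xendomU1
  clusvcadm -r vm:xendomU1 -m obeliks.example.com
  clustat
The first command enables (starts) the service, the second relocates it to the member given with -m, and clustat shows the resulting service state.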
  
  
xen/live-migration_infrastructure.1231356987.txt.gz · Last modified: 2009/01/07 20:36 by Olivier Brugman