Important note if you want to put iSCSI on top of a DRBD device: on Debian Lenny the packaged init scripts start iSCSI before DRBD and, conversely, stop DRBD before iSCSI. You are advised to reverse this ordering so that DRBD is available before iSCSI starts and is only stopped after iSCSI has shut down.
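A minimal sketch of how the ordering could be changed with update-rc.d; the init script name (iscsitarget) and the sequence numbers are assumptions, so check the S/K numbers of your drbd and iSCSI scripts in /etc/rc*.d/ first and pick values that start iSCSI after DRBD and stop it before DRBD:
<code>
# Inspect the current ordering of the relevant init scripts
ls /etc/rc2.d/ /etc/rc0.d/ | grep -Ei 'drbd|iscsi'

# Re-register the iSCSI target script with new sequence numbers
# (placeholders: start at 71, stop at 19 -- adjust to your drbd numbers)
update-rc.d -f iscsitarget remove
update-rc.d iscsitarget defaults 71 19
</code>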
====Xen clustering, CLVM and filesystems====
As soon as two or more Xen hosts (dom0 nodes) access the same (shared) storage, you have to be very careful when writing data. Although a cluster filesystem is not necessary for Xen live migration to work, you do need one as soon as you want to write any data to the shared storage directly from a Xen host! Furthermore, you will probably want a volume manager to manage the available storage. As LVM2 itself is not cluster-aware, you need another solution.

===Cluster LVM (CLVM)===
We use CLVM, the Cluster Logical Volume Manager, which is part of the Red Hat Cluster Suite. This software suite, generously made available by Red Hat, is also packaged for Debian Lenny.

==Install==
Installation of the suite requires the userland tools and the matching kernel modules.
  apt-get install redhat-cluster-suite redhat-cluster-modules-2.6.26-1-xen-amd64

==Config==
First instruct LVM to use the cluster locking functions by changing the locking type in /etc/lvm/lvm.conf to '3':
  ...
  locking_type = 3
  ...
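If you prefer to script this change, a one-liner along these lines should work (a sketch; it assumes the stock lvm.conf layout where locking_type is set on a single, uncommented line):
<code>
# Switch LVM to cluster-wide locking (type 3) and verify the result
sed -i 's/^\( *locking_type =\).*/\1 3/' /etc/lvm/lvm.conf
grep locking_type /etc/lvm/lvm.conf
</code>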

Then set the default properties for the cluster manager (cman) in /etc/default/cman. The node name is the FQDN; make sure that your DNS configuration and /etc/hosts are up to date.
<code>
CLUSTERNAME="domnull"
NODENAME="node1.example.com"
USE_CCS="yes"
CLUSTER_JOIN_TIMEOUT=300
CLUSTER_JOIN_OPTIONS=""
CLUSTER_SHUTDOWN_TIMEOUT=60
</code>
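Since cman identifies the node by its FQDN, it can save some head-scratching to verify name resolution on every node before starting the cluster. A quick sanity check (the host names are the example names used above):
<code>
# The FQDN should match NODENAME and resolve to the cluster interface address
hostname -f
getent hosts node1.example.com node2.example.com
</code>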

Create the cluster configuration file /etc/cluster/cluster.conf and copy it to the other cluster node.

  mkdir /etc/cluster

/etc/cluster/cluster.conf
<code>
<?xml version="1.0"?>
<cluster name="domnull" config_version="1">

  <cman two_node="1" expected_votes="1">
  </cman>

  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="node1.example.com"/>
        </method>
      </fence>
    </clusternode>

    <clusternode name="node2.example.com" nodeid="2">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="node2.example.com"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>

  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>

</cluster>
</code>
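A short sketch of distributing the file and bringing the stack up by hand; the node name and the init script names are assumptions based on the Debian Lenny packages, so adjust them to whatever /etc/init.d/ contains on your system:
<code>
# Copy the configuration to the second node
scp /etc/cluster/cluster.conf root@node2.example.com:/etc/cluster/

# Start the cluster manager and the clustered LVM daemon on both nodes
/etc/init.d/cman start
/etc/init.d/clvm start
</code>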

You might want to change the default order in which the init scripts stop a cluster node. In general the gfs(2) tools should stop first, then the clvmd service, and only then the cman service. Make sure that the multipath and iSCSI initiator services keep running at least until the cluster services have shut down properly.

Restart the cluster nodes to get going.
You can check the status of the cluster:
  cman_tool status

The output will look something like this:
<code>
Version: 6.1.0
Config Version: 1
Cluster Name: domnull
Cluster Id: 13368
Cluster Member: Yes
Cluster Generation: 12
Membership state: Cluster-Member
Nodes: 2
Expected votes: 1
Total votes: 2
Node votes: 1
Quorum: 1
Active subsystems: 8
Flags: 2node Dirty
Ports Bound: 0 11
Node name: node2.example.com
Node ID: 2
Multicast addresses: 239.192.52.108
Node addresses: 192.168.1.242
</code>
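With the cluster quorate and clvmd running, you can create clustered volume groups and logical volumes with the usual LVM tools. A sketch that would create the volume group and volume used in the GFS examples further down (the device name /dev/sdb is a placeholder for your shared iSCSI/multipath device):
<code>
# Prepare the shared device and create a cluster-aware volume group (-cy)
pvcreate /dev/sdb
vgcreate -cy xenvg /dev/sdb

# Create a logical volume; clvmd propagates the metadata to the other node
lvcreate -L 3G -n xenvol2 xenvg
lvs xenvg
</code>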

===Simulate a cluster with two domU's within a dom0===
When two Xen guests run on the same dom0 and that dom0 provides a shared disk to both of them (in order to simulate a shared-storage cluster environment), add a bang after the 'w' in the Xen guests' configuration files. This forces Xen to let you mount the shared disk read-write from both guests, as Xen locks multiple access to the same virtual block device by default. This is not necessary if the shared storage is accessible over iSCSI; in that case you can simply initiate the iSCSI sessions from within the Xen guests.

The Xen guest configuration file should contain something like this:
  ...
  disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w!']
instead of:
  disk=['phy:vgxen01/lv_linva06_hda,hda,w','phy:vgxen01/lv_linva04_06_hdb,hdb,w']
  ...

In this example you would have to use a cluster-aware filesystem like GFS/GFS2 or OCFS2 in order to prevent data corruption by the Xen guests.

===GFS/GFS2===
GFS, the Global File System, is a shared-disk filesystem that was released under the GPL by Red Hat in 2004 (just a few months after they had bought Sistina). At the time of writing, GFS2 is available as a technology preview; for production use it is probably wiser to opt for GFS.

==Install==
If you installed and configured CLVM as described above, you won't need any extra software on Debian Lenny.

==Config==
You can simply format the device (a logical volume in our case) and mount it afterwards, much as you would with a regular filesystem. You only have to add a few options:
  * -p (the locking protocol, which is usually 'lock_dlm')
  * -t clustername:filesystem_name (yes, you have to come up with a name for your filesystem)
  * -j n ('n' is the number of journals to create; one journal is required for each node that mounts the filesystem)
The commands for GFS and GFS2 differ slightly:

GFS
  gfs_mkfs -p lock_dlm -t domnull:xenvol2 /dev/xenvg/xenvol2 -j 2
Output
<code>
This will destroy any data on /dev/xenvg/xenvol2.

Are you sure you want to proceed? [y/n] y

Device: /dev/xenvg/xenvol2
Blocksize: 4096
Filesystem Size: 720804
Journals: 2
Resource Groups: 12
Locking Protocol: lock_dlm
Lock Table: domnull:xenvol2

Syncing...
All Done
</code>

GFS2
  mkfs.gfs2 -p lock_dlm -t domnull:xenvol2 /dev/xenvg/xenvol2 -j 2
Output
<code>
This will destroy any data on /dev/xenvg/xenvol2.

Are you sure you want to proceed? [y/n] y

Device: /dev/xenvg/xenvol2
Blocksize: 4096
Device Size 3.00 GB (786432 blocks)
Filesystem Size: 3.00 GB (786431 blocks)
Journals: 2
Resource Groups: 12
Locking Protocol: "lock_dlm"
Lock Table: "domnull:xenvol2"
</code>
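Mounting works like any other filesystem, as long as the cluster stack is up on the node doing the mount. A sketch for both nodes (the mount point is a placeholder):
<code>
# On each node that should access the filesystem
mkdir -p /srv/xenshare
mount -t gfs /dev/xenvg/xenvol2 /srv/xenshare    # use -t gfs2 for a GFS2 filesystem
df -h /srv/xenshare
</code>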

===OCFS2===
Current Linux kernels have support for the rewritten Oracle Cluster Filesystem (OCFS2). It allows several clients to read from and write to the same (shared) storage. You could use this filesystem to share the Xen guest configuration files among several Xen hosts.

==Install==
Debian Lenny has kernel support for OCFS2 by default. You only need the ocfs2-tools (and optionally 'ocfs2console' if you want a graphical interface).
  apt-get install ocfs2-tools

==Config==
dpkg can help with some basic configuration:
  dpkg-reconfigure ocfs2-tools

Manually check/edit /etc/default/o2cb
<code>
# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=domnull

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000

# O2CB_KEEPALIVE_DELAY_MS: Max. time in ms before a keepalive packet is sent.
O2CB_KEEPALIVE_DELAY_MS=2000

# O2CB_RECONNECT_DELAY_MS: Min. time in ms between connection attempts.
O2CB_RECONNECT_DELAY_MS=2000
</code>

Now create /etc/ocfs2/cluster.conf for your cluster:
<code>
node:
        ip_port = 7777
        ip_address = 192.168.1.248
        number = 0
        name = xenhost1
        cluster = domnull

node:
        ip_port = 7777
        ip_address = 192.168.1.249
        number = 1
        name = xenhost2
        cluster = domnull

cluster:
        node_count = 2
        name = domnull
</code>
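The file has to be identical on every node, so copy it over and bring the O2CB stack up before you try to mount anything. A sketch (the host name is the example one from the config above):
<code>
# Distribute the cluster configuration to the second node
scp /etc/ocfs2/cluster.conf root@xenhost2:/etc/ocfs2/cluster.conf

# Start the O2CB stack on each node and verify it came up
/etc/init.d/o2cb start
/etc/init.d/o2cb status
</code>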

Restart the cluster nodes to get started.
You can check the status of your cluster like so:
  /etc/init.d/o2cb status

The output should look like this:
<code>
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Stack glue driver: Loaded
Stack plugin "o2cb": Loaded
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster domnull: Online
Heartbeat dead threshold = 31
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active
</code>

At this point you can simply format and use your shared storage:
  mkfs.ocfs2 /dev/xenvg/xenvol1
  mount -t ocfs2 /dev/xenvg/xenvol1 /mnt/
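If you want the filesystem mounted at boot, an fstab entry along these lines should do; the _netdev option delays the mount until the network (and thus iSCSI) is available, and the mount point /srv/xenconfig is just a placeholder:
<code>
# /etc/fstab (sketch)
/dev/xenvg/xenvol1  /srv/xenconfig  ocfs2  _netdev,defaults  0  0
</code>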

=====Config=====
Configure the Xen relocation service in /etc/xen/xend-config.sxp
  ...
  (xend-relocation-server yes)
  (xend-relocation-port 8002)
  (xend-relocation-address '')
  (xend-relocation-hosts-allow '')
  ...

Restart xend on both nodes and make sure that port 8002 accepts connections from everywhere. Check for a LISTEN line with netstat.
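For example (a quick check; the exact netstat flags are a matter of taste):
<code>
# The line for port 8002 should be in LISTEN state and bound to 0.0.0.0
netstat -tlpn | grep 8002
</code>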

Important: Time has to be synced between both nodes (See [[hannibal:ntp]]).

=====Action=====

Start a domU on node1 and migrate it to node2 like so:
  xm migrate --live name_xen_guest node2
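You can verify the result with xm list on both hosts (name_xen_guest is the example guest name from the command above):
<code>
# On node1 the guest should disappear from the domain list ...
xm list
# ... and on node2 it should appear, still running
ssh node2 xm list | grep name_xen_guest
</code>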

====DomU in HA-cluster control====
Put identical Xen domU configuration files in the /etc/xen/ directory on both nodes.

Create a /etc/cluster/cluster.conf with:

<code>
<?xml version="1.0"?>
<cluster name="domup" config_version="1">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="saito.example.com" nodeid="1">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="saito.example.com"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="obeliks.example.com" nodeid="2">
      <fence>
        <method name="single">
          <device name="manual" ipaddr="obeliks.example.com"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
      <failoverdomain name="FD" ordered="1" restricted="1">
        <failoverdomainnode name="saito.example.com"/>
        <failoverdomainnode name="obeliks.example.com"/>
      </failoverdomain>
    </failoverdomains>

    <vm name="xendomU1" domain="FD" autostart="0"/>

  </rm>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
</cluster>
</code>
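Once rgmanager is running on both nodes, the domU can be controlled as a cluster service. A sketch of the usual operations with clusvcadm (the vm service name is derived from the vm resource defined above; the node names are the example ones):
<code>
# Show the state of the cluster services
clustat

# Start (enable) the domU under cluster control, preferably on saito
clusvcadm -e vm:xendomU1 -m saito.example.com

# Move it to the other node
clusvcadm -r vm:xendomU1 -m obeliks.example.com

# Stop (disable) it again
clusvcadm -d vm:xendomU1
</code>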
=====Literature=====
  * a proof of concept regarding Xen migration on Suse Linux by Novell Presales, available at [[http://forge.novell.com/modules/xfcontent/private.php?reference_id=2736&content=/library/Xen%20live%20migration%20demo/XEN_migration_demo_1.1.pdf|http://forge.novell.com/.../XEN_migration_demo_1.1.pdf]]
  * http://www.linux1394.org