On Tue, Nov 4, 2014 at 6:34 PM, Arman Khalatyan <arm2arm(a)gmail.com> wrote:
It will be interesting to see your iSCSI setup with DRBD. Did you get a
split-brain before the failure?
Did you check if your target went to readonly mode?
Thanks
Arman.
I used some of the information provided here, although that guide uses
CentOS 5.7 with LVM on top of DRBD, while my setup has CentOS 6.5 with DRBD
on top of LVM:
http://blogs.mindspew-age.com/2012/04/05/adventures-in-high-availability-...
- my drbd resource definition for iSCSI HA:
[root@srvmgmt01 ~]# cat iscsiha.res
resource iscsiha {
    disk {
        disk-flushes no;
        md-flushes no;
        fencing resource-and-stonith;
    }
    device minor 2;
    disk /dev/iscsihavg/iscsihalv;
    syncer {
        rate 30M;
        verify-alg md5;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    on srvmgmt01.localdomain.local {
        address 192.168.230.51:7790;
        meta-disk internal;
    }
    on srvmgmt02.localdomain.local {
        address 192.168.230.52:7790;
        meta-disk internal;
    }
}
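Regarding the split-brain question, the connection state of the resource can be checked on each node. What follows is only a sketch, assuming DRBD 8.x where "drbdadm cstate" is available. The helper name classify_cstate and its verdict strings are made up for illustration. A StandAlone state only hints at a possible split-brain and has to be confirmed in the logs.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD): map a connection state, as printed
# by "drbdadm cstate <resource>", to a rough verdict.
classify_cstate() {
    case "$1" in
        Connected|SyncSource|SyncTarget|VerifyS|VerifyT)
            echo "ok" ;;
        StandAlone)
            # Not connected and not trying to reconnect; one possible
            # cause is a detected split-brain.
            echo "check-split-brain" ;;
        WFConnection)
            echo "waiting-for-peer" ;;
        *)
            echo "other" ;;
    esac
}

# Only query DRBD when the tool is installed on this machine.
if command -v drbdadm >/dev/null 2>&1; then
    state=$(drbdadm cstate iscsiha 2>/dev/null || echo "unknown")
    echo "iscsiha: ${state} -> $(classify_cstate "${state}")"
fi
```

If the verdict is check-split-brain, searching the kernel messages for "Split-Brain" (on CentOS 6 typically in /var/log/messages) should show whether DRBD actually dropped the connection for that reason.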
- tgtd is set up to start at boot on both nodes;
the iscsi and iscsid services are configured to off
- I put the iSCSILogicalUnit and iSCSITarget agents under
/usr/lib/ocf/resource.d/heartbeat/ on both nodes.
They are not provided in plain CentOS, so I downloaded them from here:
http://linux-ha.org/doc/man-pages/re-ra-iSCSITarget.html
- Here are the pcs steps to create the group:
pcs cluster cib iscsiha_cfg
pcs -f iscsiha_cfg resource create p_drbd_iscsiha ocf:linbit:drbd \
    drbd_resource=iscsiha \
    op monitor interval="29s" role="Master" timeout="30" \
    op monitor interval="31s" role="Slave" timeout="30" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100"
pcs -f iscsiha_cfg resource master ms_drbd_iscsiha p_drbd_iscsiha \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
pcs -f iscsiha_cfg resource create p_iscsi_store1 ocf:heartbeat:iSCSITarget \
    params implementation="tgt" \
    iqn="iqn.2014-07.local.localdomain:store1" tid="1" \
    allowed_initiators="10.10.1.61 10.10.1.62 10.10.1.63" \
    incoming_username="iscsiuser" incoming_password="iscsipwd" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="60" \
    op monitor interval="30" timeout="60"
pcs -f iscsiha_cfg resource create p_iscsi_store1_lun1 ocf:heartbeat:iSCSILogicalUnit \
    params implementation="tgt" \
    target_iqn="iqn.2014-07.local.localdomain:store1" lun="1" \
    path="/dev/drbd/by-res/iscsiha" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="60" \
    op monitor interval="30" timeout="60"
pcs -f iscsiha_cfg resource create p_ip_iscsi ocf:heartbeat:IPaddr2 \
    params ip="10.10.1.71" \
    op start interval="0" timeout="20" \
    op stop interval="0" timeout="20" \
    op monitor interval="30" timeout="20"
pcs -f iscsiha_cfg resource create p_portblock-store1-block ocf:heartbeat:portblock \
    params ip="10.10.1.71" portno="3260" protocol="tcp" action="block"
pcs -f iscsiha_cfg resource create p_portblock-store1-unblock ocf:heartbeat:portblock \
    params ip="10.10.1.71" portno="3260" protocol="tcp" action="unblock" \
    op monitor interval="30s"
pcs -f iscsiha_cfg resource group add g_iscsiha p_portblock-store1-block \
    p_ip_iscsi p_iscsi_store1 p_iscsi_store1_lun1 p_portblock-store1-unblock
pcs -f iscsiha_cfg constraint colocation add Started g_iscsiha with Master \
    ms_drbd_iscsiha INFINITY
pcs -f iscsiha_cfg constraint order promote ms_drbd_iscsiha then start g_iscsiha
pcs cluster cib-push iscsiha_cfg
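Before pushing, the offline CIB file can be sanity-checked. This is a minimal sketch, not something I ran as part of the steps above: review_cib is a hypothetical helper name, and it assumes that crm_verify (from Pacemaker) and pcs are in the PATH on the node.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check an offline CIB file before cib-push.
# Returns 2 if the file is missing, otherwise the exit status of crm_verify
# (or 0 when crm_verify is not installed and nothing could be validated).
review_cib() {
    cib_file=$1
    if [ ! -f "$cib_file" ]; then
        echo "no such CIB file: $cib_file" >&2
        return 2
    fi
    # List the constraints stored in the file, if pcs is available.
    if command -v pcs >/dev/null 2>&1; then
        pcs -f "$cib_file" constraint
    fi
    # Let Pacemaker validate the configuration contained in the file.
    if command -v crm_verify >/dev/null 2>&1; then
        crm_verify -V -x "$cib_file"
        return $?
    fi
    return 0
}

# Possible usage:
#   review_cib iscsiha_cfg && pcs cluster cib-push iscsiha_cfg
```

The point of checking the file first is that the colocation and ordering constraints are what tie the group to the DRBD master, so a typo there is better caught before it reaches the live cluster.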
- output of "crm_mon -1"
Resource Group: g_iscsiha
    p_portblock-store1-block (ocf::heartbeat:portblock): Started srvmgmt01.localdomain.local
    p_ip_iscsi (ocf::heartbeat:IPaddr2): Started srvmgmt01.localdomain.local
    p_iscsi_store1 (ocf::heartbeat:iSCSITarget): Started srvmgmt01.localdomain.local
    p_iscsi_store1_lun1 (ocf::heartbeat:iSCSILogicalUnit): Started srvmgmt01.localdomain.local
    p_portblock-store1-unblock (ocf::heartbeat:portblock): Started srvmgmt01.localdomain.local
- output of tgtadm on both nodes while srvmgmt01 is active for the group
[root@srvmgmt01 ~]# tgtadm --mode target --op show
Target 1: iqn.2014-07.local.localdomain:store1
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET 00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
        LUN: 1
            Type: disk
            SCSI ID: p_iscsi_store1_l
            SCSI SN: 66666a41
            Size: 214738 MB, Block size: 512
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: rdwr
            Backing store path: /dev/drbd/by-res/iscsiha
            Backing store flags:
    Account information:
        iscsiuser
    ACL information:
        10.10.1.61
        10.10.1.62
        10.10.1.63
on the passive node:
[root@srvmgmt02 heartbeat]# tgtadm --mode target --op show
[root@srvmgmt02 heartbeat]#
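Regarding the read-only question, the "Readonly:" field of each disk LUN can be pulled out of the tgtadm output shown above. The following is a rough sketch: report_readonly is a made-up helper name, and it assumes the output layout is the one shown above ("LUN:", "Type:" and "Readonly:" each on its own line).

```shell
#!/bin/sh
# Hypothetical helper: read "tgtadm --mode target --op show" output on stdin
# and print "LUN <n> readonly=<Yes|No>" for every LUN of type disk.
# The controller LUN (LUN 0) is skipped.
report_readonly() {
    awk '
        $1 == "LUN:"      { lun = $2; type = "" }
        $1 == "Type:"     { type = $2 }
        $1 == "Readonly:" && type == "disk" { print "LUN " lun " readonly=" $2 }
    '
}

# Only query tgtd when tgtadm is installed on this machine.
if command -v tgtadm >/dev/null 2>&1; then
    tgtadm --mode target --op show | report_readonly
fi
```

On the passive node, where tgtadm shows no target, the helper prints nothing.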
Performance and tuning values (still to be verified) were taken from here:
http://www.dbarticles.com/centos-6-iscsi-tgtd-setup-and-performance-adjus...
My cluster is a basic one for testing, so it is not critical for my
environment. At the moment it has only a 1 Gbit/s network, with one adapter
for DRBD replication and one for iSCSI traffic.
I tested with some basic I/O benchmarks on a VM residing on the storage
domain (SD) and got about 90-95 MB/s on both the DRBD and iSCSI networks.
Relocating the iSCSI service while a benchmark was active also did not seem
to cause problems for the SD or the VM.
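As a rough plausibility check of those figures against the 1 Gbit/s links, here is a small sketch. It assumes MB means 10^6 bytes and ignores all protocol overhead, so the ceiling is optimistic; pct_of_ceiling is just an illustrative helper name.

```shell
#!/bin/sh
# Raw ceiling of a 1 Gbit/s link in MB/s (10^6 bytes per second), ignoring
# Ethernet/IP/TCP framing and iSCSI or DRBD protocol overhead.
link_mbit=1000
ceiling=$(( link_mbit / 8 ))

# pct_of_ceiling: integer percentage of the raw ceiling for a measured
# throughput given in MB/s.
pct_of_ceiling() {
    echo $(( $1 * 100 / ceiling ))
}

echo "raw ceiling: ${ceiling} MB/s"
echo "90 MB/s is $(pct_of_ceiling 90)% of the raw ceiling"
echo "95 MB/s is $(pct_of_ceiling 95)% of the raw ceiling"
```

So the benchmark is already using roughly three quarters of the raw line rate on both networks. As far as I understand, the "rate 30M" syncer setting only limits background resynchronization, not normal replication, which would be consistent with seeing more than 30 MB/s on the DRBD link.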
- I also configured iptables on the cluster nodes so that the initiators
(oVirt hosts) can connect to the IP alias dedicated to the iSCSI service:
in /etc/sysconfig/iptables:
# iSCSI
-A INPUT -p tcp -m tcp -d 10.10.1.71 --dport 3260 -j ACCEPT
I have to recheck the logs to give the exact scenario of what happened to
cause the problem. Since it is not a critical system, it is not so well
monitored at the moment.
comments welcome
Gianluca